reachlin's development notes

Spent today getting the EMQX dashboard accessible via an SSM tunnel without breaking anything. Turned into a good lesson in how AWS NLB proxy protocol interacts with plain HTTP services.

The original problem

The EMQX dashboard service was configured as ClusterIP — only reachable inside the Kubernetes cluster. The existing tunnel script pointed at the MQTT broker’s hostname on port 8080, which was wrong on two counts: wrong host (that’s the MQTT NLB, not the dashboard) and wrong port (dashboard runs on 18083). End result: connection refused or a 400.

Why not just add the dashboard port to the existing internal NLB?

The existing internal NLB for MQTT traffic has service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*" set. This annotation applies to every port on that service, including any new ones you add. Proxy Protocol prepends a header to the start of each TCP connection's byte stream, carrying the original client IP and port, before any application payload. MQTT listeners understand this. The EMQX dashboard’s HTTP server does not. So tunneling through that NLB on a new port gives you an immediate 400 Bad Request, because the HTTP server sees Proxy Protocol bytes where it expects the GET request line.

The fix: a separate Kubernetes service for the dashboard with no proxy protocol annotation.

apiVersion: v1
kind: Service
metadata:
  name: emqx-dashboard-internal
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
    # no proxy-protocol here
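spec:
  type: LoadBalancer
  # selector omitted from this snippet; it must match the EMQX pod labels
  ports:
    - port: 8080        # NLB listener port, allowed by the SSM policy
      targetPort: 18083 # EMQX dashboard port on the pod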

Port 8080 on the NLB maps to 18083 on the pod. The 8080 is needed because the SSM session policy restricts which destination ports are allowed — 18083 isn’t on the list, 8080 is.

The SSM policy double-constraint

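The SSM StartPortForwardingSessionToRemoteHost document has two allowlists: one for the destination port and one for the destination hostname. The port allowlist is why the NLB listens on 8080 rather than 18083.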

The raw ELB hostname (*.elb.us-east-2.amazonaws.com) fails the hostname check. So you need a proper DNS record under an allowed domain. A Route53 internal alias record pointing at the NLB does the job — it resolves correctly from inside the VPC where the jumpbox lives.

Terraform after kubectl

The apply order matters here. The NLB is provisioned by Kubernetes when you apply the service manifest. Terraform then looks it up by tag:

data "aws_lb" "emqx_dashboard_internal_nlb" {
  tags = {
    "elbv2.k8s.aws/cluster" = "eks-${terraform.workspace}"
    "service.k8s.aws/stack" = "emqx/emqx-dashboard-internal"
  }
}

If you run terraform apply before kubectl apply, the data source lookup fails. One-way dependency: K8s creates the NLB, Terraform creates the DNS record. No loop.
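
The Terraform half of that dependency is the DNS record. A minimal sketch of what it could look like, assuming the allowed domain is served from a private hosted zone. The zone name internal.example.com and the record label emqx-dashboard are placeholders, not values from these notes:

```hcl
# Assumption: a private hosted zone under a domain the SSM hostname
# allowlist accepts. "internal.example.com" is a placeholder.
data "aws_route53_zone" "internal" {
  name         = "internal.example.com."
  private_zone = true
}

# Alias record pointing at the NLB that Kubernetes provisioned.
# "emqx-dashboard" is a hypothetical record label.
resource "aws_route53_record" "emqx_dashboard_internal" {
  zone_id = data.aws_route53_zone.internal.zone_id
  name    = "emqx-dashboard.${data.aws_route53_zone.internal.name}"
  type    = "A"

  alias {
    # Both values come from the tag-based aws_lb lookup, so this record
    # cannot be planned until the NLB exists.
    name                   = data.aws_lb.emqx_dashboard_internal_nlb.dns_name
    zone_id                = data.aws_lb.emqx_dashboard_internal_nlb.zone_id
    evaluate_target_health = true
  }
}
```

Because the alias block reads dns_name and zone_id from the data source, the record only plans cleanly once kubectl apply has created the load balancer, which keeps the dependency pointing in one direction.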

What ended up shipping

The dashboard is now reachable with a single command through the existing SSM jumpbox infrastructure, with no changes to the SSM policy or security groups.