EKS Networking Deep Dive: Why Your Pods Can't Talk

Rantideb Howlader · 8 min read

Introduction: The Black Box

I hired a Senior Engineer once. I asked him: "What happens when you ping google.com from a Pod?" He said: "It... goes to the internet?" I asked: "How? What interface does it use? What NAT? What is a veth pair?" He went silent.

Kubernetes networking is mostly magic. In standard K8s, it's an Overlay (Flannel/Calico). It's a fake network on top of the real network. In AWS EKS, it's different. AWS uses the VPC CNI Plugin. This means Pods get real IP Addresses from your VPC.

This is a superpower. It means your Pod (10.0.1.50) is a first-class citizen on the network. It also means you can run out of IP addresses in about 10 minutes if you aren't careful.

I once worked on a cluster where we ran out of IPs. The Auto Scaling Group tried to launch 50 nodes. The nodes launched, but the Pods stayed in Pending. Why? Because the subnets had no more IPs to give. We had to rebuild the entire VPC CIDR range. It took 3 months.

In this guide, we are going to open the Black Box. We will look at ENIs, Secondary IPs, Custom Networking, and how to verify that your cluster won't explode next week.


Part 1: The VPC CNI Plugin (The AWS Way)

Standard K8s (Overlay):

  • Node IP: 10.0.0.5
  • Pod IP: 192.168.0.5 (Fake IP, only exists inside the cluster).
  • Traffic is encapsulated (VXLAN/IPIP).

EKS (Underlay):

  • Node IP: 10.0.0.5
  • Pod IP: 10.0.0.6 (Real VPC IP).
  • No encapsulation. High performance.

The Mechanism:

  1. Kubelet sees a new Pod.
  2. It calls the VPC CNI Plugin.
  3. The plugin's IPAM daemon (ipamd) keeps a warm pool of IPs, calling the AWS EC2 API ahead of time: "Give me IPs."
  4. Those IPs sit as secondary IPs on the Node's ENIs (Elastic Network Interfaces).
  5. The plugin assigns one of them to the Pod and wires it in with a veth pair.
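
You can watch that warm pool yourself. A minimal sketch, run from a worker node (port 61679 is the VPC CNI's default introspection endpoint; your build may disable or move it):

# Inspect ipamd's view of the node: each ENI and its secondary IPs.
# Pipe through any JSON pretty-printer you have on the node.
curl -s http://localhost:61679/v1/enis | python3 -m json.tool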

Part 2: IP Exhaustion (The Trap)

Here is the math that kills you. An m5.large instance supports:

  • 3 ENIs.
  • 10 IPs per ENI.
  • Total: 30 IPs.

This means you can only run 29 Pods on that node: each ENI's primary IP is reserved for the node itself, leaving 3 * 9 = 27 Pod IPs, plus 2 host-network Pods (aws-node, kube-proxy) that don't consume one. Even if you have plenty of CPU/RAM, if you try to launch Pod #30, it fails with a FailedCreatePodSandBox event: the CNI has no available IP addresses.
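
You don't have to memorize the limit tables. A sketch using the real EC2 DescribeInstanceTypes API fields, plus the default EKS max-pods formula:

# ENI count and IPs-per-ENI for any instance type:
aws ec2 describe-instance-types --instance-types m5.large \
  --query 'InstanceTypes[0].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]'
# Default max-pods: ENIs * (IPs per ENI - 1) + 2  ->  3 * 9 + 2 = 29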

The Subnet Problem: If your subnet is a /24 (254 usable IPs) and you have 10 nodes (30 IPs each = 300 IPs), you have just run out of IPs in the subnet. You cannot launch any more nodes. Game Over.


Part 3: The Fix: Prefix Delegation

AWS released Prefix Delegation (VPC CNI v1.9+, Nitro instances only). Instead of assigning 1 IP at a time, the CNI assigns a /28 block (16 IPs) to each slot on the ENI.

  • m5.large with Prefix Delegation: 3 ENIs * 9 prefix slots * 16 IPs per slot = 432 Pod IPs. (Kubelet still caps max-pods at 110 on instances this size.)

Now you will essentially never run out of IPs on the Node. (But you still might run out of IPs in the Subnet! Make your subnets big. Use /16 or /18 for EKS.)
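
Turning it on is one env var on the CNI DaemonSet. A sketch using the documented settings (you generally need to roll your nodes for it to take effect):

kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1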


Part 4: Custom Networking (Saving IPs)

"But I designed my VPC in 2018 and I only have a small /24 subnet left!" Don't worry. Use Custom Networking.

  1. Create a secondary CIDR for your VPC (e.g., 100.64.0.0/16 from the CG-NAT range).
  2. Create new subnets in that range.
  3. Tell EKS CNI: "Put the Nodes in the Primary Subnet (10.0.x.x), but put the Pods in the Secondary Subnet (100.64.x.x)."

Now your Pods consume "fake" private IPs that don't conflict with your corporate RFC1918 space, but they are still routable within the VPC.
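
A sketch of the two moving parts, using the documented CNI settings (subnet and security group IDs are placeholders):

kubectl set env daemonset aws-node -n kube-system \
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
  ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

Then one ENIConfig per Availability Zone, pointing Pods at the secondary subnets:

apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a              # must match the zone label value
spec:
  subnet: subnet-0aaa111        # placeholder: a 100.64.x.x subnet
  securityGroups:
    - sg-0bbb222                # placeholder

One caveat: with custom networking the primary ENI no longer hosts Pods, so you lose one ENI's worth of IPs per node (pair it with Prefix Delegation and you won't notice).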


Part 5: Security Groups for Pods

Historically, all Pods on a Node shared the Node's Security Group. This was bad. "Why can the Frontend Pod talk to the Database?" "Because they are on the same Node."

The New Way: Security Groups for Pods. You create a SecurityGroupPolicy resource that selects Pods by label and binds Security Groups to them.

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-config
spec:
  podSelector:
    matchLabels:
      app: payment              # selects the Pods to protect
  securityGroups:
    groupIds:
      - sg-123456

The CNI creates a special "Trunk ENI" and gives the Pod a dedicated interface protected by that SG. Now you can have true micro-segmentation.


Part 6: CoreDNS (Service Discovery)

Networking is useless if you can't find the destination. CoreDNS is the phonebook: it's what resolves curl payment-service.default.svc.cluster.local.

The Failure Mode: CoreDNS is just a Deployment. It runs as Pods. If you have 1000 nodes and 2 CoreDNS pods, every lookup from every Pod hits those 2 replicas. CoreDNS gets overwhelmed. Latency increases. Apps time out.

The Fix:

  1. NodeLocal DNSCache: A DaemonSet that runs a DNS cache on every node. Pods talk to their local cache first.
  2. Autoscale CoreDNS: Make sure you have cluster-proportional-autoscaler running.
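
Two quick sanity checks before you blame the network, a sketch (names assume EKS defaults, where CoreDNS keeps the legacy kube-dns label):

kubectl -n kube-system get deploy coredns      # compare replicas to node count
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=100 | grep -iE 'timeout|error'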

Part 7: The Overlay Alternative (Calico)

Sometimes, you don't want the VPC CNI.

  • Maybe you need millions of IPs.
  • Maybe you need fancy Network Policies (Deny-by-default).

You can remove the AWS CNI and install Calico (Tigera). Calico uses VXLAN (Encapsulation).

  • Pros: Infinite IP space. Advanced policy engine.
  • Cons: ~5% CPU overhead for encapsulation. Harder to debug (packet captures are messy).

Part 8: Debugging (The Toolkit)

When networking breaks, kubectl get pods tells you nothing. You need the toolbox.

  1. kubectl exec -it pod -- ping 8.8.8.8: Internet Check.
  2. kubectl exec -it pod -- nslookup google.com: DNS Check.
  3. tcpdump: Packet capture.
    • Hard on distroless images. Use kubectl debug to attach an ephemeral container that ships tcpdump.
  4. VPC Flow Logs: Check if the traffic was rejected by a Security Group (REJECT).
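
For the Flow Logs check, a sketch assuming your Flow Logs are delivered to a CloudWatch Logs group (the group name is a placeholder):

aws logs filter-log-events \
  --log-group-name /vpc/flow-logs \
  --filter-pattern REJECT \
  --max-items 20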

Part 9: Expert Glossary

  • CNI (Container Network Interface): The standard API for K8s networking plugins.
  • ENI (Elastic Network Interface): A virtual network card in AWS.
  • SNAT (Source Network Address Translation): When a Pod talks to the internet, its private IP is replaced by the Node's public IP (or NAT Gateway IP).
  • MTU (Maximum Transmission Unit): The maximum packet size. AWS supports Jumbo Frames (9001) inside the VPC, but 1500 to the internet. Mismatches cause hanging connections (Black Hole Router).
  • Service Mesh (Istio/Linkerd): A layer above CNI that handles mTLS and Traffic split.

Part 10: IPv6 (The Nuclear Option)

If you are building a new cluster today, consider IPv6. Why?

  • IPv4: You have 65,536 IPs (in a /16). You will run out.
  • IPv6: You have 340 Undecillion IPs. You will never run out.

EKS supports IPv6.

  • Node: Dual Stack (Gets an IPv4 for management + IPv6).
  • Pod: IPv6 Only.
  • The Magic: The VPC CNI handles the translation.
    • Pod talking to Pod: IPv6.
    • Pod talking to the IPv6 Internet: Goes through an Egress-Only Internet Gateway.
    • Pod talking to IPv4-only endpoints: the CNI SNATs the Pod to the Node's IPv4 (or you run DNS64/NAT64 on a NAT Gateway).
    • Pod talking to Cluster Service: IPv6.

Why doesn't everyone use it? Legacy tooling: some old monitoring agents break on IPv6. But for new clusters in 2026, it is the recommendation.
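
If you want to try it, a minimal eksctl sketch (ipFamily is fixed at creation time; you cannot convert an existing cluster, and eksctl requires OIDC plus the managed addons for IPv6):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-ipv6               # placeholder name
  region: us-east-1
kubernetesNetworkConfig:
  ipFamily: IPv6
iam:
  withOIDC: true
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy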


Part 11: Service Mesh (Istio / Linkerd)

The CNI handles Layer 3 (IP Packets). The Service Mesh handles Layer 7 (HTTP Requests).

How it works (The Sidecar):

  1. You deploy your App Pod.
  2. Istio injects a second container (istio-proxy / Envoy) into the Pod.
  3. Istio uses iptables to hijack all traffic.
    • Traffic -> Pod -> IPTables -> Envoy -> App.

Why use it?

  • mTLS: Automatic encryption between pods. (Zero Trust).
  • Traffic Splitting: "Send 1% of traffic to version 2". (Canary; sketched after this list.)
  • Circuit Breaking: "If Service B fails 5 times, stop calling it."
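
Here is a minimal Istio sketch of that 1% canary (it assumes a DestinationRule already defines the v1/v2 subsets; the names are placeholders):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment
spec:
  hosts:
    - payment-service
  http:
    - route:
        - destination:
            host: payment-service
            subset: v1
          weight: 99            # 99% to the stable version
        - destination:
            host: payment-service
            subset: v2
          weight: 1             # 1% canary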

The Cost: latency (every hop adds ~2ms) and complexity (debugging iptables rules is hard).


Part 12: Debugging EKS Networking (The Playbook)

When a Pod can't connect, follow this checklist.

  1. Check IP: kubectl get pod -o wide. Does it have an IP? Is it in the right subnet?
  2. Check SG: Find the ENI of the Pod. Check its Security Group rules. (Inbound/Outbound).
  3. Check NACL: Network ACLs are stateless. Did you allow Ephemeral Ports (1024-65535) on the return traffic?
  4. Check CoreDNS: kubectl logs -n kube-system -l k8s-app=kube-dns. Are there errors?
  5. Tcpdump:
    • If you can't install tcpdump (Distroless), use Ephemeral Containers.
    • kubectl debug -it pod/mypod --image=nicolaka/netshoot
    • Inside debug pod: tcpdump -i eth0 port 80
    • Run curl from another pod. Do you see the SYN packet? Do you see the ACK?

Part 13: Expert Glossary (Continued)

  • VPC CNI: The plugin that assigns real VPC IPs to Pods.
  • Overlay: A network on top of a network (VXLAN). Used by Calico/Flannel.
  • Underlay: The physical network (VPC). Used by AWS CNI.
  • ENI Trunking: A feature that allows one instance to have more ENIs than the hardware limit (used for Security Groups for Pods).
  • Prefix Delegation: Assigning a /28 block to an ENI instead of 1 IP.
  • SNAT (Source NAT): Changing the Source IP of a packet so it can route back.
  • CoreDNS: The DNS server inside K8s.

Conclusion: It's Just Plumbing

EKS Networking is dense. But it follows the laws of physics (and TCP/IP). Most issues come down to:

  1. Running out of IPs.
  2. Security Groups blocking ports.
  3. DNS overload.

If you respect the IP limits, resize your subnets, and monitor CoreDNS, it works beautifully. If you ignore them, you will have a very bad Monday.
