Your AWS invoice tells you what you spent. It never tells you what you wasted. That gap is where Kubernetes cost leaks live — money flowing out of your account every hour for capacity nobody is actually using. The invoice shows you a tidy line item for EC2 or EKS node hours; it stays completely silent on the fact that half of those cores are reserved-but-idle, that a chunk of every node is unschedulable dead space, or that you bought on-demand for a workload that would have run just fine on spot.
Kubernetes is brilliant at abstracting nodes away from developers. That same abstraction is exactly why the waste is invisible. The bin-packer hides which pods landed where, requests hide actual usage, and the autoscaler hides how much headroom you're paying to keep warm. Below are the three highest-impact leaks hiding inside a typical EKS bill — what each one is, why the invoice can't see it, roughly how big it gets, and how to actually spot it.
Leak #1: The request-vs-usage gap (you pay for what you reserve, not what you run)
This is the biggest and most universal Kubernetes cost leak. When you set resources.requests on a pod, the Kubernetes scheduler carves out that CPU and memory on a node and will not schedule other pods against it, whether your container uses it or not. You are billed for the node; the node is "full" of reservations; the reservations are mostly air.
A typical payment-service deployment might request 2 vCPU and 4Gi of memory per replica because someone copied the value from another team two years ago. Steady-state P99 usage is 300m CPU and 900Mi. The other 1.7 vCPU and 3.1Gi are reserved on the node, unavailable to anything else, and on your bill every single hour.
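To put numbers on that example, here is a small Python sketch. The per-replica figures are the ones quoted above; the replica count is a made-up assumption for illustration only.

```python
# Worked arithmetic for the payment-service example.
MIB_PER_GIB = 1024

request_cpu_m = 2000               # 2 vCPU expressed in millicores
request_mem_mi = 4 * MIB_PER_GIB   # 4Gi expressed in MiB
p99_cpu_m = 300                    # steady-state P99 CPU usage
p99_mem_mi = 900                   # steady-state P99 memory usage

# Reserved on the node but never touched by the container.
idle_cpu_m = request_cpu_m - p99_cpu_m
idle_mem_mi = request_mem_mi - p99_mem_mi

idle_cpu_pct = 100 * idle_cpu_m / request_cpu_m
idle_mem_pct = 100 * idle_mem_mi / request_mem_mi

REPLICAS = 10  # hypothetical replica count, not from the article

print(f"Per replica: {idle_cpu_m}m CPU and "
      f"{idle_mem_mi / MIB_PER_GIB:.1f}Gi reserved but unused")
print(f"That is {idle_cpu_pct:.0f}% of the CPU request and "
      f"{idle_mem_pct:.0f}% of the memory request")
print(f"Across {REPLICAS} replicas: {idle_cpu_m * REPLICAS / 1000:.0f} vCPU and "
      f"{idle_mem_mi * REPLICAS / MIB_PER_GIB:.1f}Gi of air")
```

The per-replica gap looks small; multiplied across replicas and services, it is often what fills your nodes.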
Why the AWS invoice hides it: AWS bills you for the instance, not for pod requests. From Amazon's side the node is running and healthy, so it charges full rate. The over-reservation lives entirely inside Kubernetes, in the gap between what a pod's resource requests reserve and what the container actually consumes, a layer CloudWatch and Cost Explorer simply do not see.
Rough magnitude: Industry FinOps surveys routinely find clusters running at 10–25% CPU utilization against requests. If your nodes are 80% "requested" but 20% used, you are paying for roughly 3–4x the compute you need on the affected workloads.
How to spot it: You need per-container request-vs-usage data over time, not a point-in-time snapshot. Pull each workload's CPU/memory requests alongside its P95/P99 actual usage across a few weeks (a single bad day shouldn't drive a downsize). The honest way to size a workload is recommended_request = P99_usage × safety_buffer, with sane floors so you never starve a pod to 1m of CPU. Anything where the request towers over the P99 line is a leak.
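The sizing rule above can be sketched in a few lines of Python. The 1.2 safety buffer and the 50m / 128Mi floors are assumptions picked for illustration, not recommended values, and the function names are hypothetical.

```python
def recommend_request(p99_usage: float, safety_buffer: float = 1.2, floor: int = 0) -> int:
    """recommended_request = P99_usage x safety_buffer, never below the floor.

    Units are whatever you pass in (millicores for CPU, MiB for memory).
    """
    return max(floor, round(p99_usage * safety_buffer))


def overprovision_ratio(current_request: float, recommended: float) -> float:
    """How many times larger the current request is than the recommendation."""
    return current_request / recommended


# Assumed floors so a near-idle pod is never sized down to 1m of CPU.
CPU_FLOOR_M = 50
MEM_FLOOR_MI = 128

# The payment-service figures from Leak #1: requests of 2000m / 4096Mi,
# P99 usage of 300m / 900Mi.
cpu_rec = recommend_request(300, floor=CPU_FLOOR_M)
mem_rec = recommend_request(900, floor=MEM_FLOOR_MI)

print(f"CPU: request 2000m -> recommend {cpu_rec}m "
      f"({overprovision_ratio(2000, cpu_rec):.1f}x over)")
print(f"Memory: request 4096Mi -> recommend {mem_rec}Mi "
      f"({overprovision_ratio(4096, mem_rec):.1f}x over)")

# A near-idle sidecar using 2m of CPU lands on the floor, not on 2m.
print(f"Sidecar CPU: recommend {recommend_request(2, floor=CPU_FLOOR_M)}m")
```

Feed it P99 values taken over a few weeks, as described above, rather than a single day's peak.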
See your own gap in minutes. Atmosly's Kubernetes cost intelligence attributes cost per workload using a Max(request, usage) model and surfaces request-vs-usage rightsizing: recommended requests derived from historical P95/P99 plus a safety buffer, with floors to avoid under-provisioning. Connect a cluster for free and the wasted spend shows up on day one. No spreadsheet archaeology required.
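For intuition, a generic reading of a Max(request, usage) model looks like the sketch below. This is an assumption about how such a model is commonly built, not Atmosly's actual implementation; the function name and the per-core price are hypothetical.

```python
import math


def workload_cpu_cost(request_cores: float, usage_cores: float,
                      price_per_core_hour: float, hours: float = 1.0) -> float:
    """Charge a workload for whichever is larger: what it reserved or what it used.

    An over-requesting workload pays for its reservation, and a workload
    that bursts above its request pays for the burst.
    """
    billable_cores = max(request_cores, usage_cores)
    return billable_cores * price_per_core_hour * hours


PRICE = 0.04  # hypothetical USD per core-hour

# Over-reserved: requests 2 cores, uses 0.3, so it is billed for 2.
over_reserved = workload_cpu_cost(2.0, 0.3, PRICE)

# Bursting: requests 0.5 cores, uses 1.0, so it is billed for 1.
bursting = workload_cpu_cost(0.5, 1.0, PRICE)

print(f"over-reserved workload: ${over_reserved:.3f}/hour")
print(f"bursting workload:      ${bursting:.3f}/hour")
```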
Leak #2: Idle and unallocated cluster capacity (the dead space between nodes)
Even if every pod were perfectly sized, you would still be leaking money — because there is almost always a gap between what your nodes can hold and what your pods requested. That gap is unallocated capacity: cores and gigabytes you provisioned, are paying for, and never scheduled anything onto.
It comes from the realities of bin-packing. The autoscaler adds a whole node to fit one pod that didn't quite fit elsewhere, leaving most of the new node empty. DaemonSets and system reservations eat allocatable headroom. Anti-affinity rules and topology spreads force pods apart, fragmenting capacity across more nodes than the raw resource math requires. Each of these is reasonable in isolation; together they leave you paying for a cluster that is structurally larger than its workload.
Why the AWS invoice hides it: This one is especially sneaky because the nodes are doing their job — they're up, they passed health checks, they're billed at full rate. AWS has no concept of "allocated." The unallocated portion is purely a Kubernetes scheduling artifact, computable only if you know both node capacity and the sum of pod requests on each node at the same time.
Rough magnitude: Idle/unallocated capacity commonly runs 20–40% of cluster compute cost, and worse on clusters with aggressive anti-affinity, large headroom buffers, or many small node groups.
How to spot it: The math is simple once you have the data: idle_cost = total_node_capacity_cost − total_allocated_cost. Compute the billable cost of every node, subtract the cost of everything actually requested on it, and what's left is the leak. (If your cluster is overcommitted — requests exceed capacity — idle is zero, which is its own conversation about reliability.) The key is attributing real per-node cost, not a flat blended rate, because a Graviton spot node and an on-demand x86 node are worlds apart in price.
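As a rough sketch of that calculation in Python: the node names, prices, and capacities below are invented for illustration, and the example looks at CPU only to stay short (a fuller version would also weigh memory).

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    hourly_cost: float    # real per-node price in USD/hour (hypothetical values)
    capacity_cpu_m: int   # allocatable CPU in millicores
    requested_cpu_m: int  # sum of pod CPU requests scheduled on this node


def node_idle_cost(node: Node) -> float:
    """Per-node idle cost: node cost minus the cost of what is requested on it.

    The allocated fraction is capped at 1.0, so an overcommitted node
    reports zero idle cost instead of a negative number.
    """
    allocated_fraction = min(1.0, node.requested_cpu_m / node.capacity_cpu_m)
    return node.hourly_cost * (1.0 - allocated_fraction)


def cluster_idle_cost(nodes: list[Node]) -> float:
    """idle_cost = total_node_capacity_cost - total_allocated_cost, summed per node."""
    return sum(node_idle_cost(n) for n in nodes)


nodes = [
    Node("on-demand-x86", hourly_cost=0.20, capacity_cpu_m=4000, requested_cpu_m=3000),
    Node("spot-graviton", hourly_cost=0.05, capacity_cpu_m=4000, requested_cpu_m=1000),
    Node("overcommitted", hourly_cost=0.20, capacity_cpu_m=4000, requested_cpu_m=5000),
]

for n in nodes:
    print(f"{n.name}: ${node_idle_cost(n):.4f}/hour idle")
print(f"cluster: ${cluster_idle_cost(nodes):.4f}/hour idle")
```

Using each node's own price is the point: the mostly empty spot node above leaks less money per hour than the mostly full on-demand one.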
This is also where Platform Engineering earns its keep: surfacing idle cost per cluster and namespace on an internal developer platform turns an invisible infra problem into a number a team owns. For a deeper EKS-specific walkthrough, see our guide on optimizing Amazon EKS costs.
Leak #3: The wrong node and purchasing mix (paying list price for commodity compute)
The third leak isn't about how much compute you run — it's about what kind you bought it as. Two clusters running identical workloads can have radically different bills purely based on instance family, architecture, and purchasing model.
The usual culprits:
- On-demand where spot would do. Stateless, replicated, interruption-tolerant workloads (most web tiers, batch jobs, CI runners) are textbook Spot candidates — typically 60–90% cheaper — yet run on full-price on-demand.
- x86 where Graviton fits. AWS Graviton (ARM64) instances deliver meaningfully better price-performance for most cloud-native workloads, and modern container images are multi-arch. Staying on x86 by inertia is a standing leak.
- Oversized or wrong-family nodes. A memory-light, CPU-heavy workload pinned to a balanced m-family node group wastes the memory you pay for. The wrong instance shape leaves a permanent gap.
- Uncovered steady-state baseline. The portion of your fleet that never scales to zero is exactly what Savings Plans and Reserved Instances exist for, and it is frequently left at on-demand rates.
Why the AWS invoice hides it: The bill shows what you did buy at the price you did pay. It never shows the counterfactual — what the same workload would have cost on a cheaper, equally-valid instance type or purchasing model. Tools like Karpenter provision nodes intelligently, but they provision against the requests you gave them; if those requests are wrong (Leak #1), the node choice inherits the error.
Rough magnitude: A wrong-mix correction (spot adoption plus Graviton plus right-family sizing) routinely cuts compute cost by 30–50%, stacking on top of the rightsizing savings from Leak #1.
How to spot it: For each node, ask: given the workloads actually scheduled here, what is the cheapest instance candidate in this region that still fits — on-demand, spot-estimated, and a balanced option? The gap between your current per-node cost and that cheaper-but-valid fit is the leak. Done per node across the fleet, it adds up fast.
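The per-node question above can be sketched as a simple search over candidates. Everything in the catalog below is hypothetical: the instance names, sizes, and prices are placeholders, not real AWS offerings or rates, and a real version would pull live pricing for your region.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    cpu_m: int          # allocatable CPU in millicores
    mem_mi: int         # allocatable memory in MiB
    hourly_cost: float  # USD/hour, placeholder values
    purchasing: str     # "on-demand" or "spot"


def cheapest_fit(candidates: list[Candidate], needed_cpu_m: int,
                 needed_mem_mi: int) -> Optional[Candidate]:
    """Cheapest candidate that still holds the workloads' total requests."""
    fitting = [c for c in candidates
               if c.cpu_m >= needed_cpu_m and c.mem_mi >= needed_mem_mi]
    return min(fitting, key=lambda c: c.hourly_cost, default=None)


def node_leak(current_hourly_cost: float, candidates: list[Candidate],
              needed_cpu_m: int, needed_mem_mi: int) -> float:
    """Gap between what the node costs now and the cheapest valid fit."""
    best = cheapest_fit(candidates, needed_cpu_m, needed_mem_mi)
    if best is None:
        return 0.0  # nothing fits, so there is no cheaper-but-valid option
    return max(0.0, current_hourly_cost - best.hourly_cost)


CATALOG = [
    Candidate("x86-large", 4000, 16384, 0.20, "on-demand"),
    Candidate("arm-large", 4000, 16384, 0.16, "on-demand"),
    Candidate("arm-large", 4000, 16384, 0.05, "spot"),
    Candidate("arm-small", 2000, 8192, 0.03, "spot"),
]

# Workloads on this node request 3000m CPU and 6000Mi in total; it costs $0.20/hour.
best = cheapest_fit(CATALOG, 3000, 6000)
leak = node_leak(0.20, CATALOG, 3000, 6000)
print(f"cheapest fit: {best.name} ({best.purchasing}) at ${best.hourly_cost:.2f}/hour")
print(f"leak on this node: ${leak:.2f}/hour")
```

In practice you would filter the catalog first (for example, on-demand only for workloads that cannot tolerate interruption) and run the search once per purchasing model to get the on-demand, spot-estimated, and balanced options.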
Plugging the leaks: visibility first, then action
Notice the pattern: every one of these leaks is invisible on the AWS invoice because each lives in the Kubernetes layer that AWS can't see — pod requests, scheduler allocation, and the counterfactual of a cheaper node. You cannot optimize what you cannot measure, and standard cloud billing dashboards measure the wrong layer.
The sequence that works:
- Get per-workload and per-node cost visibility built from in-cluster metrics, not blended estimates — CPU/memory requests, P95/P99 usage, node capacity, and the real price of each node.
- Close the request-vs-usage gap with rightsizing recommendations, applied safely (GitOps PR for durable change, with floors so nothing gets starved).
- Quantify idle as capacity-minus-allocated, then drive it down by consolidating and tightening autoscaler headroom.
- Fix the mix with per-node cheaper-instance recommendations across spot, Graviton, and right-sized families.
This is precisely the layer Atmosly sits in: in-cluster telemetry feeding granular cost attribution by cluster, namespace, and workload, plus rightsizing and node recommendations. It's the intelligence layer on top of your autoscaler — it tells Karpenter what to run by sizing the pods and nodes first. If you want the strategy view, our breakdown of how to cut Kubernetes costs in 2026 ties these threads together. Or just connect a cluster and watch the three leaks light up.
