By Atmosly Team June 12, 2026 Cost Optimization

Right-Sizing Kubernetes Workloads: Data-Driven Approach

Comprehensive guide to right-sizing Kubernetes workloads using Prometheus telemetry: percentile analysis, safety buffers, validation methodology, and how Atmosly automates 30-day utilization analysis for instant recommendations.

Introduction: The Right-Sizing Challenge

Right-sizing Kubernetes workloads (configuring CPU and memory resource requests and limits that align with actual application consumption) represents the fastest path to structural cost savings in container environments. Yet most organizations operate with massive over-provisioning: average request-to-usage ratios of 3-5x for CPU and 2-4x for memory. In other words, for every CPU core applications actually consume, teams reserve, and ultimately pay for, three to five cores, and for every gigabyte of memory, two to four gigabytes. The difference sits idle on nodes the cluster had to provision anyway.

The over-provisioning epidemic stems from multiple causes: developers request resources conservatively "to be safe" without measuring actual needs, copy-paste inheritance from example manifests designed for different workload profiles, lack of production telemetry during initial deployment forcing engineers to guess, organizational risk aversion preferring waste over potential performance issues, and absence of feedback loops showing teams the financial impact of their resource requests. When compounded across hundreds of microservices and thousands of pods, cumulative waste reaches staggering proportions.

However, aggressive under-sizing poses equally serious risks: insufficient CPU causes throttling that degrades latency and throughput, inadequate memory triggers OOMKilled terminations that interrupt service, under-provisioned requests lead the scheduler to place pods on nodes that lack the capacity they actually need, and overly tight limits prevent workloads from absorbing legitimate traffic spikes. The key challenge is finding balance: resource configurations that keep necessary safety buffers while eliminating structural waste.

This comprehensive guide provides a systematic, data-driven methodology for safely rightsizing Kubernetes workloads at scale using Prometheus for telemetry collection and platforms like Atmosly that automate the analysis. You'll learn how to collect 30 days of accurate utilization data, analyze usage at appropriate percentiles (P95/P99) rather than misleading averages, calculate recommended requests and limits with workload-specific safety buffers, validate changes through testing before production rollout, implement gradually while monitoring impact, and establish continuous re-analysis as applications evolve.

Understanding Kubernetes Resource Management

Requests vs Limits: The Two-Tier Model

Kubernetes uses a dual-tier resource management system that many engineers misunderstand:

Resource Requests (Scheduling Guarantee):

Define the minimum resources that Kubernetes guarantees the container
Scheduler uses requests for pod placement decisions—only schedules pods on nodes with sufficient unreserved capacity
Requests do NOT limit consumption; containers can use more when the node has spare capacity
Determine pod Quality of Service (QoS) class affecting eviction priority during resource pressure
Cloud bills are based on node capacity rather than requests, but over-requesting forces unnecessary node scale-out, increasing infrastructure spend

Resource Limits (Hard Ceiling):

Define the maximum resources a container may consume
CPU limits enforced through throttling—kernel pauses execution if container exceeds limit
Memory limits enforced through OOMKill: the kernel's OOM killer terminates the container process as soon as it exceeds its memory limit
Limits do NOT affect scheduling; the scheduler ignores limits when placing pods
Setting limits too low causes artificial performance degradation even when the node has spare capacity

Example Configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payment-api   # apps/v1 requires a selector matching the pod labels
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
      - name: api
        image: payment-api:v2.4
        resources:
          requests:
            cpu: 500m        # Scheduler guarantees 0.5 CPU core
            memory: 1Gi      # Scheduler guarantees 1 GiB RAM
          limits:
            cpu: 1000m       # Maximum 1 CPU core (throttled if exceeded)
            memory: 2Gi      # Maximum 2 GiB RAM (OOMKilled if exceeded)
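Because the arithmetic later in this guide mixes notations, it helps to pin down what the quantities above mean: `500m` is 500 millicores (0.5 core) and `1Gi` is 2^30 bytes, whereas `1G` would be 10^9 bytes. The Python sketch below converts the common forms; `parse_cpu` and `parse_memory` are hypothetical helpers, and they deliberately skip rarer notations such as exponents (`1e3`) or fractional memory values.

```python
from fractions import Fraction

# Power-of-two (Ki, Mi, ...) and power-of-ten (k, M, ...) memory suffixes
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

def parse_cpu(quantity: str) -> Fraction:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)  # millicores
    return Fraction(quantity)                       # whole or decimal cores, e.g. '1.5'

def parse_memory(quantity: str) -> int:
    """Convert an integer memory quantity such as '1Gi' or '512M' to bytes."""
    for suffix, multiplier in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # no suffix: plain bytes
```

With the manifest above, `parse_cpu("500m")` is exactly one half of a core and `parse_memory("2Gi")` is 2,147,483,648 bytes.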

Quality of Service (QoS) Classes

Resource configuration determines pod QoS class affecting eviction priority during node pressure:

QoS Class	Configuration	Eviction Priority	Use Case
Guaranteed	Requests = Limits for CPU and memory in every container	Lowest (last to evict)	Critical production services requiring predictable performance
Burstable	Requests < Limits (or only requests set)	Medium	Most production workloads—can burst during peaks but don't need guarantees
BestEffort	No requests or limits defined	Highest (first to evict)	Non-critical batch jobs tolerating interruption
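The QoS rules can be expressed as a short classification function. This is a minimal sketch, assuming each container's `resources` block is passed as a plain dict (a hypothetical shape); it ignores init containers and compares quantity strings literally, so `1000m` and `1` would need normalizing first. It applies the defaulting rule Kubernetes uses when only limits are set: an omitted request takes the value of its limit.

```python
from typing import Dict, List

# One container's resources block, e.g. {"requests": {"cpu": "500m"}, "limits": {"cpu": "1000m"}}
Resources = Dict[str, Dict[str, str]]

def qos_class(containers: List[Resources]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort (init containers ignored)."""
    any_set = False      # does any container set a CPU/memory request or limit?
    guaranteed = True    # does every container have requests == limits for both?
    for res in containers:
        limits = res.get("limits", {})
        # An omitted request defaults to the limit, so merge limits in first
        requests = {**limits, **res.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource in requests or resource in limits:
                any_set = True
            if resource not in limits or requests[resource] != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

With the payment-api resources shown earlier (500m/1Gi requests, 1000m/2Gi limits), the function returns `Burstable`; with limits alone, it returns `Guaranteed`.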

Common Misconfiguration Patterns

Anti-Pattern 1: Massive Over-Requesting

# BAD: Developer "playing it safe" without data
resources:
  requests:
    cpu: 4000m      # 4 full CPU cores requested
    memory: 8Gi     # 8 GB RAM requested
  limits:
    cpu: 8000m
    memory: 16Gi

# Reality: Actual P99 usage is 450m CPU, 950Mi memory
# Result: 8.9x CPU over-provisioned, 8.6x memory over-provisioned
# Cost: $890/month per pod vs $98/month if properly sized
# Waste: $792/month per pod × 10 replicas = $7,920/month wasted
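The ratios and dollar figures in these comments follow from simple division and subtraction. A sketch that reproduces them (hypothetical function names; the per-pod costs are the example's figures, not a pricing model):

```python
def over_provision_ratio(request: float, p99_usage: float) -> float:
    """How many times larger the request is than observed P99 usage."""
    return request / p99_usage

def monthly_waste(cost_per_pod: float, right_sized_cost_per_pod: float, replicas: int) -> float:
    """Monthly spend attributable to over-provisioning across all replicas."""
    return (cost_per_pod - right_sized_cost_per_pod) * replicas

# Figures from the anti-pattern above
cpu_ratio = over_provision_ratio(4000, 450)          # millicores
memory_ratio = over_provision_ratio(8 * 1024, 950)   # MiB (8Gi = 8192Mi)
waste = monthly_waste(890, 98, replicas=10)
print(f"CPU {cpu_ratio:.1f}x, memory {memory_ratio:.1f}x, waste ${waste:,.0f}/month")
# prints "CPU 8.9x, memory 8.6x, waste $7,920/month"
```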

Anti-Pattern 2: No Resource Requests (BestEffort)

# BAD: No resource management
resources: {}  # No requests or limits

# Impact: Pod gets BestEffort QoS (evicted first during pressure)
# Risk: Unpredictable performance, scheduler places on any node regardless of capacity
# Problem: No cost attribution (can't track which team consumed resources)

Anti-Pattern 3: Limits Without Requests

# SUBOPTIMAL: Only limits defined
resources:
  limits:
    cpu: 2000m
    memory: 4Gi
  # No requests specified

# Behavior: Kubernetes automatically sets requests = limits
# Result: Forces Guaranteed QoS when Burstable would be more appropriate
# Impact: Wastes resources that could be shared across workloads

Phase 1: Collect 30 Days of Prometheus Telemetry

Prerequisites: Prometheus and Metrics Exporters

Right-sizing requires historical utilization data from Prometheus with these components:

kube-state-metrics: Exposes resource requests, limits, and pod metadata
node-exporter: Provides node-level CPU, memory, disk metrics
cAdvisor: Embedded in kubelet, exposes container-level usage metrics

If Prometheus is not already deployed, install it with at least 30-day retention:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --create-namespace \
  --set server.retention=30d \
  --set server.persistentVolume.size=100Gi

Essential Metrics for Rightsizing

CPU Metrics (per container):

# Actual CPU usage: cumulative CPU seconds; rate() over it gives cores in use
container_cpu_usage_seconds_total

# CPU requests (what was requested)
kube_pod_container_resource_requests{resource="cpu"}

# CPU limits
kube_pod_container_resource_limits{resource="cpu"}

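# CPU throttling (indicates limits too low): throttled periods / total CFS periods
container_cpu_cfs_throttled_periods_total
container_cpu_cfs_periods_total
container_cpu_cfs_throttled_seconds_total   # total time spent throttled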

Memory Metrics (per container):

# Actual memory usage (working set: excludes inactive page cache; kubelet uses it for eviction)
container_memory_working_set_bytes

# Memory requests
kube_pod_container_resource_requests{resource="memory"}

# Memory limits
kube_pod_container_resource_limits{resource="memory"}

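# OOMKill events (indicate limits too low); last_terminated_reason persists after the container restarts
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
kube_pod_container_status_terminated_reason{reason="OOMKilled"}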
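`container_cpu_usage_seconds_total` is a cumulative counter of CPU seconds, so usage in cores is its increase divided by elapsed wall-clock time, which is what `rate()` computes. The sketch below is a simplified illustration, assuming raw (timestamp, value) samples: like Prometheus it treats any decrease as a counter reset (for example, a container restart), but unlike `rate()` it does not extrapolate to the edges of the range window.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value in CPU seconds)

def cores_used(samples: List[Sample]) -> float:
    """Average CPU cores used across the span covered by the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so curr itself is the increase
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical samples 30s apart: 30 CPU seconds consumed over 60 seconds
print(cores_used([(0, 100.0), (30, 115.0), (60, 130.0)]))
```

A result of 0.5 corresponds to 500m in a resource request.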

Data Collection Best Practices

1. Minimum 14-Day Lookback (30 Days Recommended)

Collect at least two full weeks of data, and ideally 30 days, to capture:

Weekday vs weekend traffic patterns
Peak load periods (morning login rush, nightly batch processing)
Weekly cycles (Monday traffic spikes, Friday drop-offs)
Monthly patterns (end-of-month reporting, payroll processing), which require the full 30-day window

2. Scrape Interval: 15-60 Seconds

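# Prometheus configuration: a global interval applies to every job,
# including the kubelet/cAdvisor job that supplies the container_* metrics
global:
  scrape_interval: 30s      # Balance granularity vs storage
  scrape_timeout: 10s
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod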
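The interval choice trades resolution against storage. The Prometheus storage documentation sizes disk as retention seconds × ingested samples per second × bytes per sample, with samples averaging about 1-2 bytes after compression. The sketch applies that formula; the 200,000 active series is a hypothetical cluster size, and the estimate ignores write-ahead-log and compaction overhead.

```python
def prometheus_disk_bytes(retention_days: float, active_series: int,
                          scrape_interval_s: float, bytes_per_sample: float = 2.0) -> float:
    """retention_seconds * samples_per_second * bytes_per_sample (Prometheus sizing rule)."""
    retention_seconds = retention_days * 86_400
    samples_per_second = active_series / scrape_interval_s  # one sample per series per scrape
    return retention_seconds * samples_per_second * bytes_per_sample

# Hypothetical cluster: 200,000 active series, 30-day retention, 30s scrapes
estimate = prometheus_disk_bytes(30, 200_000, 30)
print(f"~{estimate / 2**30:.1f} GiB before WAL and compaction overhead")
```

Halving the interval to 15s doubles the estimate, which is why 30s is a common middle ground.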

3. Segment by Workload Criticality

# Label Deployments by criticality tier for differentiated safety buffers
# (this sets labels on the Deployment object; pod labels come from the pod template)
kubectl label deployment payment-api tier=tier1-critical
kubectl label deployment internal-dashboard tier=tier2-standard
kubectl label deployment analytics-batch tier=tier3-batch

Phase 2: Analyze at Appropriate Percentiles

Why Percentiles Matter More Than Averages

Using average utilization for rightsizing is a critical mistake causing under-provisioning and performance degradation:

Metric	Value	If Used for Sizing
Average CPU	320m	Under-provisioned more than 50% of time (below the median here)
P50 (median)	380m	Under-provisioned 50% of time
P95	680m	Under-provisioned 5% of time (acceptable)
P99	1150m	Under-provisioned 1% of time (conservative)
Max	2400m	Over-provisioned (may be anomaly/restart spike)

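If you set CPU requests to 320m (the average), the application will be under-provisioned for more than half of its operating time, because in this workload the average sits below the median. The result is CPU contention on busy nodes (and throttling wherever limits sit close to requests) plus degraded latency. If you set them to 2400m (the max), you are over-provisioning for potentially anomalous spikes.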
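A quick way to see the effect is to size against each statistic and count how often usage exceeds it. The sketch uses a nearest-rank percentile over hypothetical samples; Prometheus's `quantile_over_time` interpolates between samples instead, so its results can differ slightly on small windows.

```python
import math
from statistics import mean
from typing import List

def percentile(samples: List[float], q: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

def share_above(samples: List[float], request: float) -> float:
    """Fraction of samples above a request, i.e. time spent under-provisioned."""
    return sum(s > request for s in samples) / len(samples)

# Hypothetical 5-minute CPU samples in millicores (illustrative, not real telemetry)
usage = [200, 250, 300, 320, 350, 380, 400, 420, 450, 500,
         520, 550, 600, 650, 680, 700, 720, 900, 1150, 2400]

for name, value in [("average", mean(usage)), ("P50", percentile(usage, 50)),
                    ("P95", percentile(usage, 95)), ("max", max(usage))]:
    print(f"{name:>7}: {value:>5.0f}m -> under-provisioned {share_above(usage, value):.0%} of the time")
```

In this right-skewed sample the mean (622m) lands above the median (500m), yet sizing to it still leaves usage above the request in 35% of samples; P95 cuts that to 5% without chasing the single 2400m spike.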

Recommended Percentile Targets by Workload Type

Workload Type	Target Percentile	Safety Buffer	Rationale
Tier-1 Critical (payments, auth)	P99	25-30%	Minimize any performance risk
User-Facing Services (APIs, web)	P95	20-25%	Balance cost and acceptable occasional throttling
Internal Services (dashboards)	P90	15-20%	Cost-optimized, tolerate occasional degradation
Batch/Background Jobs	P85	10-15%	Aggressive optimization, throughput over latency
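Encoding the table as data keeps the policy in one reviewable place. A minimal sketch, assuming percentile values have already been computed (for example with the PromQL queries in the next section); the workload-type keys and function name are hypothetical labels for the table rows.

```python
from typing import Dict, Tuple

# (target percentile, (min buffer %, max buffer %)), transcribed from the table above
TIER_POLICY: Dict[str, Tuple[int, Tuple[int, int]]] = {
    "critical": (99, (25, 30)),
    "user-facing": (95, (20, 25)),
    "internal": (90, (15, 20)),
    "batch": (85, (10, 15)),
}

def request_range(percentiles: Dict[int, float], workload_type: str) -> Tuple[float, float]:
    """Return the (low, high) request band for a workload type.

    `percentiles` maps percentile -> observed usage, e.g. {95: 680.0} in millicores.
    """
    target, (low_pct, high_pct) = TIER_POLICY[workload_type]
    usage = percentiles[target]
    return usage * (1 + low_pct / 100), usage * (1 + high_pct / 100)

low, high = request_range({95: 680.0, 99: 1150.0}, "user-facing")
print(f"user-facing request band: {low:.0f}m to {high:.0f}m")
```

For a user-facing service with a P95 of 680m, the band is 816m to 850m.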

Prometheus Queries for Percentile Analysis

# Calculate P95 CPU usage over 30 days for payment-api
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{
    pod=~"payment-api-.*",
    container="api"
  }[5m])[30d:5m]
)

# Calculate P99 memory usage over 30 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{
    pod=~"payment-api-.*",
    container="api"
  }[30d:1m]
)

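# Identify top 20 over-provisioned containers (CPU request / average usage over 7 days)
topk(20,
  sum by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"})
  /
  sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[7d]))
)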
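These queries can also be run programmatically through the Prometheus HTTP API (`GET /api/v1/query`). A stdlib-only sketch, assuming Prometheus is reachable at a URL you supply (for example via `kubectl port-forward`); `run_instant_query` and `values_by_pod` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

def run_instant_query(base_url: str, promql: str, timeout: float = 120.0) -> dict:
    """GET <base_url>/api/v1/query and return the decoded JSON body.

    30-day subqueries are expensive, hence the generous default timeout.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    query_string = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query_string}", timeout=timeout) as resp:
        return json.load(resp)

def values_by_pod(body: dict) -> Dict[str, float]:
    """Flatten an instant-vector result into {pod name: value}."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    # Each result looks like {"metric": {<labels>}, "value": [<unix ts>, "<number as string>"]}
    return {
        item["metric"].get("pod", "<no pod label>"): float(item["value"][1])
        for item in body["data"]["result"]
    }

# Example (needs a reachable Prometheus; the URL is an assumption):
# p95 = values_by_pod(run_instant_query("http://localhost:9090", P95_CPU_QUERY))
```

CPU results come back in cores, so a value of 0.68 corresponds to 680m.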

Phase 3: Automated Analysis with Atmosly

Manual Analysis Pain Points

Rightsizing 100+ workloads manually requires:

4-8 hours writing and validating PromQL queries
Exporting data to spreadsheets for percentile calculations
Manually determining appropriate safety buffers per workload type
Calculating cost impact using AWS/GCP pricing data
Generating Kubernetes YAML patches for each service
Total effort: 40-60 hours for 100 workloads (monthly re-analysis needed)

How Atmosly Automates Rightsizing

Instead of manual Prometheus analysis, Atmosly connects to existing Prometheus and automates the entire workflow:

Step 1: Cluster Import (15 minutes)

# Import cluster via Atmosly UI
1. Navigate to: Clusters → Import Cluster
2. Select: AWS EKS / GCP GKE / Azure AKS / Other Kubernetes
3. Provide: Cluster name, region, credentials
4. Atmosly validates: Kubernetes API access, Prometheus connectivity
5. Auto-discovery: Finds Prometheus service endpoint automatically

Step 2: Automatic 30-Day Data Ingestion (2 hours)

Atmosly connects to Prometheus service (prometheus-server.monitoring)
Pulls 30 days of container CPU and memory metrics
Collects resource requests, limits, and pod metadata from kube-state-metrics
Identifies workload types (Deployment, StatefulSet, Job, CronJob, DaemonSet)
Classifies by criticality based on namespace labels and naming patterns

Step 3: ML-Powered Recommendation Generation

Atmosly's algorithm automatically:

Calculates P50, P90, P95, P99, and max usage for each container
Applies workload-specific percentile targets (P95 for standard, P99 for critical)
Adds appropriate safety buffers (20-30% headroom above target percentile)
Enforces minimum thresholds (100m CPU, 128Mi memory) preventing under-provisioning
Calculates burst capacity limits (typically 2x requests)
Computes current monthly cost vs recommended monthly cost
Ranks recommendations by dollar savings potential

Step 4: Dashboard with Actionable Recommendations

Workload	Container	Current Cost	Optimized Cost	Savings	Over-Prov %
payment-api	api	$24.50/mo	$6.85/mo	$17.65	72%
analytics-job	processor	$18.20/mo	$4.20/mo	$14.00	77%
prometheus-grafana	grafana	$10.96/mo	$0.66/mo	$10.30	94%
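The columns are related by simple arithmetic: savings is current minus optimized cost, and the last column equals savings as a share of current cost. The sketch below reproduces the table and the "rank by dollar savings" step; it illustrates the ranking only and is not Atmosly's implementation. Costs are held in integer cents to avoid floating-point rounding, and the class name is hypothetical.

```python
from typing import List, NamedTuple

class Recommendation(NamedTuple):
    workload: str
    container: str
    current_cents: int      # monthly cost per pod, in cents
    optimized_cents: int

    @property
    def savings_cents(self) -> int:
        return self.current_cents - self.optimized_cents

    @property
    def savings_pct(self) -> float:
        return 100 * self.savings_cents / self.current_cents

def rank_by_savings(recs: List[Recommendation]) -> List[Recommendation]:
    """Largest dollar savings first."""
    return sorted(recs, key=lambda r: r.savings_cents, reverse=True)

recs = [
    Recommendation("analytics-job", "processor", 1820, 420),
    Recommendation("prometheus-grafana", "grafana", 1096, 66),
    Recommendation("payment-api", "api", 2450, 685),
]
for r in rank_by_savings(recs):
    print(f"{r.workload}: saves ${r.savings_cents / 100:.2f}/mo ({r.savings_pct:.0f}%)")
```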

Total Time: 2 hours (vs 40-60 hours manual)

Phase 4: Calculate Recommendations with Safety Buffers

Recommendation Formula

For CPU Requests:

Recommended_CPU_Request = P95_CPU_Usage × (1 + Safety_Buffer)

Example (standard tier-2 service):
  P95 CPU usage over 30 days: 680m
  Safety buffer: 20%
  Recommended CPU request: 680m × 1.20 = 816m (round to 850m)

For Memory Requests:

Recommended_Memory_Request = P95_Memory_Usage × (1 + Safety_Buffer)

Example:
  P95 Memory usage: 950Mi
  Safety buffer: 20%
  Recommended memory request: 950Mi × 1.20 = 1140Mi (round to 1200Mi)

For Limits (Burst Capacity):

Recommended_CPU_Limit = Recommended_CPU_Request × Burst_Multiplier

Typical burst multipliers:
  - Stateless APIs: 2.0x (allow 2x burst)
  - Databases: 1.5x (less burstable, consistent usage)
  - Batch jobs: 3.0x (highly variable, benefit from burst)

Example:
  Recommended request: 850m
  Burst multiplier: 2.0x
  Recommended limit: 1700m
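The three formulas translate directly into code. A minimal sketch, assuming usage is expressed in integer millicores and MiB; the rounding steps (50m for CPU, 100Mi for memory) are inferred from the 816m to 850m and 1140Mi to 1200Mi examples rather than stated rules, and the function names are hypothetical.

```python
import math

def round_up(value: int, step: int) -> int:
    """Round value up to the next multiple of step (integer arithmetic, no float error)."""
    return -(-value // step) * step

def recommend_request(target_usage: int, buffer_pct: int, step: int) -> int:
    """Target-percentile usage x (1 + buffer), rounded up to a multiple of step."""
    buffered = -(-target_usage * (100 + buffer_pct) // 100)  # ceiling division
    return round_up(buffered, step)

def recommend_limit(request: int, burst_multiplier: float) -> int:
    """Limit = request x burst multiplier, rounded up to a whole unit."""
    return math.ceil(request * burst_multiplier)

# Burst multipliers from the list above
BURST_MULTIPLIER = {"stateless-api": 2.0, "database": 1.5, "batch": 3.0}

cpu_request = recommend_request(680, 20, step=50)        # 850 (millicores)
memory_request = recommend_request(950, 20, step=100)    # 1200 (MiB)
cpu_limit = recommend_limit(cpu_request, BURST_MULTIPLIER["stateless-api"])  # 1700
```

Integer arithmetic keeps a product such as 680 × 1.20 from landing a hair above 816 in floating point and being rounded up to the next step.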

Guardrails Preventing Under-Provisioning

# Minimum thresholds (prevent aggressive under-sizing)
MINIMUM_CPU_REQUEST = 100m      # Prevent excessive throttling
MINIMUM_MEMORY_REQUEST = 128Mi  # Prevent OOMKills

# Maximum single reduction (staged rollout)
MAX_REDUCTION_PERCENTAGE = 50%  # Don't cut more than 50% at once

# Exclusion criteria
EXCLUDE_IF:
  - Label: cost-optimization=disabled
  - Label: rightsizing=manual
  - Insufficient data: < 7 days metrics
  - High variance: stddev > 2x mean
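The guardrails can be applied as a final clamp on each recommendation. A sketch under stated assumptions: the constant and function names are hypothetical, and "stddev > 2x mean" is read as the population standard deviation of the usage samples exceeding twice their mean.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

MIN_CPU_MILLICORES = 100
MIN_MEMORY_MIB = 128
MAX_REDUCTION = 0.50           # cut at most 50% of the current request per step
MIN_DAYS_OF_DATA = 7
OPT_OUT_LABELS = {("cost-optimization", "disabled"), ("rightsizing", "manual")}

def is_excluded(labels: Dict[str, str], days_of_data: float, samples: List[float]) -> bool:
    """Apply the exclusion criteria listed above."""
    if any(labels.get(key) == value for key, value in OPT_OUT_LABELS):
        return True
    if days_of_data < MIN_DAYS_OF_DATA:
        return True
    # "High variance" read here as population standard deviation > 2x the mean
    return pstdev(samples) > 2 * mean(samples)

def apply_guardrails(current: int, recommended: int, minimum: int) -> int:
    """Never go below the floor or cut more than MAX_REDUCTION in one step."""
    reduction_floor = math.ceil(current * (1 - MAX_REDUCTION))
    return max(recommended, minimum, reduction_floor)
```

Under the 50% rule, a 2000m request with an 850m recommendation moves to 1000m in the first step, and 4Gi (4096Mi) moves to 2048Mi rather than 1200Mi; the rest of the reduction follows in a later pass once the first step proves stable.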

Real Example: payment-api Container

# BEFORE (Over-Provisioned)
resources:
  requests:
    cpu: 2000m        # 2 full CPU cores
    memory: 4Gi       # 4 GB RAM
  limits:
    cpu: 4000m
    memory: 8Gi

# ATMOSLY 30-DAY ANALYSIS
P50 CPU: 320m | P95 CPU: 680m | P99 CPU: 1150m | Max: 1680m
P50 Mem: 720Mi | P95 Mem: 950Mi | P99 Mem: 1280Mi | Max: 1520Mi

Workload tier: tier-2 (standard)
Target percentile: P95
Safety buffer: 20%

# RECOMMENDED (Data-Driven)
resources:
  requests:
    cpu: 850m         # 680m P95 + 20% = 816m, rounded up to 850m (58% reduction)
    memory: 1200Mi    # 950Mi P95 + 20% = 1140Mi, rounded up to 1200Mi (71% reduction)
  limits:
    cpu: 1700m        # 2x requests for burst
    memory: 2400Mi    # 2x requests for burst

# COST IMPACT
Current cost: $24.50/month per pod
Recommended cost: $6.85/month per pod
Monthly savings: $17.65 per pod
Cluster savings (10 replicas): $176.50/month
Annual savings: $2,118/year for this one service

Phase 5: Validate Before Production

Staging Environment Testing

Apply recommendations to staging first
Run load tests mimicking production traffic
Monitor SLIs closely: P95/P99 latency, error rate, CPU throttling, OOMKills
Validate for 48-72 hours before production

# Load test with k6
k6 run --vus 500 --duration 30m load-test.js

# Monitor during load test
kubectl top pods -n staging | grep payment-api

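# Check for throttling (should be < 5%); assumes the Helm release from Phase 1
# (Deployment prometheus-server in namespace monitoring) and staging pods in "staging"
kubectl exec -n monitoring deploy/prometheus-server -c prometheus-server -- \
  promtool query instant http://localhost:9090 \
  'sum(rate(container_cpu_cfs_throttled_periods_total{namespace="staging", pod=~"payment-api-.*"}[5m])) /
   sum(rate(container_cpu_cfs_periods_total{namespace="staging", pod=~"payment-api-.*"}[5m]))'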
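The query divides throttled CFS periods by total CFS periods over the window. With the default 100ms CFS period, a container that stays runnable for a full 5-minute window accrues at most 3000 periods. The sketch below evaluates the same ratio from hypothetical counter deltas against the 5% target; the names are illustrative.

```python
THROTTLE_THRESHOLD = 0.05    # "should be < 5%" from the check above

def throttled_ratio(throttled_periods: float, total_periods: float) -> float:
    """Share of CFS periods in which the container exhausted its CPU quota."""
    if total_periods == 0:
        return 0.0  # no enforcement periods in the window
    return throttled_periods / total_periods

# Hypothetical 5-minute counter deltas for one payment-api pod
ratio = throttled_ratio(throttled_periods=120, total_periods=3000)
print(f"throttled in {ratio:.1%} of periods; within target: {ratio < THROTTLE_THRESHOLD}")
```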

Progressive Production Rollout

Wave 1: 10% of Replicas (Week 1)

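# Patching payment-api itself rolls the change out to all 10 replicas, so run the
# new sizing in a 1-replica canary Deployment that the same Service routes to
# (hypothetical name: payment-api-canary) and scale the original to 9
kubectl scale deployment payment-api --replicas=9
kubectl patch deployment payment-api-canary --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "850m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "1200Mi"}
]'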

# Monitor for 48 hours
# Compare metrics between old (2000m/4Gi) and new (850m/1200Mi) pods

Wave 2: 50% of Replicas (Week 2, if Wave 1 successful)

Wave 3: 100% Rollout (Week 3, if Wave 2 successful)

Phase 6: Monitor and Iterate

Post-Deployment Monitoring

Critical SLIs to Track:

Metric	Target	Alert Threshold	Action if Exceeded
P95 Latency	No increase	>10% increase	Rollback immediately
Error Rate	< 0.1%	>0.15%	Investigate, consider rollback
CPU Throttling	< 5%	>10%	Raise CPU limit 20% (limits drive throttling)
OOMKill Events	Zero	Any OOMKill	Raise memory limit and request 30%
Pod Restart Rate	< 1/day	>3/day	Rollback and investigate
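The alert thresholds in the table can be checked mechanically after each wave. A minimal sketch, assuming the SLI values are gathered elsewhere (for example from Prometheus); the `SliSnapshot` fields and action strings are hypothetical and paraphrase the table's actions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SliSnapshot:
    p95_latency_increase_pct: float  # change vs the pre-rightsizing baseline
    error_rate_pct: float
    cpu_throttling_pct: float
    oomkill_events: int
    restarts_per_day: float

def follow_up_actions(s: SliSnapshot) -> List[str]:
    """Map each SLI that crosses its alert threshold to the table's action."""
    actions = []
    if s.p95_latency_increase_pct > 10:
        actions.append("rollback: P95 latency up >10%")
    if s.error_rate_pct > 0.15:
        actions.append("investigate errors, consider rollback")
    if s.cpu_throttling_pct > 10:
        actions.append("add 20% CPU headroom")
    if s.oomkill_events > 0:
        actions.append("add 30% memory headroom")
    if s.restarts_per_day > 3:
        actions.append("rollback and investigate restarts")
    return actions

print(follow_up_actions(SliSnapshot(2.0, 0.05, 3.0, 0, 0.5)))  # healthy wave: no actions
```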

Continuous Re-Analysis Cadence

Weekly: Review top 20 workloads for new optimization opportunities
Monthly: Full cluster re-analysis with updated 30-day utilization (Atmosly automates)
After Major Releases: Re-baseline if application behavior changes significantly
Seasonal Adjustments: Update safety buffers before known traffic spikes (Black Friday, tax season)

Expected Results and ROI

Typical Savings by Resource Type

Resource	Typical Over-Prov	Achievable Reduction	Example Savings (100 pods)
CPU	3-5x	40-60%	$12,000-$18,000/month
Memory	2-4x	30-50%	$6,000-$12,000/month
Total	—	35-55%	$18K-$30K/month

Time Investment vs Savings

Manual Approach (Open Source Tools):

Initial setup: 40-60 hours (Prometheus, VPA, custom dashboards)
Monthly analysis: 20-30 hours (PromQL queries, spreadsheet analysis, YAML generation)
Timeline to savings: 6-8 weeks
Total: 100+ hours for first month

Atmosly Approach:

Initial setup: 15 minutes (cluster import)
Monthly analysis: 2 hours (review recommendations, approve changes)
Timeline to savings: 1 week
Total: 2 hours for first month (98% less effort)

Common Pitfalls and Solutions

Pitfall 1: Using Average Instead of Percentiles

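Problem: Setting requests based on the average leaves the workload under-provisioned for a large share of operating time (roughly half when usage is symmetric, more when the average sits below the median).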

Solution: Always use P95 (standard services) or P99 (critical services) plus 20-30% safety buffer.

Pitfall 2: Ignoring Seasonal Patterns

Problem: Rightsizing based on current 30 days fails when Black Friday or tax season hits.

Solution: Include known peak periods in the analysis window, add growth buffers for rapidly scaling services, and re-baseline after major launches.

Pitfall 3: Optimizing Everything Simultaneously

Problem: Applying changes to 100+ services at once makes root-cause analysis nearly impossible if issues occur.

Solution: Phased rollout—staging first (week 1), non-critical prod (week 2-3), critical services conservatively (week 4-5).

Pitfall 4: Setting CPU Limits Too Low

Problem: Aggressive CPU limits cause throttling even when node has spare capacity.

Solution: Set CPU limits at 2-3x requests to allow bursting. Consider removing CPU limits entirely for non-critical workloads (limits don't affect scheduling).

Conclusion: Rightsizing as Continuous Discipline

Right-sizing Kubernetes workloads delivers 35-55% cost reduction when done systematically with data-driven analysis rather than guesswork. The key success factors: collecting 30+ days of Prometheus metrics capturing weekly and monthly patterns, analyzing at appropriate percentiles (P95/P99) with workload-specific safety buffers, validating thoroughly in staging before production, rolling out progressively monitoring SLIs closely, and re-analyzing continuously as applications evolve.

Organizations leveraging Atmosly to automate Prometheus analysis achieve these results in 1-2 weeks with 98% less engineering effort, compared with 6-8 weeks for a manual approach. Atmosly connects to existing monitoring infrastructure, requires no new agents or sidecars, and delivers recommendations within 2 hours of cluster import.

Related Guides

Kubernetes FinOps: Complete Guide for Engineering Teams — the FinOps framework rightsizing fits into.
How to Fix Kubernetes OOMKilled Errors — avoid the failure mode of cutting memory too aggressively.
Atmosly Cost Intelligence — automated P95-based rightsizing recommendations with safety buffers.

Ready to rightsize your Kubernetes workloads? Import your cluster into Atmosly and get rightsizing recommendations based on your actual Prometheus data in 2 hours. Free trial, no credit card required.

Questions about rightsizing methodology? Schedule a consultation with our solutions team to review your specific workload patterns and optimization potential.

Frequently Asked Questions

How frequently should Kubernetes workloads be re-analyzed and rightsized as applications evolve?

Kubernetes workloads should be re-analyzed monthly using the latest 30 days of Prometheus data to keep resource recommendations aligned with current usage. For rapidly growing applications or after major releases, bi-weekly or immediate re-analysis is recommended to maintain performance and cost efficiency.

What if workloads experience unpredictable traffic spikes not captured in 30-day historical analysis?

For unpredictable traffic spikes, combine rightsizing with Horizontal Pod Autoscaling (HPA) and maintain additional resource buffers. Setting requests based on P95 usage and allowing HPA to scale replicas ensures workloads can handle sudden demand without performance degradation.

Can rightsizing recommendations be automatically applied or do they require manual implementation and validation?

Rightsizing recommendations should be reviewed and validated by engineers before production deployment. A staged rollout with performance monitoring helps prevent issues such as CPU throttling, OOMKills, or increased latency while ensuring safe resource optimization.

What Prometheus metrics and retention period are required for accurate rightsizing recommendations?

Accurate recommendations require key CPU, memory, resource request/limit, throttling, and OOMKill metrics collected by Prometheus. A minimum of 14 days of data is needed, while 30 days of retention is strongly recommended to capture workload patterns and seasonal usage trends.