Right-Sizing Kubernetes Workloads: Data-Driven Approach

Comprehensive guide to right-sizing Kubernetes workloads using Prometheus telemetry: percentile analysis, safety buffers, validation methodology, and how Atmosly automates 30-day utilization analysis for instant recommendations.

Introduction: The Right-Sizing Challenge

Right-sizing Kubernetes workloads means configuring CPU and memory requests and limits that match what applications actually consume, and it is the fastest path to structural cost savings in container environments. Yet most organizations over-provision heavily: average request-to-usage ratios run 3-5x for CPU and 2-4x for memory. For every CPU core or gigabyte of memory that applications actually consume, teams have requested, and pay for, two to five times that amount.

The over-provisioning epidemic has several causes: developers request resources conservatively "to be safe" without measuring actual needs; teams copy example manifests written for different workload profiles; engineers lack production telemetry at initial deployment and have to guess; organizations prefer waste to the risk of performance issues; and no feedback loop shows teams the financial impact of their resource requests. Compounded across hundreds of microservices and thousands of pods, the cumulative waste is substantial.

However, aggressive under-sizing poses equally serious risks: insufficient CPU causes throttling degrading latency and throughput, inadequate memory triggers OOMKilled terminations interrupting service, under-provisioned requests lead to poor scheduler decisions placing pods on nodes lacking actual capacity, and overly tight limits prevent handling legitimate traffic spikes. The key challenge is finding balance: resource configurations providing necessary safety buffers while eliminating structural waste.

This comprehensive guide provides a systematic, data-driven methodology for safely rightsizing Kubernetes workloads at scale using Prometheus for telemetry collection and platforms like Atmosly that automate the analysis. You'll learn how to collect 30 days of accurate utilization data, analyze usage at appropriate percentiles (P95/P99) rather than misleading averages, calculate recommended requests and limits with workload-specific safety buffers, validate changes through testing before production rollout, implement gradually while monitoring impact, and establish continuous re-analysis as applications evolve.

Understanding Kubernetes Resource Management

Requests vs Limits: The Two-Tier Model

Kubernetes uses a dual-tier resource management system that many engineers misunderstand:

Resource Requests (Scheduling Guarantee):

  • Define the minimum resources that Kubernetes guarantees the container
  • Scheduler uses requests for pod placement decisions—only schedules pods on nodes with sufficient unreserved capacity
  • Requests do NOT limit consumption: containers can use more when the node has spare capacity
  • Determine pod Quality of Service (QoS) class affecting eviction priority during resource pressure
  • Cloud billing is based on node capacity, but over-requesting drives unnecessary node scale-out, which increases infrastructure spend

Resource Limits (Hard Ceiling):

  • Define the maximum resources a container may consume
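  • CPU limits enforced through throttling: once a container has used its CFS quota, the kernel pauses it for the rest of the scheduling period
  • Memory limits enforced through OOMKill: the kernel OOM killer terminates the container's process when it exceeds its memory limit, and the kubelet then restarts it according to the pod's restart policy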
  • Limits do NOT affect scheduling—scheduler ignores limits when placing pods
  • Setting limits too low causes artificial performance degradation even when node has spare capacity

Example Configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  replicas: 10
  template:
    spec:
      containers:
      - name: api
        image: payment-api:v2.4
        resources:
          requests:
            cpu: 500m        # Scheduler guarantees 0.5 CPU cores
            memory: 1Gi      # Scheduler guarantees 1 GB RAM
          limits:
            cpu: 1000m       # Maximum 1 CPU core (throttled if exceeded)
            memory: 2Gi      # Maximum 2 GB RAM (OOMKilled if exceeded)

Quality of Service (QoS) Classes

Resource configuration determines pod QoS class affecting eviction priority during node pressure:

| QoS Class | Configuration | Eviction Priority | Use Case |
|---|---|---|---|
| Guaranteed | Requests = Limits for all resources | Lowest (last to evict) | Critical production services requiring predictable performance |
| Burstable | Requests < Limits (or only requests set) | Medium | Most production workloads: can burst during peaks but don't need guarantees |
| BestEffort | No requests or limits defined | Highest (first to evict) | Non-critical batch jobs tolerating interruption |
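
The classification rules in the table can be sketched in a few lines of Python. This is an illustration only, not the kubelet's implementation: `qos_class` is a hypothetical helper, and it compares quantities as plain strings, so it assumes requests and limits are written in the same notation (`1000m` and `1` would be treated as different).

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod from the resources blocks of its containers."""
    has_any = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        if requests or limits:
            has_any = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # Kubernetes defaults a missing request to the limit.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


payment_api = {
    "requests": {"cpu": "500m", "memory": "1Gi"},
    "limits": {"cpu": "1000m", "memory": "2Gi"},
}
print(qos_class([payment_api]))  # Burstable
```

Applied to the payment-api example above, the pod is Burstable because its requests sit below its limits.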

Common Misconfiguration Patterns

Anti-Pattern 1: Massive Over-Requesting

# BAD: Developer "playing it safe" without data
resources:
  requests:
    cpu: 4000m      # 4 full CPU cores requested
    memory: 8Gi     # 8 GB RAM requested
  limits:
    cpu: 8000m
    memory: 16Gi

# Reality: Actual P99 usage is 450m CPU, 950Mi memory
# Result: 8.9x CPU over-provisioned, 8.6x memory over-provisioned
# Cost: $890/month per pod vs $98/month if properly sized
# Waste: $792/month per pod × 10 replicas = $7,920/month wasted
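
The over-provisioning ratios quoted above are simply request divided by observed usage. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the parsers only handle the millicore suffix and the binary memory suffixes (Ki, Mi, Gi, Ti) used in this guide.

```python
def parse_cpu(quantity: str) -> float:
    """Return CPU cores from a quantity such as '450m' or '4'."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000.0
    return float(quantity)


_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> float:
    """Return bytes from a quantity such as '950Mi' or '8Gi'."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: plain bytes


def overprovision_ratio(requested: float, used: float) -> float:
    """How many times larger the request is than observed usage."""
    if used <= 0:
        raise ValueError("usage must be positive")
    return requested / used


cpu_ratio = overprovision_ratio(parse_cpu("4000m"), parse_cpu("450m"))
mem_ratio = overprovision_ratio(parse_memory("8Gi"), parse_memory("950Mi"))
print(f"CPU {cpu_ratio:.1f}x, memory {mem_ratio:.1f}x")  # CPU 8.9x, memory 8.6x
```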

Anti-Pattern 2: No Resource Requests (BestEffort)

# BAD: No resource management
resources: {}  # No requests or limits

# Impact: Pod gets BestEffort QoS (evicted first during pressure)
# Risk: Unpredictable performance, scheduler places on any node regardless of capacity
# Problem: No cost attribution (can't track which team consumed resources)

Anti-Pattern 3: Limits Without Requests

# SUBOPTIMAL: Only limits defined
resources:
  limits:
    cpu: 2000m
    memory: 4Gi
  # No requests specified

# Behavior: Kubernetes automatically sets requests = limits
# Result: Forces Guaranteed QoS when Burstable would be more appropriate
# Impact: Wastes resources that could be shared across workloads

Phase 1: Collect 30 Days of Prometheus Telemetry

Prerequisites: Prometheus and Metrics Exporters

Right-sizing requires historical utilization data from Prometheus with these components:

  • kube-state-metrics: Exposes resource requests, limits, and pod metadata
  • node-exporter: Provides node-level CPU, memory, disk metrics
  • cAdvisor: Embedded in kubelet, exposes container-level usage metrics

If Prometheus is not deployed yet, install it with at least 30 days of retention:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --create-namespace \
  --set server.retention=30d \
  --set server.persistentVolume.size=100Gi

Essential Metrics for Rightsizing

CPU Metrics (per container):

# Actual CPU usage (cumulative CPU seconds; rate() over it yields cores in use)
container_cpu_usage_seconds_total

# CPU requests (what was requested)
kube_pod_container_resource_requests{resource="cpu"}

# CPU limits
kube_pod_container_resource_limits{resource="cpu"}

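# CPU throttling (indicates limits too low)
container_cpu_cfs_throttled_periods_total   # periods in which the container was throttled
container_cpu_cfs_periods_total             # total elapsed CFS enforcement periods
container_cpu_cfs_throttled_seconds_total   # total time spent throttled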

Memory Metrics (per container):

# Actual memory usage (working set: total usage minus inactive file cache)
container_memory_working_set_bytes

# Memory requests
kube_pod_container_resource_requests{resource="memory"}

# Memory limits
kube_pod_container_resource_limits{resource="memory"}

# OOMKill events (indicates limits too low)
kube_pod_container_status_terminated_reason{reason="OOMKilled"}

Data Collection Best Practices

1. Minimum 14-Day Lookback (30 Days Recommended)

Collect at least two full business weeks, and preferably 30 days, to capture:

  • Weekday vs weekend traffic patterns
  • Peak load periods (morning login rush, nightly batch processing)
  • Weekly cycles (Monday traffic spikes, Friday drop-offs)
  • Monthly patterns (end-of-month reporting, payroll processing), which only the 30-day window covers

2. Scrape Interval: 15-60 Seconds

# Prometheus scrape configuration
scrape_configs:
  - job_name: 'kubernetes-pods'
    scrape_interval: 30s      # Balance granularity vs storage
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - role: pod

3. Segment by Workload Criticality

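# Label Deployments by criticality tier for differentiated safety buffers
# (this labels the Deployment object only; set the label under
# spec.template.metadata.labels if the pods must carry it too)
kubectl label deployment payment-api tier=tier1-critical
kubectl label deployment analytics-batch tier=tier3-batch
kubectl label deployment internal-dashboard tier=tier2-standard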

Phase 2: Analyze at Appropriate Percentiles

Why Percentiles Matter More Than Averages

Sizing from average utilization is a critical mistake: usage sits above the average for a large share of the time, so the workload runs short of its request whenever load rises.

| Metric | Value | If Used for Sizing |
|---|---|---|
| Average CPU | 320m | Usage exceeds the request more than half the time (contention, latency risk) |
| P50 (median) | 380m | Usage exceeds the request 50% of the time |
| P95 | 680m | Usage exceeds the request 5% of the time (acceptable) |
| P99 | 1150m | Usage exceeds the request 1% of the time (conservative) |
| Max | 2400m | Over-provisioned (may be an anomaly or restart spike) |

If you set the CPU request to 320m (the average, which here sits below the 380m median), usage exceeds the request more than half the time. The pod then depends on spare node capacity, and under contention it loses CPU time and latency degrades. If you set it to 2400m (the max), you pay for headroom that may only cover an anomalous spike.
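
To see how the average and the percentiles diverge, the following sketch computes them for a synthetic series of CPU samples. The data is invented for illustration, and `percentile` uses the nearest-rank method, which may differ slightly from the interpolation Prometheus applies. How often usage exceeds the average depends on the shape of the distribution, which is one more reason not to size from it.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def share_above(samples: List[float], threshold: float) -> float:
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


# Synthetic CPU samples in millicores: mostly quiet, with a busy tail.
usage = [200.0] * 60 + [400.0] * 30 + [700.0] * 8 + [1200.0] * 2

average = sum(usage) / len(usage)
print(f"average={average:.0f}m exceeded {share_above(usage, average):.0%} of the time")
for pct in (50, 95, 99):
    value = percentile(usage, pct)
    print(f"P{pct}={value:.0f}m exceeded {share_above(usage, value):.0%} of the time")
```

In this synthetic series the average is 320m and is exceeded 40% of the time, while P95 (700m) is exceeded only 2% of the time.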

Recommended Percentile Targets by Workload Type

| Workload Type | Target Percentile | Safety Buffer | Rationale |
|---|---|---|---|
| Tier-1 Critical (payments, auth) | P99 | 25-30% | Minimize any performance risk |
| User-Facing Services (APIs, web) | P95 | 20-25% | Balance cost and acceptable occasional throttling |
| Internal Services (dashboards) | P90 | 15-20% | Cost-optimized, tolerate occasional degradation |
| Batch/Background Jobs | P85 | 10-15% | Aggressive optimization, throughput over latency |

Prometheus Queries for Percentile Analysis

# Calculate P95 CPU usage over 30 days for payment-api
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{
    pod=~"payment-api-.*",
    container="api"
  }[5m])[30d:5m]
)

# Calculate P99 memory usage over 30 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{
    pod=~"payment-api-.*",
    container="api"
  }[30d:1m]
)

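# Identify top 20 over-provisioned workloads (request/usage ratio)
# Aggregate both sides to the same labels and drop cAdvisor's pod-level
# series (container="" or "POD") so the division matches one-to-one
topk(20,
  sum by (namespace, pod, container) (
    kube_pod_container_resource_requests{resource="cpu"}
  )
  /
  sum by (namespace, pod, container) (
    rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[7d])
  )
)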
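
Queries like these can also be run programmatically through Prometheus's instant-query HTTP endpoint, `/api/v1/query`. The sketch below uses only the Python standard library; the helper names are hypothetical, and the base URL in the usage note is an assumption that depends on how Prometheus is exposed in your cluster.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 30.0):
    """Run the query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

A call such as `instant_query("http://prometheus-server.monitoring", "<the P95 query above>")` would return one `(labels, cores)` pair per matching container series. A 30-day subquery can be slow, so a generous timeout is a reasonable default.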

Phase 3: Automated Analysis with Atmosly

Manual Analysis Pain Points

Rightsizing 100+ workloads manually requires:

  • 4-8 hours writing and adapting PromQL queries
  • Exporting data to spreadsheets for percentile calculations
  • Manually determining appropriate safety buffers per workload type
  • Calculating cost impact using AWS/GCP pricing data
  • Generating Kubernetes YAML patches for each service
  • Total effort: 40-60 hours for 100 workloads (monthly re-analysis needed)

How Atmosly Automates Rightsizing

Instead of manual Prometheus analysis, Atmosly connects to existing Prometheus and automates the entire workflow:

Step 1: Cluster Import (15 minutes)

# Import cluster via Atmosly UI
1. Navigate to: Clusters → Import Cluster
2. Select: AWS EKS / GCP GKE / Azure AKS / Other Kubernetes
3. Provide: Cluster name, region, credentials
4. Atmosly validates: Kubernetes API access, Prometheus connectivity
5. Auto-discovery: Finds Prometheus service endpoint automatically

Step 2: Automatic 30-Day Data Ingestion (2 hours)

  • Atmosly connects to Prometheus service (prometheus-server.monitoring)
  • Pulls 30 days of container CPU and memory metrics
  • Collects resource requests, limits, and pod metadata from kube-state-metrics
  • Identifies workload types (Deployment, StatefulSet, Job, CronJob, DaemonSet)
  • Classifies by criticality based on namespace labels and naming patterns

Step 3: ML-Powered Recommendation Generation

Atmosly's algorithm automatically:

  1. Calculates P50, P90, P95, P99, and max usage for each container
  2. Applies workload-specific percentile targets (P95 for standard, P99 for critical)
  3. Adds appropriate safety buffers (20-30% headroom above target percentile)
  4. Enforces minimum thresholds (100m CPU, 128Mi memory) preventing under-provisioning
  5. Calculates burst capacity limits (typically 2x requests)
  6. Computes current monthly cost vs recommended monthly cost
  7. Ranks recommendations by dollar savings potential

Step 4: Dashboard with Actionable Recommendations

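| Workload | Container | Current Cost | Optimized Cost | Savings | Over-Prov % |
|---|---|---|---|---|---|
| payment-api | api | $24.50/mo | $6.85/mo | $17.65/mo | 72% |
| analytics-job | processor | $18.20/mo | $4.20/mo | $14.00/mo | 77% |
| prometheus-grafana | grafana | $10.96/mo | $0.66/mo | $10.30/mo | 94% |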

Total Time: 2 hours (vs 40-60 hours manual)

Phase 4: Calculate Recommendations with Safety Buffers

Recommendation Formula

For CPU Requests:

Recommended_CPU_Request = P95_CPU_Usage × (1 + Safety_Buffer)

Example (standard tier-2 service):
  P95 CPU usage over 30 days: 680m
  Safety buffer: 20%
  Recommended CPU request: 680m × 1.20 = 816m (round to 850m)

For Memory Requests:

Recommended_Memory_Request = P95_Memory_Usage × (1 + Safety_Buffer)

Example:
  P95 Memory usage: 950Mi
  Safety buffer: 20%
  Recommended memory request: 950Mi × 1.20 = 1140Mi (round to 1200Mi)

For Limits (Burst Capacity):

Recommended_CPU_Limit = Recommended_CPU_Request × Burst_Multiplier

Typical burst multipliers:
  - Stateless APIs: 2.0x (allow 2x burst)
  - Databases: 1.5x (less burstable, consistent usage)
  - Batch jobs: 3.0x (highly variable, benefit from burst)

Example:
  Recommended request: 850m
  Burst multiplier: 2.0x
  Recommended limit: 1700m
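
The formulas above translate directly into code. This is a minimal sketch with hypothetical names; the rounding steps (50m for CPU, 100Mi for memory) are assumptions chosen to reproduce the rounding used in the examples, not a fixed rule.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    request: int  # millicores or Mi
    limit: int    # same unit as request


def round_up(value: float, step: int) -> int:
    """Round up to the next multiple of step."""
    return int(math.ceil(value / step) * step)


def recommend(percentile_usage: float, safety_buffer: float,
              burst_multiplier: float, step: int) -> Recommendation:
    """request = usage x (1 + buffer); limit = request x burst multiplier."""
    request = round_up(percentile_usage * (1.0 + safety_buffer), step)
    limit = round_up(request * burst_multiplier, step)
    return Recommendation(request, limit)


cpu = recommend(680, safety_buffer=0.20, burst_multiplier=2.0, step=50)
memory = recommend(950, safety_buffer=0.20, burst_multiplier=2.0, step=100)
print(cpu)     # Recommendation(request=850, limit=1700)
print(memory)  # Recommendation(request=1200, limit=2400)
```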

Guardrails Preventing Under-Provisioning

# Minimum thresholds (prevent aggressive under-sizing)
MINIMUM_CPU_REQUEST = 100m      # Prevent excessive throttling
MINIMUM_MEMORY_REQUEST = 128Mi  # Prevent OOMKills

# Maximum single reduction (staged rollout)
MAX_REDUCTION_PERCENTAGE = 50%  # Don't cut more than 50% at once

# Exclusion criteria
EXCLUDE_IF:
  - Label: cost-optimization=disabled
  - Label: rightsizing=manual
  - Insufficient data: < 7 days metrics
  - High variance: stddev > 2x mean
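
One way to apply these guardrails is sketched below. The function names and the choice to measure variance with the population standard deviation are assumptions for illustration. Note how the 50% cap turns a large cut into stages: a 2000m request with an 850m recommendation first moves to 1000m and reaches 850m in the following cycle.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional

MIN_CPU_REQUEST_M = 100  # millicores
MAX_REDUCTION = 0.50     # largest cut allowed in a single change
MIN_DAYS_OF_DATA = 7


def exclusion_reason(labels: Dict[str, str], days_of_data: int,
                     samples: List[float]) -> Optional[str]:
    """Return why a workload is skipped, or None if it is eligible."""
    if labels.get("cost-optimization") == "disabled":
        return "label cost-optimization=disabled"
    if labels.get("rightsizing") == "manual":
        return "label rightsizing=manual"
    if days_of_data < MIN_DAYS_OF_DATA or not samples:
        return "insufficient data"
    average = mean(samples)
    if average > 0 and pstdev(samples) > 2 * average:
        return "high variance"
    return None


def apply_guardrails(current_m: int, recommended_m: int,
                     minimum_m: int = MIN_CPU_REQUEST_M) -> int:
    """Clamp a recommended request to the minimum and the per-step reduction cap."""
    floor_from_cap = math.ceil(current_m * (1 - MAX_REDUCTION))
    return max(recommended_m, floor_from_cap, minimum_m)


print(apply_guardrails(2000, 850))  # 1000 (first stage)
print(apply_guardrails(1000, 850))  # 850 (second stage)
```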

Real Example: payment-api Container

# BEFORE (Over-Provisioned)
resources:
  requests:
    cpu: 2000m        # 2 full CPU cores
    memory: 4Gi       # 4 GB RAM
  limits:
    cpu: 4000m
    memory: 8Gi

# ATMOSLY 30-DAY ANALYSIS
P50 CPU: 320m | P95 CPU: 680m | P99 CPU: 1150m | Max: 1680m
P50 Mem: 720Mi | P95 Mem: 950Mi | P99 Mem: 1280Mi | Max: 1520Mi

Workload tier: tier-2 (standard)
Target percentile: P95
Safety buffer: 20%

# RECOMMENDED (Data-Driven)
resources:
  requests:
    cpu: 850m         # 680m P95 × 1.20 = 816m, rounded up to 850m (58% reduction)
    memory: 1200Mi    # 950Mi P95 × 1.20 = 1140Mi, rounded up to 1200Mi (71% reduction)
  limits:
    cpu: 1700m        # 2x requests for burst
    memory: 2400Mi    # 2x requests for burst

# COST IMPACT
Current cost: $24.50/month per pod
Recommended cost: $6.85/month per pod
Monthly savings: $17.65 per pod
Cluster savings (10 replicas): $176.50/month
Annual savings: $2,118/year for this one service

Phase 5: Validate Before Production

Staging Environment Testing

  1. Apply recommendations to staging first
  2. Run load tests mimicking production traffic
  3. Monitor SLIs closely: P95/P99 latency, error rate, CPU throttling, OOMKills
  4. Validate for 48-72 hours before production

# Load test with k6
k6 run --vus 500 --duration 30m load-test.js

# Monitor during load test
kubectl top pods -n staging | grep payment-api

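# Check the share of throttled CFS periods (should be < 5%)
# promtool needs the server URL; http://localhost:9090 assumes the default port inside the pod
kubectl exec -n staging prometheus-0 -- promtool query instant http://localhost:9090 \
  'sum(rate(container_cpu_cfs_throttled_periods_total{pod=~"payment-api-.*"}[5m])) /
   sum(rate(container_cpu_cfs_periods_total{pod=~"payment-api-.*"}[5m]))'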
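
The throttling percentage is the share of CFS periods in which the container was throttled. As a sketch, it can be computed from two readings of the `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total` counters taken before and after the load test; the function name is hypothetical, and a counter reset between readings (for example after a container restart) is simply reported as zero here.

```python
def throttle_ratio(throttled_start: float, throttled_end: float,
                   periods_start: float, periods_end: float) -> float:
    """Fraction of CFS periods that were throttled between two counter readings."""
    periods = periods_end - periods_start
    throttled = throttled_end - throttled_start
    if periods <= 0 or throttled < 0:
        return 0.0  # no elapsed periods, or a counter reset
    return throttled / periods


ratio = throttle_ratio(100, 130, 1000, 2000)
print(f"{ratio:.1%} of periods throttled")  # 3.0% of periods throttled
```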

Progressive Production Rollout

Wave 1: 10% of Replicas (Week 1)

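# A Deployment has one pod template, so patching payment-api itself rolls the
# new requests out to all 10 replicas, not 1 of 10. For a 10% wave, first create
# a 1-replica copy with the same pod labels (payment-api-canary is a
# hypothetical name), scale the original to 9, and patch only the copy.
kubectl scale deployment payment-api --replicas=9
kubectl patch deployment payment-api-canary --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "850m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "1200Mi"}
]'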

# Monitor for 48 hours
# Compare metrics between old (2000m/4Gi) and new (850m/1200Mi) pods

Wave 2: 50% of Replicas (Week 2, if Wave 1 successful)

Wave 3: 100% Rollout (Week 3, if Wave 2 successful)
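
For services with other replica counts, the number of pods in each wave follows from the same percentages. A small sketch, with a hypothetical function name, that rounds up so every wave moves at least one pod:

```python
from typing import List, Sequence


def wave_sizes(replicas: int, wave_percents: Sequence[int] = (10, 50, 100)) -> List[int]:
    """Replicas running the new resources after each wave."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    sizes = []
    for pct in wave_percents:
        # Integer ceiling division avoids float rounding surprises.
        sizes.append(max(1, -(-replicas * pct // 100)))
    return sizes


print(wave_sizes(10))  # [1, 5, 10]
print(wave_sizes(3))   # [1, 2, 3]
```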

Phase 6: Monitor and Iterate

Post-Deployment Monitoring

Critical SLIs to Track:

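| Metric | Target | Alert Threshold | Action if Exceeded |
|---|---|---|---|
| P95 Latency | No increase | >10% increase | Rollback immediately |
| Error Rate | < 0.1% | >0.15% | Investigate, consider rollback |
| CPU Throttling | < 5% | >10% | Increase CPU limit 20% (and the request if usage exceeds it) |
| OOMKill Events | Zero | Any OOMKill | Increase memory limit 30% (and the request if usage exceeds it) |
| Pod Restart Rate | < 1/day | >3/day | Rollback and investigate |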
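
The alert thresholds in this table are easy to encode as an automated gate between rollout waves. The sketch below is one possible shape; the `Slis` type, field names, and action strings are hypothetical, and the metric values would come from your monitoring stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slis:
    p95_latency_increase: float  # fraction vs baseline, 0.10 means +10%
    error_rate: float            # fraction of requests, 0.0015 means 0.15%
    throttle_ratio: float        # fraction of throttled CFS periods
    oom_kills: int
    restarts_per_day: float


def evaluate(slis: Slis) -> List[str]:
    """Return the actions triggered by the alert thresholds; empty means proceed."""
    actions = []
    if slis.p95_latency_increase > 0.10:
        actions.append("rollback: P95 latency up more than 10%")
    if slis.error_rate > 0.0015:
        actions.append("investigate: error rate above 0.15%")
    if slis.throttle_ratio > 0.10:
        actions.append("add CPU headroom (+20%)")
    if slis.oom_kills > 0:
        actions.append("add memory headroom (+30%)")
    if slis.restarts_per_day > 3:
        actions.append("rollback: pod restart rate above 3/day")
    return actions


healthy = Slis(0.02, 0.0005, 0.03, 0, 0.2)
print(evaluate(healthy))  # []
```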

Continuous Re-Analysis Cadence

  • Weekly: Review top 20 workloads for new optimization opportunities
  • Monthly: Full cluster re-analysis with updated 30-day utilization (Atmosly automates)
  • After Major Releases: Re-baseline if application behavior changes significantly
  • Seasonal Adjustments: Update safety buffers before known traffic spikes (Black Friday, tax season)

Expected Results and ROI

Typical Savings by Resource Type

| Resource | Typical Over-Prov | Achievable Reduction | Example Savings (100 pods) |
|---|---|---|---|
| CPU | 3-5x | 40-60% | $12,000-$18,000/month |
| Memory | 2-4x | 30-50% | $6,000-$12,000/month |
| Total | | 35-55% | $18,000-$30,000/month |

Time Investment vs Savings

Manual Approach (Open Source Tools):

  • Initial setup: 40-60 hours (Prometheus, VPA, custom dashboards)
  • Monthly analysis: 20-30 hours (PromQL queries, spreadsheet analysis, YAML generation)
  • Timeline to savings: 6-8 weeks
  • Total: 100+ hours for first month

Atmosly Approach:

  • Initial setup: 15 minutes (cluster import)
  • Monthly analysis: 2 hours (review recommendations, approve changes)
  • Timeline to savings: 1 week
  • Total: 2 hours for first month (98% less effort)

Common Pitfalls and Solutions

Pitfall 1: Using Average Instead of Percentiles

Problem: Sizing requests from the average leaves usage above the request for a large share of operating time (about half for a symmetric distribution).

Solution: Always use P95 (standard services) or P99 (critical services) plus 20-30% safety buffer.

Pitfall 2: Ignoring Seasonal Patterns

Problem: Rightsizing based on current 30 days fails when Black Friday or tax season hits.

Solution: Include peak periods in the analysis window, add growth buffers for rapidly scaling services, and re-baseline after major launches.

Pitfall 3: Optimizing Everything Simultaneously

Problem: Applying changes to 100+ services at once makes root-cause analysis nearly impossible if issues occur.

Solution: Phased rollout—staging first (week 1), non-critical prod (week 2-3), critical services conservatively (week 4-5).

Pitfall 4: Setting CPU Limits Too Low

Problem: Aggressive CPU limits cause throttling even when node has spare capacity.

Solution: Set CPU limits 2-3x requests allowing burst. Consider removing CPU limits entirely for non-critical workloads (limits don't affect scheduling).

Conclusion: Rightsizing as Continuous Discipline

Right-sizing Kubernetes workloads delivers 35-55% cost reduction when done systematically with data-driven analysis rather than guesswork. The key success factors: collecting 30+ days of Prometheus metrics capturing weekly and monthly patterns, analyzing at appropriate percentiles (P95/P99) with workload-specific safety buffers, validating thoroughly in staging before production, rolling out progressively monitoring SLIs closely, and re-analyzing continuously as applications evolve.

Organizations leveraging Atmosly to automate Prometheus analysis achieve these results in 1-2 weeks with 98% less engineering effort, compared with 6-8 weeks for the manual approach. Atmosly connects to existing monitoring infrastructure, requires no new agents or sidecars, and delivers recommendations within 2 hours of cluster import.

Ready to rightsize your Kubernetes workloads? Import your cluster into Atmosly and get rightsizing recommendations based on your actual Prometheus data in 2 hours. Free trial, no credit card required.

Questions about rightsizing methodology? Schedule a consultation with our solutions team to review your specific workload patterns and optimization potential.

Frequently Asked Questions

How frequently should Kubernetes workloads be re-analyzed and rightsized as applications evolve?

Kubernetes workloads should be re-analyzed monthly using the latest 30 days of Prometheus data to keep resource recommendations aligned with current usage. For rapidly growing applications or after major releases, bi-weekly or immediate re-analysis is recommended to maintain performance and cost efficiency.

What if workloads experience unpredictable traffic spikes not captured in 30-day historical analysis?

For unpredictable traffic spikes, combine rightsizing with Horizontal Pod Autoscaling (HPA) and maintain additional resource buffers. Setting requests based on P95 usage and allowing HPA to scale replicas ensures workloads can handle sudden demand without performance degradation.

Can rightsizing recommendations be automatically applied or do they require manual implementation and validation?

Rightsizing recommendations should be reviewed and validated by engineers before production deployment. A staged rollout with performance monitoring helps prevent issues such as CPU throttling, OOMKills, or increased latency while ensuring safe resource optimization.

What Prometheus metrics and retention period are required for accurate rightsizing recommendations?

Accurate recommendations require key CPU, memory, resource request/limit, throttling, and OOMKill metrics collected by Prometheus. A minimum of 14 days of data is needed, while 30 days of retention is strongly recommended to capture workload patterns and seasonal usage trends.