Kubernetes Metrics

Kubernetes Metrics: What to Monitor and Why (2025)

Learn Google's Four Golden Signals (Latency, Traffic, Errors, Saturation), the essential node, pod, and control plane metrics, how to implement SLOs with Prometheus, how to optimize costs using resource metrics, and how Atmosly's Cost Intelligence correlates performance with cloud spend.

Introduction to Kubernetes Metrics: From Overwhelming Data to Actionable Insights

Kubernetes environments generate a staggering volume of metrics. A medium-sized production cluster with 100 nodes and 2,000 pods can easily expose 500,000+ unique time series, each scraped and stored every 15-30 seconds, which means millions of data points flowing into your monitoring system every minute. For engineers new to Kubernetes monitoring, or even experienced SREs facing their first large-scale production cluster, this flood of data creates analysis paralysis and decision fatigue.

The fundamental questions become: Which specific metrics actually matter for maintaining service reliability and meeting SLOs? What should trigger immediate PagerDuty alerts that wake engineers at 3 AM, versus passive dashboard visibility reviewed during business hours? How do you distinguish normal operational variation that requires no action from genuine performance degradation or impending capacity problems that require urgent intervention before they impact customers? And which metrics indicate cost waste (over-provisioned resources burning money) that could be optimized to reduce cloud bills by 30-40% without affecting performance?

The challenge in modern Kubernetes observability isn't a lack of available metrics; quite the opposite. Modern clusters with Prometheus, kube-state-metrics, node-exporter, cAdvisor, application-level instrumentation via client libraries, and service mesh telemetry from Istio or Linkerd expose hundreds of thousands of distinct data points covering every conceivable operational aspect, from CPU nanoseconds consumed by individual containers to network packet drop rates on specific interfaces, filesystem inode exhaustion on volume mounts, and API server request latencies broken down by verb and resource type.

The real challenge is filtering actionable signals from meaningless noise, understanding which metrics provide insights that drive concrete actions and decisions, avoiding alert fatigue from false positives while catching real issues before customer impact, and using metrics proactively for optimization rather than just reactive troubleshooting after incidents occur.

This comprehensive guide teaches you exactly which Kubernetes metrics deserve your monitoring attention and why. We cover: the essential four-layer metric hierarchy from infrastructure to business impact, Google's Four Golden Signals framework adapted for Kubernetes, implementing SLOs with metrics, critical control plane health indicators, resource utilization metrics impacting costs, effective alerting strategies, and how Atmosly's Cost Intelligence uniquely correlates performance metrics with cloud billing for optimization.

The Kubernetes Metrics Hierarchy: Four Essential Layers

Layer 1: Infrastructure and Node Metrics (The Foundation)

Infrastructure metrics track the physical or virtual machines that provide compute, memory, storage, and networking for Kubernetes. Without healthy nodes with available capacity, nothing else can function.

Critical Node CPU Metrics:

  • node_cpu_seconds_total{mode="idle"}: Time the CPU spent idle. Calculate utilization: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) × 100; see the rule sketch after this list. Alert when sustained > 80% across the cluster (capacity constraint approaching).
  • node_cpu_seconds_total{mode="system"}: Kernel time. Sustained high system time (>30%) indicates heavy kernel work, such as excessive context switching or syscall-intensive I/O.
  • node_cpu_seconds_total{mode="iowait"}: CPU idle waiting for I/O. High iowait (>20%) indicates disk bottleneck—applications blocked on slow storage.
  • node_load1, node_load5, node_load15: CPU load averages. Load > CPU core count means processes queuing for CPU time (saturation).
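
Putting the idle-time calculation above into practice, here is a minimal sketch of the utilization query and a matching alert rule, using the node-exporter series from this list and the 80% threshold suggested above:

# Average CPU utilization per node, as a percentage
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

# Alert rule: sustained node CPU saturation
- alert: NodeCPUHighUtilization
  expr: |
    100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} CPU utilization is {{ $value | humanize }}%"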

Critical Node Memory Metrics:

  • node_memory_MemTotal_bytes: Total physical RAM (constant)
  • node_memory_MemAvailable_bytes: Most important! Memory available for new allocations without swapping; the kubelet's memory-pressure and eviction signals are derived from it. Alert when < 10% of total (node under memory pressure; the scheduler avoids placing new pods there).
  • node_memory_MemFree_bytes: Completely unused memory (typically small—Linux uses free memory for cache).
  • node_memory_Cached_bytes: Filesystem cache memory (reclaimable when apps need it).

Why MemAvailable matters more than MemFree: Linux uses "free" memory for filesystem cache to improve I/O performance. That cache can be dropped almost instantly when applications need memory. MemAvailable ≈ MemFree + reclaimable cache. Always monitor MemAvailable, not MemFree.
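
A minimal PromQL sketch of the MemAvailable-based alert described above (the 10% threshold is the one suggested in this section; tune it for your fleet):

# Available memory as a percentage of total, per node
100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# Alert when less than 10% of node memory remains available
- alert: NodeMemoryLow
  expr: |
    100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 10
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} has only {{ $value | humanize }}% memory available"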

Node Disk Metrics:

  • node_filesystem_avail_bytes: Available disk space per mount point. Alert at 85% full so you have time to expand or clean up (alert sketch after this list).
  • node_filesystem_files_free: Available inodes. Can run out even with disk space free! Alert at 90% inode usage.
  • node_disk_io_time_seconds_total: Disk I/O time indicates saturation. High I/O time with high iowait = disk bottleneck.
  • node_disk_read_bytes_total / node_disk_written_bytes_total: Disk throughput for capacity planning.
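
A hedged sketch of the disk-capacity and inode alerts above, assuming node-exporter's standard filesystem series (node_filesystem_size_bytes and node_filesystem_files) and excluding ephemeral filesystems:

# Alert when a filesystem is more than 85% full
- alert: NodeDiskSpaceLow
  expr: |
    100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
               / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 85
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is {{ $value | humanize }}% full"

# Alert when more than 90% of inodes are used
- alert: NodeInodesLow
  expr: |
    100 * (1 - node_filesystem_files_free{fstype!~"tmpfs|overlay"}
               / node_filesystem_files{fstype!~"tmpfs|overlay"}) > 90
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is running out of inodes"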

Node Network Metrics:

  • node_network_receive_bytes_total: Inbound network traffic by interface
  • node_network_transmit_bytes_total: Outbound network traffic
  • node_network_receive_drop_total: Dropped packets indicate saturation or errors

Atmosly's Node Cost Correlation: Atmosly correlates node metrics with cloud billing: "Node i-abc123 averaged 25% CPU for 30 days (16 cores × 75% idle = 12 wasted cores = $320/month waste). Recommendation: Downsize to c5.2xlarge (8 cores) saving $240/month."

Layer 2: Kubernetes Orchestration Metrics (Cluster State)

These metrics from kube-state-metrics track Kubernetes resource states, indicating whether Kubernetes successfully manages workloads or encounters issues.

Pod Status Metrics:

  • kube_pod_status_phase{phase="Pending|Running|Succeeded|Failed|Unknown"}: Current pod phase. Alert on Failed or Unknown. Monitor Pending > 5 minutes (scheduling issues).
  • kube_pod_container_status_ready: Container readiness (0=not ready, 1=ready). Service only sends traffic to ready pods.
  • kube_pod_container_status_restarts_total: Container restart count. Alert if the rate is > 0 for 10 minutes (indicates CrashLoopBackOff); see the rule sketch after this list.
  • kube_pod_container_status_terminated_reason: Why terminated ("OOMKilled", "Error", "Completed"). Critical for root cause analysis.
  • kube_pod_status_scheduled_time: Timestamp when pod scheduled. Calculate scheduling latency.
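
A minimal sketch of the two pod-health alerts above (restart loops and stuck-Pending pods), using the kube-state-metrics series from this list:

# Containers restarting repeatedly (likely CrashLoopBackOff)
- alert: PodRestartingFrequently
  expr: |
    rate(kube_pod_container_status_restarts_total[10m]) > 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} keeps restarting"

# Pods stuck in Pending (scheduling issues)
- alert: PodStuckPending
  expr: |
    kube_pod_status_phase{phase="Pending"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been Pending for over 5 minutes"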

Deployment Metrics:

  • kube_deployment_spec_replicas: Desired replica count from the Deployment spec
  • kube_deployment_status_replicas_available: Currently available replicas (ready and passing health checks)
  • kube_deployment_status_replicas_unavailable: Unavailable replicas. Alert if > 0 for 5 minutes (degraded service).
  • kube_deployment_status_replicas_updated: Replicas running latest pod template (tracks rollout progress).

Service and Endpoint Metrics:

  • kube_service_spec_type: Service type (ClusterIP, NodePort, LoadBalancer)
  • kube_endpoint_address_available: Number of healthy endpoints behind service. Alert if 0 (service has no backends).

Resource Quota Metrics:

  • kube_resourcequota: Quota limits and usage per namespace. Alert when usage > 90% of quota (teams hitting limits); query sketch below.
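
A minimal PromQL sketch of that quota alert, assuming the standard kube_resourcequota labels where the type label distinguishes "used" from "hard":

# Namespaces using more than 90% of any quota-limited resource
kube_resourcequota{type="used"}
/ ignoring(type)
kube_resourcequota{type="hard"}
> 0.9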

Layer 3: Container Resource Utilization Metrics (Performance and Cost)

Container metrics show actual resource consumption versus allocation—where performance meets cost optimization.

Container CPU Metrics:

  • container_cpu_usage_seconds_total: Cumulative CPU time consumed. Calculate rate for current usage: rate(container_cpu_usage_seconds_total[5m])
  • container_cpu_cfs_throttled_seconds_total: Time container was CPU throttled (hit CPU limit). Throttling degrades performance. If high, increase CPU limits.
  • kube_pod_container_resource_requests{resource="cpu"}: CPU requested (for scheduling)
  • kube_pod_container_resource_limits{resource="cpu"}: CPU limit (throttling threshold)

CPU Utilization Calculation:

# CPU usage as % of request
# Aggregate both sides to the same labels so the division matches series correctly
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod, container)
/
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod, container)
* 100

# If < 30% consistently: Over-provisioned, wasting money
# If > 90%: Approaching throttling, may need more CPU

Container Memory Metrics:

  • container_memory_working_set_bytes: Most critical memory metric! This is what counts toward OOMKill limit. Actual memory used excluding reclaimable cache.
  • container_memory_rss: Resident Set Size (anonymous memory, no file backing). Subset of working set.
  • container_memory_cache: Page cache memory (largely reclaimable). working_set ≈ RSS + active page cache (total usage minus inactive, reclaimable file cache).
  • container_memory_swap: Swap usage. Should be 0 (Kubernetes nodes shouldn't swap).
  • kube_pod_container_resource_requests{resource="memory"}: Memory request
  • kube_pod_container_resource_limits{resource="memory"}: Memory limit (OOMKill threshold)

Memory Utilization and OOMKill Risk:

# Memory usage as % of limit (OOMKill when it hits 100%)
# Aggregate both sides to the same labels so the division matches series correctly
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod, container)
/
sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod, container)
* 100

# Alert if > 90% (OOMKill imminent)
# Alert if > 95% for 2 minutes (critical - will OOMKill very soon)

Container Network Metrics:

  • container_network_receive_bytes_total: Inbound network traffic per container
  • container_network_transmit_bytes_total: Outbound network traffic
  • container_network_receive_packets_dropped_total: Dropped inbound packets (network saturation or errors)

Atmosly's Cost Intelligence with Resource Metrics:

Atmosly analyzes container resource metrics and calculates exact financial waste:

Payment Service Optimization Opportunity

Current Configuration:

  • CPU Request: 2000m (2 cores)
  • CPU Limit: 4000m (4 cores)
  • Memory Request: 4Gi
  • Memory Limit: 8Gi
  • Replicas: 3

Actual Usage (30-day analysis):

  • CPU p95: 450m (22.5% of request)
  • Memory p95: 1.2Gi (30% of request)
  • CPU never exceeded 600m
  • Memory never exceeded 1.5Gi

Cost Analysis:

  • Current cost: 3 pods × (2 CPU × $30/CPU + 4Gi × $4/Gi) = $228/month
  • Waste: 77.5% CPU + 70% memory unused but paid for
  • Wasted spend: $156/month

Atmosly Recommendation:

kubectl set resources deployment/payment-service \
  --requests=cpu=600m,memory=1.5Gi \
  --limits=cpu=1200m,memory=2.5Gi

# New cost: $72/month (savings: $156/month ≈ 68% reduction)
# Performance impact: Zero (still above p95 usage + 30% headroom)
# Risk: Low (monitoring confirms usage patterns stable)

This direct cost-to-action correlation is impossible with traditional monitoring tools.

Layer 4: Application Metrics (Business Impact)

While infrastructure and Kubernetes metrics show HOW your system performs, application metrics show WHAT that performance means for users and the business.

RED Metrics (Rate, Errors, Duration):

  • Request Rate: Requests per second. sum(rate(http_requests_total[5m])) by (service)
  • Error Rate: Failed requests as a percentage. sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
  • Duration: Request latency percentiles. histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

Custom Business Metrics:

  • Orders processed per minute (e-commerce)
  • Payments completed successfully (payment systems)
  • User signups per hour (SaaS applications)
  • API requests per customer (usage tracking)
  • Video encoding jobs completed (media processing)

Expose via Prometheus client libraries in your application code.
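
As a hedged illustration, if your application exposed a counter named orders_processed_total and a payments_total counter with a status label (hypothetical names; use whatever your instrumentation actually defines), the business-level queries look just like the RED queries above:

# Orders processed per minute (hypothetical orders_processed_total counter)
sum(rate(orders_processed_total[5m])) * 60

# Payment success ratio (hypothetical payments_total counter with a status label)
sum(rate(payments_total{status="success"}[5m]))
/
sum(rate(payments_total[5m]))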

The Four Golden Signals: Google's SRE Framework for Kubernetes

Signal 1: Latency (How Fast?)

What to Monitor: Time to serve requests, measured in percentiles

Why Percentiles Matter More Than Averages:

Average latency is misleading. If 99% of requests complete in 100ms but 1% take 10 seconds, the average is still only 199ms—looks fine, but 1% of users suffer a terrible experience.

Monitor percentiles:

  • p50 (median): 50% of requests faster, 50% slower. Represents a typical user experience.
  • p95: 95% faster. Only 5% of users experience worse latency. Good SLO target.
  • p99: 99% faster. Catches tail latency affecting power users or edge cases.
  • p99.9: 99.9% faster. For very high scale (millions of requests), even 0.1% is many users.

Prometheus Query for Latency Percentiles:

# Calculate p95 latency for HTTP requests
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket{job="my-app"}[5m])) by (le, job)
)

# Alert if p95 latency > 500ms
- alert: HighLatency
  expr: |
    histogram_quantile(0.95,
      sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
    ) > 0.5
  for: 5m
  annotations:
    summary: "95th percentile latency is {{ $value | humanizeDuration }}"

Latency Best Practices:

  • Define SLO: "p95 latency < 300ms" not "average < 200ms"
  • Monitor separately for different endpoints (search vs checkout have different SLOs)
  • Track latency from the user perspective (end-to-end), not just backend processing time
  • Consider network latency between services in microservices

Signal 2: Traffic (How Much Demand?)

What to Monitor: Request volume per second

Prometheus Query:

# Requests per second over last 5 minutes
sum(rate(http_requests_total{job="my-app"}[5m]))

# By endpoint/path
sum(rate(http_requests_total[5m])) by (path)

# By status code
sum(rate(http_requests_total[5m])) by (status)

Why Monitor Traffic:

  • Sudden spikes: DDoS attack, viral content, or marketing campaign success (need to scale)
  • Sudden drops: Outage, integration failure, or upstream service down (critical alert)
  • Gradual growth: Capacity planning—when will current resources be insufficient?
  • Daily patterns: Baseline for anomaly detection

Traffic-Based Alerts:

# Alert if traffic drops >50% from baseline (potential outage)
- alert: TrafficDropped
  expr: |
    sum(rate(http_requests_total[5m]))
    <
    sum(rate(http_requests_total[5m] offset 1h)) * 0.5
  for: 5m
  annotations:
    summary: "Traffic dropped by {{ $value | humanizePercentage }} from 1h ago"

Signal 3: Errors (What's Failing?)

What to Monitor: Failed request rate as percentage of total requests

Prometheus Query:

# Error rate as percentage
(
  sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
) * 100

# Separate client errors (4xx) from server errors (5xx)
sum(rate(http_requests_total{status=~"4.."}[5m])) by (status)  # Client
sum(rate(http_requests_total{status=~"5.."}[5m])) by (status)  # Server

Error Rate Alert:

- alert: HighErrorRate
  expr: |
    (sum(rate(http_requests_total{status=~"5.."}[5m]))
    /
    sum(rate(http_requests_total[5m]))) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Error rate is {{ $value | humanizePercentage }}% (threshold: 1%)"

Distinguishing Error Types:

  • 5xx errors: YOUR fault (server bugs, crashes, timeouts). Alert immediately.
  • 4xx errors: Usually the client's fault (bad requests, failed authentication). Monitor, but don't alert unless a spike suggests a breaking API change.
  • Exceptions: a spike in 429s (rate limiting) may mean you need to scale; a spike in 401/403s (auth failures) may indicate an attack.

Signal 4: Saturation (How Full Are Resources?)

What to Monitor: Resource utilization as percentage of capacity

Why Saturation Matters: Resources at 100% capacity cause performance degradation even if nothing is "broken." At 80-90% utilization, you need to scale before hitting 100%.

CPU Saturation Query:

# Container CPU saturation (usage vs limit); aggregate to matching labels before dividing
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod, container)
/
sum(kube_pod_container_resource_limits{resource="cpu"}) by (namespace, pod, container)
* 100

# Alert if sustained > 80%

Memory Saturation Query:

# Memory usage vs limit (OOMKill risk); aggregate to matching labels before dividing
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod, container)
/
sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod, container)
* 100

# Alert if > 90% (OOMKill imminent)
# Alert if > 95% for 2 min (critical - will OOMKill soon)

Connection Pool Saturation:

# Database connection pool usage
(db_connection_pool_active_connections
/
db_connection_pool_max_connections)
* 100

# Alert if > 80% (connections exhausted soon)

Critical Kubernetes Control Plane Metrics

Control plane (API server, etcd, scheduler, controller-manager) must be healthy for Kubernetes to function. Control plane problems cascade into cluster-wide failures.

API Server Metrics (Most Critical)

Request Latency:

# API server request duration p95 (exclude long-running WATCH/CONNECT requests, which skew latency)
histogram_quantile(0.95,
  sum(rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[5m])) by (le, verb)
)

# Alert if p95 > 1 second (API server overloaded)
# Slow API = slow everything (kubectl, controllers, scheduler)

Request Rate and Errors:

# API requests per second by verb (GET, LIST, CREATE, UPDATE, DELETE)
sum(rate(apiserver_request_total[5m])) by (verb)

# API server errors
sum(rate(apiserver_request_total{code=~"5.."}[5m]))

Inflight Requests:

# Current concurrent requests being processed
apiserver_current_inflight_requests

# Alert if approaching max capacity
# Indicates API server saturation

etcd Metrics (THE Most Critical - Cluster State Store)

etcd stores all cluster state. If etcd fails or becomes slow, the entire cluster fails.

Disk Sync Duration:

# etcd write-ahead log fsync duration p99
histogram_quantile(0.99,
  sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (le)
)

# Alert if p99 > 100ms
# Slow disk fsync = slow etcd = slow cluster
# Usually indicates disk I/O bottleneck - use faster SSD

Leader Elections:

# etcd leader changes
rate(etcd_server_leader_changes_seen_total[5m])

# Alert if > 0 (leader should be stable)
# Frequent leader changes indicate network issues or etcd cluster instability

Backend Commit Duration:

# Time to commit to backend database
histogram_quantile(0.99,
  sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (le)
)

# Alert if p99 > 200ms

Scheduler Metrics

Scheduling Attempts:

# Scheduling success vs failure rate
sum(rate(scheduler_schedule_attempts_total[5m])) by (result)

# Calculate failure rate
sum(rate(scheduler_schedule_attempts_total{result="error"}[5m]))
/
sum(rate(scheduler_schedule_attempts_total[5m]))
* 100

# Alert if failure rate > 5%

Pending Pods:

# Number of pods waiting to be scheduled
scheduler_pending_pods

# Alert if high and growing (insufficient cluster capacity)

Scheduling Duration:

# Time to schedule a pod
histogram_quantile(0.95,
  sum(rate(scheduler_scheduling_duration_seconds_bucket[5m])) by (le)
)

# Alert if p95 > 1 second (scheduling bottleneck)

Implementing Service-Level Objectives (SLOs) with Metrics

SLOs define acceptable service performance. Metrics measure if you're meeting SLOs.

Example SLO: E-Commerce Web Application

SLO Definition:

  • 99.9% of requests complete in < 500ms (latency SLO)
  • 99.99% of requests succeed without errors (availability SLO)
  • Measured over rolling 30-day window

SLI (Service Level Indicator) Metrics:

# Latency SLI: % of requests under 500ms threshold
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[30d]))
/
sum(rate(http_request_duration_seconds_count[30d]))
* 100

# Target: > 99.9%

# Availability SLI: % of successful requests
sum(rate(http_requests_total{status!~"5.."}[30d]))
/
sum(rate(http_requests_total[30d]))
* 100

# Target: > 99.99%

Error Budget Calculation:

If your availability SLO is 99.9%, you have a 0.1% error budget. (The same arithmetic applies to the 99.99% SLO above, just with a 0.01% budget.)

Error budget = (1 - 0.999) × total requests
             = 0.001 × 10,000,000 requests/month
             = 10,000 allowed failed requests per month
             = 43 minutes downtime per month
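
A hedged PromQL sketch of how much of that budget has been consumed over the rolling window (1.0 means the budget is fully spent), assuming the same http_requests_total counter and a 99.9% SLO:

# Fraction of the 30-day error budget consumed (1.0 = fully spent)
(
  1 - sum(rate(http_requests_total{status!~"5.."}[30d]))
      / sum(rate(http_requests_total[30d]))
)
/
(1 - 0.999)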

Error Budget Burn Rate Alert:

# Alert if burning error budget too fast
# (will exhaust budget before month ends)

- alert: ErrorBudgetBurnRateCritical
  expr: |
    (
      (1 - sum(rate(http_requests_total{status!~"5.."}[1h]))
            / sum(rate(http_requests_total[1h])))
      /
      (1 - 0.999)  # SLO
    ) > 14.4  # 14.4x normal rate = exhaust in 2 days
  for: 5m
  annotations:
    summary: "Burning error budget 14x too fast"

Metric-Based Cost Optimization Strategies

Metrics aren't just for reliability—they're powerful tools for identifying and eliminating cloud cost waste.

Strategy 1: CPU Over-Provisioning Detection

Identify waste:

# Pods using < 30% of requested CPU consistently over the past week
avg_over_time(
  (
    sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod, container)
    /
    sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod, container)
  )[7d:1h]
) < 0.3

Calculate cost impact:

Wasted CPU cores × $30-50 per core per month (depends on cloud provider and instance type)
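
A minimal PromQL sketch that turns this into a concrete number per namespace (multiply the result by your per-core monthly rate to estimate waste):

# Requested CPU cores minus actual usage, per namespace
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
-
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)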

Atmosly automates this: "Deployment frontend-web has 10 pods requesting 2 CPU but using 0.4 CPU (80% waste). Total wasted: 16 CPU cores = $640/month. Recommendation: Reduce CPU request to 500m per pod."

Strategy 2: Memory Waste Identification

# Pods using < 50% of requested memory consistently over the past week
avg_over_time(
  (
    sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod, container)
    /
    sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace, pod, container)
  )[7d:1h]
) < 0.5

Atmosly shows: "Database pods request 8Gi but use 3Gi (62% waste) = $180/month waste across 5 replicas. Reduce to 4Gi requests."

Strategy 3: Idle Resource Detection

Find pods with zero traffic:

# Pods receiving no requests in last hour
sum(rate(http_requests_total[1h])) by (pod) == 0

Atmosly identifies: "Dev environment pods running 24/7 cost $680/month, but metrics show zero traffic on nights and weekends (120 hours/week idle). Schedule an automatic scale-down outside working hours to save $450/month."

Strategy 4: Right-Sizing Recommendations

Based on 7-30 day usage patterns, calculate optimal requests:

# 95th percentile CPU usage over 30 days
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total[5m])[30d:5m]
)

# Recommendation: Set request = p95 usage × 1.3 (30% headroom for spikes)

Atmosly automates this analysis across ALL pods, calculates cost impact, and provides one-command kubectl fixes.

Best Practices for Kubernetes Metrics

1. Use Labels Wisely (Avoid High Cardinality)

Good labels (low cardinality):

  • namespace (tens of values)
  • deployment, service (hundreds)
  • container (thousands)
  • environment (3-5: dev, staging, prod)
  • version (10-20 active versions)

Bad labels (high cardinality):

  • user_id (millions of unique users) ❌
  • request_id (every request unique) ❌
  • email, IP addresses ❌

High cardinality explodes time-series count, exhausts Prometheus memory, and slows queries. Use logging for high-cardinality data.
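
If a high-cardinality label does leak into your metrics, you can drop it at scrape time. A minimal sketch, assuming the offending label is called request_id:

scrape_configs:
- job_name: my-app
  metric_relabel_configs:
  # Drop the high-cardinality label before samples are stored
  - action: labeldrop
    regex: request_id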

2. Set Appropriate Scrape Intervals

  • Critical metrics: 15-30 seconds (latency, errors, pod status)
  • Resource metrics: 30-60 seconds (CPU, memory, disk)
  • Slow-changing: 2-5 minutes (storage capacity, version info)

More frequent = higher storage cost and query load. Balance freshness against resource usage.
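
A sketch of how these tiers might look in a Prometheus scrape configuration (the job names are illustrative):

scrape_configs:
- job_name: app-critical        # latency, errors, pod status
  scrape_interval: 15s
- job_name: node-exporter       # CPU, memory, disk
  scrape_interval: 30s
- job_name: slow-changing       # capacity, version info
  scrape_interval: 5m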

3. Implement Recording Rules for Common Queries

Pre-compute expensive dashboard queries:

groups:
- name: cost_optimization
  interval: 60s
  rules:
  # Pre-compute CPU utilization vs requests
  - record: pod:cpu_usage:pct_request
    expr: |
      sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod, container)
      /
      sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod, container)
      * 100

  # Pre-compute memory utilization vs requests
  - record: pod:memory_usage:pct_request
    expr: |
      sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod, container)
      /
      sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace, pod, container)
      * 100

Use recorded metrics in dashboards: pod:cpu_usage:pct_request instead of a complex query.

4. Set Retention Based on Needs

  • High-resolution (15s): 7-14 days for troubleshooting recent issues
  • Downsampled (5min): 30-90 days for trend analysis
  • Long-term (1hour): 1+ year for capacity planning, compliance

Use Thanos or Cortex for long-term storage with automatic downsampling.
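
For the high-resolution tier, retention is set directly on Prometheus; a minimal sketch of the relevant startup flag (Thanos or Cortex then provide the downsampled, long-term tiers):

# Keep raw samples for 14 days in local Prometheus storage
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=14d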

5. Alert on Symptoms, Not Causes

Bad alert: "CPU > 80%" → Why does high CPU matter? It's just a number without context.

Good alert: "Latency p95 > 500ms for 5 minutes" → Clear user impact (slow experience).

High CPU might cause high latency, but alert on the latency (symptom users feel), not just CPU (the underlying cause that may or may not matter).

How Atmosly Transforms Metrics into Business Value

1. Automatic Baseline Learning

Atmosly learns normal metric patterns over 7-30 days, including daily cycles, weekly patterns, seasonal trends, traffic growth, and evolving resource usage. Alerts only when metrics deviate significantly from learned baseline, not arbitrary thresholds.

2. Cost-Performance Correlation

Shows metrics with cost impact:

  • "CPU p95: 0.4 cores, Request: 2 cores, Waste: 1.6 cores = $64/month per pod × 10 replicas = $640/month total waste"
  • "Memory leak detected: +15Mi/hour growth, projected OOMKill in 12 hours, current over-provisioning cost: $35/month"

3. Natural Language Queries

Ask in plain English instead of PromQL:

  • "Show me pods using more than 90% of their memory limit."
  • "Which deployments are wasting the most CPU?"
  • "What's causing the high error rate in production?"

Atmosly translates the question into PromQL, executes the query, and presents human-readable results with context.

4. Automated Optimization Recommendations

Based on metric analysis, Atmosly recommends:

  • Resource right-sizing with exact kubectl commands
  • Idle resource elimination schedules
  • Autoscaling configuration tuning
  • Storage cleanup automation

Each recommendation includes a cost impact assessment, a performance risk assessment, and a one-command fix.

Conclusion: From Metrics to Intelligence

Kubernetes generates overwhelming metric volume. Effective monitoring prioritizes what matters: user-facing health, resource utilization for cost optimization, control-plane metrics for cluster stability, and custom business metrics for impact assessment.

Key Takeaways:

  • Focus on the Four Golden Signals first (Latency, Traffic, Errors, Saturation)
  • Monitor actual usage vs requests/limits to identify waste
  • Alert on user-facing symptoms, not internal causes
  • Use metrics for proactive cost optimization, not just reactive troubleshooting
  • Implement SLOs to focus engineering effort on what matters
  • Avoid high-cardinality labels that explode storage
  • Traditional monitoring shows metrics; Atmosly shows metrics + costs + AI recommendations

Ready to transform Kubernetes metrics into actionable cost and performance intelligence? Start your free Atmosly trial and experience AI-powered metric analysis with built-in cost optimization that reduces cloud spend by an average of 30% while maintaining reliability.

Frequently Asked Questions

What are the most important Kubernetes metrics to monitor for production reliability?
  1. Four Golden Signals: Core user-impact metrics to monitor service health and performance:
    • Latency: p95 / p99 response time — measures how fast requests are served.
    • Traffic: Requests per second — indicates load and demand patterns.
    • Errors: Failure rate (%) — highlights application or infrastructure faults.
    • Saturation: Resource utilization (% of capacity) — shows when systems are near overload.
  2. Pod Health: Key metrics to track workload availability and stability:
    • kube_pod_status_phase — overall pod lifecycle phase (Running, Pending, Failed, etc.).
    • kube_pod_container_status_restarts_total — detects crash loops and instability.
    • kube_pod_container_status_ready — readiness of containers to serve traffic.
  3. Node Capacity: Metrics to ensure node-level health and sufficient resource availability:
    • node_memory_MemAvailable_bytes — available memory on nodes.
    • node_cpu_seconds_total — CPU usage over time for saturation tracking.
    • node_filesystem_avail_bytes — available disk space to prevent exhaustion.
  4. Control Plane: Metrics to monitor cluster stability and API responsiveness:
    • apiserver_request_duration_seconds — API server latency.
    • etcd_disk_wal_fsync_duration_seconds — etcd disk latency (affects cluster consistency).
    • scheduler_schedule_attempts_total — tracks scheduling performance and potential failures.
  5. Resource Efficiency: Metrics for identifying over-provisioned workloads and wasted resources:
    • container_memory_working_set_bytes vs kube_pod_container_resource_requests — compare actual vs requested memory to detect waste.

Focus: Prioritize metrics that directly indicate user impact (Four Golden Signals) and capacity constraints for effective observability.

Atmosly Advantage: Atmosly automates Kubernetes monitoring with AI-driven anomaly detection, ensuring proactive insights before incidents impact users.

What are the Four Golden Signals for Kubernetes monitoring?
  1. Latency: Measures how fast requests are served, typically tracked using percentiles such as p50, p95, and p99.
    🔹 Example Alert: Trigger an alert if p95 latency exceeds the SLO threshold (e.g., 500 ms).
  2. Traffic: Represents demand measured in requests per second (RPS). It helps identify usage spikes or outages.
    🔹 Example Alert: Alert if traffic drops more than 50% from baseline, indicating a possible outage.
  3. Errors: Tracks failure rate as a percentage, often focusing on 5xx responses or failed requests.
    🔹 Example Alert: Alert if the error rate is sustained above 1% or exceeds the defined error budget.
  4. Saturation: Monitors how “full” system resources are — CPU, memory, and disk usage as a percentage of capacity.
    🔹 Example Alert: Alert if CPU > 80% or memory > 90% sustained over time.

Note: These Four Golden Signals from Google SRE focus on user-facing impact and business health — not just infrastructure state. Implement these first before adding hundreds of other metrics.

How does Atmosly's Cost Intelligence use Kubernetes metrics?
  1. Resource waste with cost: Identifies over-provisioning such as “Pod uses 0.3 CPU but requests 2 CPU = 85% waste = $120/month per pod.”
    💡 Recommendation: Reduce CPU requests to 400m to eliminate waste.
  2. Idle resources: Detects unused workloads like “Staging runs 24/7 but receives zero traffic outside business hours = $450/month waste.”
    💡 Recommendation: Schedule automatic shutdowns during off-hours.
  3. Right-sizing: Analyzes 7–30 day resource usage and recommends optimal requests and limits based on cost-performance tradeoffs.
    💡 Outcome: Balances reliability with cost efficiency.
  4. Memory leak detection with cost: Detects trends such as “Memory increasing +20Mi/hour = leak costing $40/month in over-provisioning.”
    💡 Action: Identify and fix memory leaks to reduce long-term cost.
  5. Cost per namespace/deployment/pod: Provides chargeback and cost visibility across environments for accurate team-level accountability.

Note: Traditional tools only show raw Kubernetes metrics. Atmosly correlates metrics with actual cloud billing data and provides AI-driven cost optimization recommendations — helping teams achieve up to 30% savings.

What is the difference between Kubernetes resource requests and limits?
  1. Requests = Guaranteed minimum resources: Scheduler uses requests to decide node placement — a pod is only scheduled if the node has enough available capacity. The container always receives at least the requested resources, which also determines its QoS (Quality of Service) class.
  2. Limits = Maximum resources container can use: If CPU usage exceeds the limit, it is throttled, causing possible performance degradation. If memory usage exceeds the limit, the container is OOMKilled (terminated).
  3. Best Practice: Set both requests and limits: base requests on typical (average) usage and set limits 20–30% higher than requests to absorb traffic spikes.
  4. Monitor usage vs requests: If actual usage is consistently below 30% of requested resources, your workloads are over-provisioned — wasting money.
  5. Atmosly Optimization: Atmosly automatically identifies over-provisioned pods and quantifies the cost impact to help you right-size resources efficiently.

What Kubernetes metrics should trigger alerts versus dashboard-only visibility?
  1. Error rate > 1% sustained: Indicates users are impacted immediately — requires urgent investigation and remediation.
  2. Latency p99 exceeds SLO: Signals poor user experience and possible service degradation.
  3. Memory usage > 90% of limit: Imminent OOMKill risk — requires right-sizing or memory optimization.
  4. Disk usage > 85%: Approaching exhaustion; cleanup or expansion needed to prevent downtime.
  5. CrashLoopBackOff: A container or service is continuously crashing, causing degraded functionality.
  6. API server latency p95 > 1s: Indicates control plane overload affecting cluster responsiveness.
  7. etcd fsync > 100ms p99 or frequent leader changes: Represents potential cluster stability risks.
  8. Deployment replicas unavailable > 0 for over 5 minutes: Suggests degraded redundancy or availability issues.

Dashboard-only metrics (no alerts):

  • CPU 50–70%: Normal operating range — no action required.
  • Single pod restart: Kubernetes self-healing, not an alert condition.
  • Memory cache usage: Typically reclaimable, not critical.
  • Traffic pattern variations: Normal if within baseline range.

Key Principle: Every alert must require human action — otherwise, it should not exist.

Atmosly Advantage: Atmosly leverages machine learning to learn real-time baselines and only triggers alerts on true anomalies with user impact, reducing alert noise by up to 80%.

What are Prometheus recording rules and when to use them?
  1. Purpose: Recording rules pre-compute expensive Prometheus queries periodically and store the results as new metrics, enabling faster dashboards and alerts.
  2. Use recording rules when:
    • Query is used frequently (>10 times/hour).
    • Query is computationally expensive (>1 second to compute).
    • Dashboards timeout due to complex queries.
    • Alerts require sub-second evaluation speed.
  3. Example: Pre-compute p99 latency every 30 seconds instead of recalculating it on every dashboard load:
    record: job:http_duration:p99
    expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le))
    
  4. Benefits:
    • 10–100x faster queries.
    • Reduced Prometheus load and resource consumption.
    • Consistent calculations across dashboards and alerts.
  5. Trade-off: Increases storage usage due to storing pre-computed metrics.
  6. Best Practice: Use recording rules only for high-frequency or high-cost queries — not for queries run once per day.