Kubernetes Security

Kubernetes Security Checklist: 50 Best Practices (2025) Part I

Complete Kubernetes security checklist with 50 production-ready best practices for 2025, covering RBAC, Pod Security Standards, Network Policies, secrets management, cluster hardening, and compliance; Part I covers practices 1-30. Based on the CIS Kubernetes Benchmark, NSA/CISA hardening guides, and real-world security incidents.

Introduction to Kubernetes Security: Why a Comprehensive Approach Matters

Kubernetes security is inherently complex and multi-layered, spanning access control and authentication, network isolation and traffic policies, secrets and sensitive data management, container image security and vulnerability scanning, runtime security and threat detection, cluster hardening and infrastructure protection, and compliance with regulatory frameworks like SOC 2, HIPAA, PCI-DSS, and GDPR. A single misconfiguration or oversight in any of these layers can expose your entire infrastructure to severe security incidents, including:

  • Unauthorized data access: attackers read sensitive customer information or proprietary business data
  • Privilege escalation: attackers gain cluster-admin rights and take complete control of the cluster
  • Lateral movement: compromised containers access other pods, databases, or cloud resources they shouldn't reach
  • Data exfiltration: attackers steal databases or secrets and transmit them externally
  • Cryptomining: attackers deploy resource-intensive cryptocurrency miners that consume your cloud budget
  • Ransomware: attackers encrypt persistent volumes and demand payment
  • Supply chain attacks: compromised base images or malicious dependencies enter your pipeline
  • Compliance violations: regulatory fines, failed audits, loss of certifications, and reputational damage that drives customers away

The challenge is that Kubernetes security isn't a single toggle or configuration; it requires implementing dozens of interconnected controls across multiple dimensions. You can have perfect RBAC preventing unauthorized human access, but if you allow privileged containers, attackers can escape containers and compromise nodes. You can lock down pod security, but if network policies aren't implemented, compromised pods can scan and attack other services across your cluster. You can secure the cluster itself, but if you pull untrusted container images without vulnerability scanning, you deploy pre-compromised applications directly into production.

This comprehensive Kubernetes security checklist provides 50 actionable, production-tested security practices organized into logical categories, based on official security frameworks including the CIS Kubernetes Benchmark (industry-standard security configuration guide), NSA and CISA Kubernetes Hardening Guide (government security guidance), Kubernetes Security documentation and CVE analysis, and real-world production security incidents and lessons learned from breaches. Each practice includes clear implementation instructions, explains why it matters with real attack scenarios, provides ready-to-use YAML configurations or kubectl commands, and indicates compliance frameworks it addresses (CIS, SOC 2, HIPAA, PCI-DSS). We'll also demonstrate how Atmosly automates the implementation of many of these security controls, continuously monitors for violations, and provides compliance reporting showing which practices are implemented versus missing across your clusters.

By methodically implementing these 50 practices, you'll establish a defense-in-depth security posture for your Kubernetes infrastructure, dramatically reduce attack surface and risk exposure, meet regulatory compliance requirements with auditable controls, prevent the most common Kubernetes security incidents that plague improperly secured clusters, and gain the security confidence to move faster with Kubernetes while maintaining strong security, rather than treating security and velocity as opposing forces.

Category 1: Access Control & Authentication (Practices 1-10)

Practice 1: Enable RBAC Authorization

Implementation: Ensure the Kubernetes API server runs with RBAC enabled, e.g. --authorization-mode=Node,RBAC (the default in Kubernetes 1.6+)

Verification:

# Check RBAC is enabled
kubectl api-versions | grep rbac
# Should show:
# rbac.authorization.k8s.io/v1

Why it matters: RBAC is the foundation of Kubernetes security. Without it, you have no access control: anyone who can authenticate can do anything. This is the most critical security control.

Compliance: Required by CIS 3.1.1, SOC 2, HIPAA, PCI-DSS

Practice 2: Disable Anonymous Authentication

Implementation: Set --anonymous-auth=false on API server

Why it matters: Anonymous auth allows unauthenticated requests to API server with default permissions. This creates unnecessary attack surface. All requests should require authentication.

Verification:

# Try anonymous request (should fail)
curl -k https://CLUSTER_IP:6443/api/v1/namespaces
# Should return 401 Unauthorized if anonymous auth disabled

Compliance: CIS 1.2.1, NSA/CISA Hardening Guide

Practice 3: Implement Strong Authentication (OIDC/LDAP)

Implementation: Configure OIDC or LDAP instead of static client certificates

# API server OIDC flags
--oidc-issuer-url=https://accounts.google.com
--oidc-client-id=kubernetes
--oidc-username-claim=email
--oidc-groups-claim=groups

Why it matters: Client certificates are typically long-lived, can't easily be revoked (Kubernetes doesn't check certificate revocation lists), and lack MFA support. OIDC enables SSO, MFA, automatic expiration, centralized revocation, and group-based RBAC.

Compliance: SOC 2 (CC6.1), HIPAA (164.312)

Practice 4: Never Grant cluster-admin to Regular Users

Implementation: Audit and remove unnecessary cluster-admin bindings

# List all cluster-admin bindings
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name=="cluster-admin") | 
      {name: .metadata.name, subjects: .subjects}'
# Remove binding for regular user
kubectl delete clusterrolebinding alice-admin

Why it matters: cluster-admin grants unlimited cluster access: it can read all secrets, delete all resources, modify RBAC, and access nodes. Follow the principle of least privilege: grant the specific permissions needed, not god-mode access.

Compliance: CIS 5.1.1, SOC 2, all frameworks

Practice 5: Enable Comprehensive Audit Logging

Implementation: Configure API server audit policy

--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100
--audit-policy-file=/etc/kubernetes/audit-policy.yaml

Audit policy example:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log secret access at Metadata level (who accessed, not content)
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
# Log RBAC changes at Request level (full details)
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
# Log pod exec/attach at Metadata level
- level: Metadata
  resources:
  - group: ""
    resources: ["pods/exec", "pods/attach"]

Why it matters: Audit logs provide forensics for security incidents, compliance evidence for auditors, debugging trail for "who changed what when," and detect unauthorized access attempts. Without auditing, you're blind to security events.

Compliance: SOC 2 (CC7.2), HIPAA (164.312(b)), PCI-DSS (10.2)

Practice 6: Disable ServiceAccount Token Auto-Mount

Implementation: Set automountServiceAccountToken: false by default

# Namespace-wide default
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: production
automountServiceAccountToken: false
# Or per-pod
spec:
  automountServiceAccountToken: false

Why it matters: By default, every pod gets a ServiceAccount token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. If a pod is compromised, the attacker gets a token for Kubernetes API access. Only mount the token when it's actually needed.

Compliance: CIS 5.1.5

Practice 7: Create Dedicated ServiceAccounts per Application

Implementation: Never use default ServiceAccount

# Create app-specific ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service-sa
  namespace: production
---
# Use in deployment
spec:
  serviceAccountName: payment-service-sa

Why it matters: The default ServiceAccount is shared by all pods in a namespace, so compromising one pod effectively compromises all of them if they share it. Separate ServiceAccounts enable least-privilege RBAC per application.

Compliance: CIS 5.1.5, SOC 2

Practice 8: Implement Multi-Factor Authentication for Admin Access

Implementation: Require MFA for kubectl access via OIDC provider

Why it matters: Passwords alone are weak (phishing, credential stuffing, password reuse). MFA (something you know + something you have) dramatically reduces account takeover risk. Critical for privileged access.

Compliance: SOC 2 (CC6.1), PCI-DSS (8.3)
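A sketch of how MFA reaches kubectl in practice, assuming the kubelogin plugin (`kubectl oidc-login`) and the OIDC issuer/client values from Practice 3; the user name is illustrative. MFA itself is enforced by the OIDC provider during the browser login flow:

```yaml
# Hypothetical kubeconfig user entry using the kubelogin (kubectl oidc-login) plugin
users:
- name: oidc-admin
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: kubectl
      args:
      - oidc-login
      - get-token
      - --oidc-issuer-url=https://accounts.google.com
      - --oidc-client-id=kubernetes
```

With this in place, any kubectl command triggers the provider's login (including its MFA challenge) when no valid cached token exists.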

Practice 9: Rotate Credentials Regularly

Implementation: Automate credential rotation

  • ServiceAccount tokens: Every 90 days
  • TLS certificates: Before expiration (typically 1 year)
  • Database passwords in Secrets: Every 90 days
  • API keys: Every 180 days

Why it matters: Credential rotation limits exposure window if credentials compromised. Stolen credentials become useless after rotation.

Compliance: SOC 2, PCI-DSS (8.2.4)
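For ServiceAccount credentials specifically, prefer issuing short-lived tokens over rotating long-lived Secrets. Since Kubernetes 1.24, `kubectl create token` requests a bound, expiring token via the TokenRequest API; the ServiceAccount name below is illustrative:

```shell
# Issue a token valid for 1 hour instead of storing a long-lived Secret
kubectl create token payment-service-sa -n production --duration=1h
```

The token expires automatically, so there is nothing to rotate or revoke afterwards.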

Practice 10: Implement Session Timeout and Token Expiration

Implementation: Configure short-lived tokens via OIDC

--oidc-username-claim=email
--oidc-groups-claim=groups
# Tokens expire based on OIDC provider config (15-60 minutes typical)

Why it matters: Long-lived tokens remain valid even after user leaves company or role changes. Short expiration forces re-authentication, ensuring access reflects current authorization.

Category 2: Pod Security Standards (Practices 11-20)

Practice 11: Enforce Pod Security Standards at Namespace Level

Implementation: Label namespaces with Pod Security Standard level

# Production: Restricted (most secure)
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted
# Staging: Baseline
kubectl label namespace staging \
  pod-security.kubernetes.io/enforce=baseline
# Dev: Privileged (unrestricted)
kubectl label namespace dev \
  pod-security.kubernetes.io/enforce=privileged

Why it matters: Pod Security Standards, enforced by the built-in Pod Security Admission controller, replaced PodSecurityPolicies (removed in Kubernetes 1.25). They enforce security baselines preventing privileged containers, host namespace sharing, and other dangerous configurations.

Compliance: CIS 5.2.x, NSA/CISA Hardening

Practice 12: Never Run Containers as Root

Implementation: Set runAsNonRoot: true and runAsUser: 1000

securityContext:
  runAsNonRoot: true  # Enforces non-root
  runAsUser: 1000     # Specific UID
  runAsGroup: 3000
  fsGroup: 2000       # For volume permissions

Why it matters: Running as root (UID 0) inside containers increases the risk of container escape attacks. If an attacker escapes the container, they have root on the host. Running as non-root limits the damage.

Compliance: CIS 5.2.6, Required by PSS "restricted" level

Practice 13: Drop All Linux Capabilities

Implementation: Drop ALL capabilities, add only specific ones needed

securityContext:
  capabilities:
    drop:
    - ALL                # Drop everything
    add:
    - NET_BIND_SERVICE  # Only if need to bind to port < 1024

Why it matters: Linux capabilities grant partial root privileges (CAP_SYS_ADMIN, CAP_NET_ADMIN, etc.). Dropping ALL removes dangerous capabilities attackers could exploit for privilege escalation.

Compliance: CIS 5.2.9, PSS "restricted"

Practice 14: Use Read-Only Root Filesystem

Implementation:

securityContext:
  readOnlyRootFilesystem: true
# Use emptyDir for writable paths
volumes:
- name: tmp
  emptyDir: {}
- name: cache
  emptyDir: {}
volumeMounts:
- name: tmp
  mountPath: /tmp
- name: cache
  mountPath: /var/cache

Why it matters: Read-only filesystem prevents attackers from writing malware, modifying binaries, or persisting backdoors. Forces immutable infrastructure pattern.

Compliance: PSS "restricted" recommended

Practice 15: Disable Privilege Escalation

Implementation:

securityContext:
  allowPrivilegeEscalation: false

Why it matters: Prevents setuid/setgid binaries from gaining elevated privileges. Blocks common container escape techniques relying on privilege escalation.

Compliance: CIS 5.2.5, PSS "restricted" required

Practice 16: Set Resource Limits on All Containers

Implementation:

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

Why it matters: Prevents resource exhaustion attacks (cryptominers consuming all CPU, memory bombs crashing nodes). Limits blast radius of compromised containers. Also prevents noisy neighbor problems.

Compliance: CIS 5.2.1, CIS 5.2.2

Practice 17: Implement Liveness and Readiness Probes

Implementation:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Why it matters: Detects and automatically restarts compromised or malfunctioning containers. Prevents zombie processes from serving traffic.

Compliance: High availability best practice

Practice 18: Scan Container Images for Vulnerabilities

Implementation: Integrate Trivy, Snyk, or Aqua in CI/CD

# Trivy image scan in pipeline
trivy image --severity HIGH,CRITICAL my-app:v1.2.3
# Fail build if HIGH or CRITICAL vulnerabilities found

Why it matters: Container images often contain known vulnerabilities (CVEs). Scanning prevents deploying vulnerable software; most breaches exploit known vulnerabilities that already have patches available.

Compliance: CIS 5.4.1, SOC 2
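The same gate can be expressed declaratively in CI; a hedged sketch using the official trivy-action for GitHub Actions (the image name is a placeholder):

```yaml
# Hypothetical GitHub Actions step: fail the build on HIGH/CRITICAL CVEs
- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: my-app:v1.2.3
    severity: HIGH,CRITICAL
    exit-code: '1'   # non-zero exit fails the pipeline
```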

Practice 19: Use Minimal Base Images (Distroless or Alpine)

Implementation: Use Google Distroless or Alpine Linux base images

# Instead of:
FROM ubuntu:22.04  # 77MB, many packages, large attack surface
# Use:
FROM gcr.io/distroless/static-debian11  # 2MB, no shell, minimal attack surface
# Or
FROM alpine:3.19  # 7MB, minimal packages

Why it matters: Fewer packages = fewer vulnerabilities = smaller attack surface. Distroless images have no shell, package managers, or unnecessary binaries attackers could use.

Compliance: CIS 5.4.3

Practice 20: Implement Image Signing and Verification

Implementation: Use Sigstore/Cosign or Docker Content Trust

# Sign image with Cosign
cosign sign --key cosign.key my-registry.com/my-app:v1.2.3
# Verify signature in admission controller
# (prevents running unsigned or tampered images)

Why it matters: Prevents supply chain attacks where attackers replace legitimate images with malicious versions. Signing ensures image integrity and authenticity.

Compliance: SLSA Level 3, Supply Chain Security
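The admission-side verification mentioned above can be enforced with a policy engine. A sketch using Kyverno's image verification, assuming Kyverno is installed; the registry pattern and public key are placeholders:

```yaml
# Hypothetical Kyverno policy: reject pods whose images lack a valid Cosign signature
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-cosign-signature
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - imageReferences:
      - "my-registry.com/*"
      attestors:
      - entries:
        - keys:
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              <cosign-public-key>
              -----END PUBLIC KEY-----
```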

Category 3: Network Security (Practices 21-30)

Practice 21: Implement Default Deny Network Policies

Implementation: Create deny-all as baseline

# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}  # Applies to ALL pods in namespace
  policyTypes:
  - Ingress
---
# Deny all egress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress

Why it matters: Default deny = zero-trust networking. Nothing can communicate unless explicitly allowed. Limits lateral movement if container compromised.

Compliance: CIS 5.3.2, Zero Trust Architecture

Practice 22: Explicitly Allow Only Required Traffic

Implementation: Create allow policies for legitimate communication

# Allow frontend -> backend on port 8080 only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080  # Only this port

Why it matters: A whitelist-only approach prevents unexpected connections. If the backend is compromised, it can't initiate outbound connections to exfiltrate data (egress is blocked by the default-deny policy).

Practice 23: Always Allow DNS (Critical)

Implementation: Allow egress to kube-dns/CoreDNS

egress:
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system  # DNS pods run in kube-system
  ports:
  - protocol: UDP
    port: 53  # DNS
  - protocol: TCP
    port: 53  # DNS falls back to TCP for large responses

Why it matters: Everything breaks without DNS. This is the #1 Network Policy mistake—blocking DNS and wondering why nothing works.

Practice 24: Enforce mTLS Between Services with a Service Mesh

Implementation: Enable strict mTLS in your service mesh (e.g., Istio, Linkerd) so all pod-to-pod traffic is encrypted and authenticated.

Example (Istio):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # Require mTLS for all workloads in this namespace
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-mtls
  namespace: production
spec:
  host: backend.production.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

Why it matters: mTLS encrypts traffic in transit and verifies the identity of both client and server using certificates. This prevents eavesdropping, spoofing, and man-in-the-middle attacks between microservices and enforces strong, workload-level identity.

Compliance: Zero Trust Architecture, SOC 2 (CC6.x), HIPAA (164.312(e)), PCI-DSS 4.0 (Data in Transit Encryption)

Practice 25: Enforce TLS for Ingress – HTTPS Everywhere

Implementation: Terminate HTTPS at the Ingress (or API gateway) using valid certificates (Let’s Encrypt via cert-manager, or enterprise PKI). Redirect all HTTP to HTTPS.

Example (Nginx Ingress + cert-manager):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prod-app
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: prod-app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80

Why it matters: Without TLS, credentials, tokens, and data travel in plaintext and can be intercepted or modified. Enforcing HTTPS for all external traffic is a fundamental security control and often a regulatory requirement.

Compliance: PCI-DSS (4.x – Strong Cryptography), HIPAA (164.312(e)), SOC 2 (CC6.1)

Practice 26: Isolate and Protect the Kubernetes Control Plane

Implementation: Restrict access to the Kubernetes API server to trusted networks only (VPN, bastion hosts, office IPs). Avoid exposing the API directly to the public internet.

Examples:

  • Use private API endpoints for managed clusters (EKS/GKE/AKS) where possible.
  • Restrict public access with firewall rules / security groups / master authorized networks.
  • Force admins to access the API via VPN or bastion.

Example: allow API access only from the VPN CIDR (conceptual security group / firewall rule):

ALLOW tcp 443 FROM 10.0.0.0/16  (VPN)
DENY  tcp 443 FROM 0.0.0.0/0    (Internet)

Why it matters: The Kubernetes API is the brain of the cluster. If an attacker reaches it and finds misconfigurations or weak credentials, they can compromise everything. Network isolation dramatically reduces the attack surface and brute-force attempts.

Compliance: CIS Kubernetes Benchmark (API server controls), SOC 2 (CC6.x), Zero Trust Network principles

Practice 27: Enable CNI Network Flow Logging for Forensics

Implementation: Turn on network flow logging in your CNI plugin (e.g., Calico Flow Logs, Cilium Hubble) and send logs to your central logging/SIEM system.

Example (Calico Enterprise – FelixConfiguration; flow log export to file is a Calico Enterprise feature):

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  flowLogsFlushInterval: 10s
  flowLogsFileEnabled: true
  flowLogsFileDirectory: /var/log/calico/flows

Ship /var/log/calico/flows to Loki/ELK/SIEM with an agent (Fluent Bit, Vector, etc.).

Why it matters: Flow logs show who talked to whom, when, and on which ports. They are invaluable for incident investigations, detecting suspicious lateral movement, and validating that NetworkPolicies work as intended.

Compliance: SOC 2 (CC7.x – logging & monitoring), PCI-DSS (10.x – log all network access to cardholder data)

Practice 28: Use Private Container Registries and Restrict Image Sources

Implementation: Store images in private registries (ECR, GCR, ACR, Harbor) and restrict nodes to pull only from approved registries.
Create an imagePullSecret:

apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: production
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-dockerconfigjson>

Attach it to your ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production
imagePullSecrets:
- name: regcred
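The Secret above can also be generated directly from registry credentials with kubectl; a sketch in which the server, username, and password values are placeholders:

```shell
kubectl create secret docker-registry regcred \
  --docker-server=my-registry.com \
  --docker-username=ci-bot \
  --docker-password='<registry-password>' \
  -n production
```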

At the network layer, allow egress from nodes only to your approved registries (via firewall rules, VPC endpoints, or NetworkPolicies).

Why it matters: Pulling images from random public registries is a major supply chain risk. Private, controlled registries plus restricted egress prevent untrusted or malicious images from entering your environment.

Compliance: Supply Chain Security (SLSA), SOC 2 (CC6.x), PCI-DSS 4.0 (Software Integrity)

Practice 29: Implement Egress Filtering and Outbound Firewalls

Implementation: Use NetworkPolicies, egress gateways, and/or cloud firewalls to explicitly control outbound traffic from pods and nodes.

Example (restrict egress to a specific external service):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-payments
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 203.0.113.10/32  # payments-api.example.com
    ports:
    - protocol: TCP
      port: 443

Combine with cloud firewall rules (NACLs / Security Groups) so that default is deny and only specific destinations (databases, payment gateways, APIs, logging endpoints) are allowed.

Why it matters: Without egress controls, a compromised pod can exfiltrate data to any IP on the internet or connect to attacker C2 servers. Egress filtering limits damage and is essential for Zero Trust.

Compliance: Zero Trust Architecture, PCI-DSS (restrict outbound connections), SOC 2 (CC6.x)

Practice 30: Monitor Network Traffic with IDS/NDR

Implementation: Deploy network detection and response (NDR) or IDS tooling that monitors Kubernetes traffic patterns for anomalies:

  • Use eBPF-based tools (e.g., Cilium Hubble, Tetragon) to observe connections.
  • Mirror traffic to an IDS (Zeek/Suricata) using cloud traffic mirroring / SPAN.
  • Send alerts to your SIEM when suspicious patterns appear (port scans, unusual outbound connections, spikes in failed connections).

Conceptual example (Hubble UI deployment):

 
# Assumes Cilium is installed; enable Hubble and launch the UI
cilium hubble enable
cilium hubble ui

Why it matters: Even with strong NetworkPolicies, you need visibility into what is actually happening on the wire. Network monitoring detects stealthy attacks, lateral movement attempts, and policy misconfigurations that logs alone might miss.

Compliance: SOC 2 (CC7.2 – detect and respond to security events), PCI-DSS (10.x & 11.x – monitoring and intrusion detection)
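Once Hubble is enabled, its CLI can surface policy verdicts directly; a sketch of common queries (the namespace name is illustrative):

```shell
# Show recently dropped flows (useful for spotting scans and policy misconfigurations)
hubble observe --verdict DROPPED --last 100
# Watch live traffic to/from a specific namespace
hubble observe --namespace production --follow
```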

How Atmosly Automates Kubernetes Security

Pre-Configured Security Controls

Atmosly implements many of these 50 practices automatically:

  • RBAC Automation: Pre-configured roles (super_admin, read_only, devops) following least privilege, automatic ServiceAccount creation, proper subject binding
  • Pod Security Enforcement: Auto-applies Pod Security Standard labels (restricted for prod, baseline for staging)
  • Vulnerability Scanning: Scans deployed workloads, reports CVEs by severity
  • Network Policy Recommendations: Analyzes traffic patterns, suggests policies
  • Secrets Management: Integrates with Vault, AWS Secrets Manager, tracks rotation
  • Compliance Reporting: CIS Kubernetes Benchmark compliance dashboard
  • Runtime Threat Detection: AI-powered anomaly detection for suspicious pod behavior
  • Audit Logging: Centralized audit trail of all Atmosly-initiated actions

Continuous Security Monitoring

Atmosly continuously monitors for security violations:

  • Privileged containers running in production (Policy violation)
  • Pods without resource limits (DoS risk)
  • ServiceAccounts with cluster-admin (Over-privilege)
  • Secrets without encryption at rest
  • Images with HIGH/CRITICAL CVEs in production
  • Network policies missing from critical namespaces

Each alert includes specific remediation guidance and kubectl commands to fix the issue.

Implementing the Checklist: Prioritized Roadmap

Phase 1: Critical (Week 1) - Practices that prevent immediate breaches

  • Practice 1: Enable RBAC ✅
  • Practice 4: Remove cluster-admin from regular users ✅
  • Practice 11: Enforce Pod Security Standards ✅
  • Practice 12: Never run as root ✅
  • Practice 21: Default deny Network Policies ✅
  • Practice 31: Encrypt secrets at rest ✅
  • Practice 41: Update Kubernetes (patch CVEs) ✅

Phase 2: High Priority (Week 2-3) - Defense in depth

These practices strengthen cluster defenses beyond the basics and significantly reduce lateral movement, privilege escalation, and secret exposure risks.

  • Practice 13: Enforce strict container capabilities (drop all, add minimal) ✅
  • Practice 14: Enforce read-only root filesystem where possible ✅
  • Practice 15: Enforce seccomp profiles (RuntimeDefault or custom) ✅
  • Practice 16: Configure AppArmor/SELinux policies for workloads ✅
  • Practice 17: Use non-root, dedicated service accounts per workload ✅
  • Practice 22: Implement egress restrictions with Network Policies ✅
  • Practice 23: Segment namespaces by environment (dev/stage/prod isolation) ✅
  • Practice 24: Enforce ingress controls & restrict load balancer exposure ✅
  • Practice 25: Use cluster-wide DNS & identity-based access controls ✅
  • Practice 32: Adopt an external secrets manager (Vault, AWS Secrets Manager, etc.) ✅
  • Practice 33: Keep all secrets out of Git (Git hygiene + scanners) ✅
  • Practice 34: Automate secret rotation at defined intervals ✅
  • Practice 35: Ensure secret versioning and access traceability ✅
  • Practice 36: Use Sealed Secrets or SOPS for GitOps-safe encryption ✅
  • Practice 42: Enforce admission controls (OPA Gatekeeper / Kyverno) to prevent privileged pods, hostPath mounts, and unapproved registries ✅
  • Practice 43: Disable or secure the Kubernetes Dashboard (SSO-only) ✅
  • Practice 44: Apply ResourceQuotas + LimitRanges for resource boundaries ✅
  • Practice 45: Use PodDisruptionBudgets for safe upgrades/rollouts ✅

This phase creates defense-in-depth, ensuring that even if one control fails, multiple layers still protect the cluster.

Phase 3: Medium Priority (Month 1) - Hardening and compliance

These practices finalize your production-grade security posture, ensuring long-term compliance, resilience, and operational integrity.

  • Practice 18: Configure image pull policies and restrict image tags (no latest) ✅
  • Practice 19: Enforce immutable images and pinned digests ✅
  • Practice 20: Scan container images for vulnerabilities (CI/CD + runtime) ✅
  • Practice 26: Set up multi-layer network segmentation for internal services ✅
  • Practice 27: Harden cluster ingress (TLS, WAF, mTLS when possible) ✅
  • Practice 28: Configure node-level firewall rules & metadata protection ✅
  • Practice 29: Harden API server, kubelet, and control-plane access ✅
  • Practice 30: Apply least-privilege IAM roles for cloud integrations ✅
  • Practice 37: Enable audit logging for secret access & API actions ✅
  • Practice 38: Separate secrets by environment and namespace ✅
  • Practice 39: Track secret versions and changes for compliance ✅
  • Practice 40: Monitor secret expiration (TLS certs, API tokens, DB credentials) ✅
  • Practice 46: Monitor runtime threats using Falco or eBPF sensors ✅
  • Practice 47: Implement backup & disaster recovery (Velero, snapshots) ✅
  • Practice 48: Harden worker nodes and underlying OS image ✅
  • Practice 49: Run regular CIS + vulnerability audits (kube-bench, Trivy, Kubescape) ✅
  • Practice 50: Establish continuous security & compliance monitoring ✅

By the end of Phase 3, your clusters reach a fully hardened, audit-ready, compliance-aligned state that meets SOC 2, PCI-DSS, ISO 27001, and CIS Kubernetes Benchmark requirements.

Conclusion: Building Defense-in-Depth Security

Kubernetes security requires implementing controls across multiple layers—no single practice protects completely. These 50 best practices provide a comprehensive, defense-in-depth security posture.

Critical Priorities:

  1. Enable RBAC with least privilege
  2. Enforce Pod Security Standards (restricted for production)
  3. Never run containers as root
  4. Implement Network Policies (default deny)
  5. Encrypt secrets, use external secrets manager
  6. Scan images for vulnerabilities
  7. Keep Kubernetes patched and updated
  8. Enable comprehensive audit logging

Implementing all 50 practices manually is time-consuming. Atmosly automates enforcement of these practices, continuously monitors for violations, and provides compliance dashboards showing your security posture against CIS benchmarks.

Ready to implement production-grade Kubernetes security without manual overhead? Start your free Atmosly trial for automated security with built-in best practices and continuous compliance monitoring.

Frequently Asked Questions

What are the most critical Kubernetes security best practices for production?
  1. Enable RBAC with least privilege: Never grant cluster-admin to regular users.
  2. Enforce Pod Security Standards: Apply the restricted level for production namespaces.
  3. Never run containers as root: Set runAsNonRoot: true and runAsUser: 1000.
  4. Drop ALL Linux capabilities: Add only the specific capabilities your workloads require.
  5. Implement Network Policies: Use a default deny-all policy and explicitly allow only required traffic.
  6. Encrypt secrets at rest in etcd: Use external secret managers like Vault or AWS Secrets Manager.
  7. Scan container images for vulnerabilities: Integrate scanning into CI/CD pipelines and block HIGH/CRITICAL CVEs.
  8. Use minimal base images: Choose lightweight images such as distroless or Alpine to reduce the attack surface.
  9. Keep Kubernetes updated: Apply security patches within 30 days of release.
  10. Enable comprehensive API server audit logging: Capture and retain audit logs for forensics and compliance.

Note: These 10 practices provide a strong security foundation. Implement these first before the remaining 40 for complete defense-in-depth.

What is the CIS Kubernetes Benchmark and how do I check compliance?
  1. Definition: The CIS Kubernetes Benchmark is a comprehensive security configuration guide defining best practices for hardening Kubernetes clusters. It provides detailed recommendations covering all major components of the Kubernetes ecosystem.
  2. Coverage Areas:
    • Control Plane: API server, etcd, scheduler, controller-manager
    • Worker node configuration and OS hardening
    • RBAC policies and least-privilege access
    • Pod Security (PSS / PSP legacy)
    • Network Policies
    • Secrets management and encryption
    • Operational best practices (logging, auditing, rotation, etc.)
  3. Importance:

    The CIS Benchmark is an industry-standard security baseline. Many compliance frameworks such as PCI-DSS, HIPAA, and SOC 2 require or reference CIS alignment as part of cluster security posture evaluation.

  4. How to Check Compliance (kube-bench):

    Use the open-source kube-bench tool to scan and score clusters against CIS recommendations.

    kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
    kubectl logs job/kube-bench
    

    kube-bench identifies failures such as:

    • Anonymous auth not disabled
    • Missing API server audit logging
    • Overly permissive RBAC configurations
    • Insecure pod or node configurations
  5. Fixing Issues:

    Remediate failures systematically, prioritizing critical and high-risk findings first (e.g., ETCD encryption, API server flag settings, RBAC tightening). Re-run kube-bench after fixes to validate improvements.

  6. Ongoing Compliance:

    Perform CIS compliance scans monthly or after major cluster upgrades to ensure hardening remains intact.

  7. Atmosly Integration:

    Atmosly provides a centralized CIS compliance dashboard showing pass/fail scores across all clusters, with prioritized remediation guidance for each failed benchmark section.
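One of the highest-priority kube-bench findings mentioned in step 5 is etcd encryption at rest. A minimal sketch of the API server's EncryptionConfiguration follows; the key name and placeholder value are illustrative, and the file is passed to the API server via the --encryption-provider-config flag.

```yaml
# Sketch of an encryption config for secrets at rest; the key material
# below is a placeholder, not a value from this article.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              # generate with: head -c 32 /dev/urandom | base64
              secret: <base64-encoded-32-byte-key>
      - identity: {}   # allows reading secrets written before encryption was enabled
```

After enabling it, rewrite existing secrets so they are re-stored encrypted: kubectl get secrets --all-namespaces -o json | kubectl replace -f -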

How do I implement Pod Security Standards in production Kubernetes?
  1. Label production namespace with restricted (enforce, audit, warn):
    kubectl label namespace production \
      pod-security.kubernetes.io/enforce=restricted \
      pod-security.kubernetes.io/audit=restricted \
      pod-security.kubernetes.io/warn=restricted
  2. Restricted level - required settings (production standard):
    • runAsNonRoot: true (cannot run as UID 0) and set a non-zero UID with runAsUser.
    • capabilities.drop: ["ALL"] — explicitly drop all Linux capabilities.
    • allowPrivilegeEscalation: false in every container.
    • seccompProfile.type: RuntimeDefault (or Localhost for custom profiles).
    • Restrict volumes to safe types (configMap, downwardAPI, emptyDir, persistentVolumeClaim, projected, secret) — no hostPath.
    • No hostNetwork, hostPID, or hostIPC; no privileged: true containers.
    • Recommended: readOnlyRootFilesystem: true with emptyDir for writable paths.
  3. Update pod specs to comply (example securityContext, shown in a minimal Pod spec so pod-level and container-level fields land in the right place):
    apiVersion: v1
    kind: Pod
    metadata:
      name: restricted-app
    spec:
      securityContext:            # pod-level settings
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        image: registry.example.com/app:1.2.3   # illustrative image
        securityContext:          # container-level settings
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          readOnlyRootFilesystem: true
        volumeMounts:             # writable /tmp when root filesystem is read-only
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}
  4. Test in staging first (warn & audit modes):

    Apply PSS in warn then audit mode to collect violations without blocking:

    kubectl label namespace staging pod-security.kubernetes.io/warn=restricted
    kubectl label namespace staging pod-security.kubernetes.io/audit=restricted

    Fix violations shown in warnings/audit logs before enforcing.

  5. Use baseline for staging/migration:

    Baseline prevents common privilege escalations but allows running as root—use when migrating legacy apps that cannot immediately meet Restricted constraints.

    kubectl label namespace staging pod-security.kubernetes.io/enforce=baseline
  6. System namespaces require privileged level:

    Only use privileged for trusted infrastructure/system namespaces (e.g., kube-system) that need host access for CNI/CSI components.

  7. Migration pattern from PSP → PSS (safe rollout):
    1. Audit current PSPs and map them to PSS levels.
    2. Apply PSS in warn mode per-namespace to surface violations.
    3. Fix pod specs (securityContext, volumes, probes, etc.).
    4. Switch namespaces to audit to log for compliance tracking.
    5. When no violations remain, enable enforce for the namespace.
    6. Only remove PSPs after PSS enforcement is active everywhere.
  8. Enforce restricted finally (example):
    kubectl label namespace production \
      pod-security.kubernetes.io/enforce=restricted --overwrite
  9. Verify & validate:
    • Deploy test workloads to confirm enforcement rejects non-compliant pods.
    • Monitor API audit logs for violations (audit mode) and iterate fixes.
  10. Operational recommendation:

    Adopt Restricted for all production application workloads, Baseline for staging/migration, and Privileged only for system namespaces.

    Atmosly can auto-apply labels per environment, validate pod specs in CI, suggest exact YAML fixes, and provide dashboards for enforcement and compliance progress.
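The warn/audit rollout in steps 4 through 8 can be verified from the command line. These commands assume kubectl access to the cluster; the PSP listing only works on clusters older than v1.25, where PodSecurityPolicy still exists.

```
# List remaining PSPs (only on clusters older than v1.25)
kubectl get podsecuritypolicies

# Show which PSS level each namespace currently enforces
kubectl get namespaces -L pod-security.kubernetes.io/enforce

# Server-side dry run: report which existing pods would violate "restricted"
# without actually changing any namespace labels
kubectl label --dry-run=server --overwrite namespace --all \
  pod-security.kubernetes.io/enforce=restricted
```

The dry-run output lists every namespace whose running pods would be rejected under enforcement, which tells you exactly where fixes are still needed before flipping to enforce.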

What should I never do in Kubernetes from security perspective?
  1. Never grant cluster-admin to regular users: gives unlimited cluster access; use least-privilege roles instead.
  2. Never run containers as root (UID 0) without strong justification: running as root increases risk of container escape and host compromise; use runAsNonRoot: true and non-zero runAsUser.
  3. Never disable RBAC or enable --anonymous-auth=true: removes access controls and auditing—leave RBAC enabled and enforce least privilege.
  4. Never commit secrets to Git (even base64-encoded): base64 is only encoding, not encryption—use Sealed Secrets or external secret managers (Vault, AWS Secrets Manager).
  5. Never use the :latest image tag in production: mutable tags break immutability and reproducibility; pin immutable tags (commit SHA or semantic version).
  6. Never skip image vulnerability scanning: shipping images with known CVEs invites compromise—scan images in CI and block HIGH/CRITICAL issues.
  7. Never allow privileged: true containers without extreme justification: privileged mode grants near-host-level access—limit to trusted system components only.
  8. Never use hostNetwork, hostPID, or hostIPC unnecessarily: these break namespace isolation and increase blast radius if compromised.
  9. Never expose the Kubernetes Dashboard publicly without strong auth: unauthenticated or poorly protected dashboards are a common attack vector.
  10. Never ignore Kubernetes security updates: apply control-plane and node patches promptly (target SLA, e.g., within 30 days) to avoid known exploits.

Each of these anti-patterns violates least-privilege or defense-in-depth principles, and together they account for a large share of real-world Kubernetes security incidents; avoiding them is the fastest way to keep clusters secure.
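To see why item 4 matters, note that base64 "protection" reverses with no key at all, so anyone who can read the manifest can read the secret. The password string below is a made-up example.

```shell
# base64 is encoding, not encryption: decoding needs no key
encoded=$(printf '%s' "s3cr3t-db-password" | base64)
printf '%s\n' "$encoded"             # the value as it would appear in a Secret manifest
printf '%s' "$encoded" | base64 -d   # recovers the plaintext instantly
```

This is exactly what an attacker does with a leaked manifest, which is why Git history containing Secret YAML must be treated as a plaintext credential leak.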

How does Atmosly automate Kubernetes security implementation?
  1. RBAC Automation:
    • Pre-configured least-privilege roles such as super_admin, read_only, and devops.
    • Automatic ServiceAccount creation and correct Role/ClusterRole binding.
    • Prevents common RBAC misconfigurations like empty subjects or unintended privilege escalation.
  2. Pod Security:
    • Automatically applies Pod Security Standard (PSS) labels:
      • restricted for production
      • baseline for staging
      • privileged for system namespaces
    • Validates deployments in CI/CD before they reach the cluster.
    • Catches and explains PSS violations early with precise fix suggestions.
  3. Vulnerability Scanning:
    • Scans deployed container images for CVEs across severity levels (CRITICAL, HIGH, MEDIUM, LOW).
    • Alerts on HIGH/CRITICAL vulnerabilities in production workloads.
    • Provides remediation links and recommended patch versions.
  4. Network Policies:
    • Analyzes live pod-to-pod traffic to auto-generate least-privilege Network Policies.
    • Recommends YAML allowing only observed traffic—data-driven, not guesswork.
    • Validates policies to prevent accidental service lockouts.
  5. Secrets Management:
    • Integrates with Vault, AWS Secrets Manager, GCP Secret Manager, and Azure Key Vault.
    • Audits which pods access which secrets.
    • Tracks rotation schedules and alerts before certificates/keys expire.
    • Detects secrets accidentally committed to Git repositories.
  6. Compliance Reporting:
    • Full CIS Kubernetes Benchmark dashboard with pass/fail per control.
    • Maps Kubernetes configurations to SOC 2, HIPAA, and PCI-DSS compliance requirements.
    • Provides remediation commands (e.g., kubectl fixes) for each failed rule.
  7. Runtime Threat Detection:
    • AI-based monitoring for suspicious container behavior.
    • Detects privilege escalation attempts, anomalous network connections, and unusual process activity.
    • Protects against runtime threats beyond static configuration checks.
  8. Overall Benefit:

    Atmosly dramatically reduces Kubernetes security configuration overhead while improving cluster security posture, compliance readiness, and operational safety.