Kubernetes Production Best Practices - A Comprehensive Guide
Kubernetes has become the de facto standard for container orchestration in modern cloud-native applications. However, running Kubernetes in production requires careful planning, implementation of best practices, and continuous monitoring. In this comprehensive guide, we'll explore the essential practices that will help you build robust, scalable, and secure Kubernetes clusters.
Disclaimer: Kubernetes®, K8s®, Docker®, and other product names mentioned in this article are trademarks of their respective owners. All logos and trademarks are used for representation purposes only. No prior copyright or trademark authorization has been obtained. This content is for educational purposes only.
1. Resource Management and Limits
One of the most critical aspects of running Kubernetes in production is proper resource management. Without it, you risk cluster instability, performance degradation, and unexpected costs.
Setting Resource Requests and Limits
Always define resource requests and limits for your containers:
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
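Why this matters:
- Requests tell the scheduler how much CPU and memory a node must have available before the pod can be placed there
- Limits cap consumption: CPU use above the limit is throttled, and a container that exceeds its memory limit is OOM-killed, which contains noisy-neighbor problems
- Accurate requests improve cluster bin-packing efficiency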
Quality of Service (QoS) Classes
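Kubernetes assigns each pod a QoS class based on its resource specifications:
- Guaranteed - Every container sets both CPU and memory requests, and each request equals its limit
- Burstable - The pod does not meet the Guaranteed criteria, but at least one container sets a CPU or memory request or limit
- BestEffort - No container defines any requests or limits
In production, aim for Guaranteed or Burstable QoS for critical workloads. Under node resource pressure, BestEffort pods are typically the first to be evicted.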
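As a minimal sketch, the pod below would be classified as Guaranteed because its only container sets requests equal to limits for both CPU and memory. The name qos-guaranteed-demo is hypothetical. By contrast, the resource-demo pod shown earlier is Burstable, since its limits exceed its requests.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "128Mi"
        cpu: "500m"
      limits:
        memory: "128Mi"   # equal to the request
        cpu: "500m"       # equal to the request
```

After the pod is created, the assigned class is recorded in its status.qosClass field.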
2. High Availability and Scalability
Horizontal Pod Autoscaling (HPA)
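Implement HPA to scale the replica count automatically based on observed metrics. CPU utilization targets are measured as a percentage of each container's CPU request, so the target pods must set requests, and the cluster needs a metrics source such as metrics-server: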
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
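One optional refinement, offered as a sketch rather than a recommendation: the autoscaling/v2 API also accepts a behavior section under spec that controls how quickly the HPA scales. The values below (a 300-second stabilization window and at most one pod removed per minute) are illustrative assumptions to tune for your workload.

```yaml
# Fragment to merge under spec: of the HorizontalPodAutoscaler above
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # assumed value: act on the highest recommendation from the last 5 minutes
    policies:
    - type: Pods
      value: 1            # assumed value: remove at most 1 pod...
      periodSeconds: 60   # ...per 60-second window
```

Slowing scale-down this way can reduce replica flapping when load is spiky, at the cost of holding extra capacity a little longer.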
Pod Disruption Budgets (PDB)
Protect your applications during voluntary disruptions, such as node drains and cluster upgrades, by limiting how many pods can be evicted at once:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app
Multi-Zone Deployments
Distribute pods across availability zones using topology spread constraints:
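apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-zone-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: multi-zone-app
  template:
    metadata:
      labels:
        app: multi-zone-app   # must match the Deployment selector and the labelSelector
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: multi-zone-app
      containers:
      - name: app
        image: myapp:v2   # illustrative image, added so the manifest is complete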
3. Security Best Practices
Network Policies
Implement zero-trust networking with Network Policies. They are only enforced when the cluster's network plugin supports them:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
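A zero-trust posture usually starts from a default-deny baseline, with policies like the one above opening specific paths. The sketch below assumes the production namespace used elsewhere in this guide; the policy name default-deny-all is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all      # hypothetical name
  namespace: production       # assumed namespace
spec:
  podSelector: {}             # empty selector: applies to every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  # No ingress or egress rules are listed, so all traffic is denied by default
```

Keep in mind that once egress is restricted, pods also lose DNS resolution unless another policy explicitly allows traffic to the cluster DNS service.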
Pod Security Standards
Enforce security contexts and pod security standards:
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    # Pinned tag instead of :latest. The image must be built to run as a
    # non-root user with a read-only root filesystem; stock nginx is not.
    image: myapp:v2
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
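The manifest above sets security contexts on a single pod. To enforce the Pod Security Standards across a whole namespace, clusters with the built-in Pod Security Admission controller (stable since Kubernetes 1.25) read labels on the Namespace object. A minimal sketch, assuming the production namespace used elsewhere in this guide and the restricted profile:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject pods that violate the profile
    pod-security.kubernetes.io/warn: restricted      # also return warnings to the client
    pod-security.kubernetes.io/audit: restricted     # also annotate audit log entries
```

Starting with only the warn and audit labels, then adding enforce once workloads are compliant, is one way to roll this out without breaking existing pods.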
RBAC Configuration
Implement least-privilege access control:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-service-account
  namespace: production   # namespace of the ServiceAccount, stated explicitly
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
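The binding above only takes effect for workloads that actually run as app-service-account. As a sketch of the remaining pieces, the manifests below create that ServiceAccount and a pod that uses it; the pod name pod-reader-demo and the image are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-reader-demo        # hypothetical name
  namespace: production
spec:
  serviceAccountName: app-service-account   # the pod runs with this identity
  containers:
  - name: app
    image: myapp:v2            # illustrative image
```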
4. Monitoring and Observability
Health Checks
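Implement comprehensive health checks. The liveness probe restarts a container that has stopped responding, the readiness probe keeps a pod out of Service endpoints until it can serve traffic, and the startup probe holds off the other two while a slow application boots: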
apiVersion: v1
kind: Pod
metadata:
  name: health-check-demo
spec:
  containers:
  - name: app
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
Structured Logging
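Emit logs as structured key-value data (for example JSON) so that log aggregators can filter and query on individual fields:
// Example in Go, using the logrus library
package main

import log "github.com/sirupsen/logrus"

func main() {
	log.SetFormatter(&log.JSONFormatter{}) // one JSON object per log line
	userID, duration := "u-123", 42        // placeholder values
	log.WithFields(log.Fields{
		"user_id":     userID,
		"action":      "login",
		"status":      "success",
		"duration_ms": duration,
	}).Info("User logged in")
}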
5. Deployment Strategies
Rolling Updates with Safety Checks
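Configure safe rolling updates. With maxUnavailable: 0 and maxSurge: 1, Kubernetes adds one new pod at a time and removes an old pod only after the new one has been ready for minReadySeconds: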
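apiVersion: apps/v1
kind: Deployment
metadata:
  name: safe-deployment
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 30
  progressDeadlineSeconds: 600
  selector:
    matchLabels:
      app: safe-deployment   # illustrative label, added so the manifest is valid
  template:
    metadata:
      labels:
        app: safe-deployment
    spec:
      containers:
      - name: app
        image: myapp:v2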
Blue-Green Deployments
Use service selectors for blue-green deployments:
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1   # illustrative tag for the current (blue) release
---
# Service pointing to blue
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' for cutover
  ports:
  - port: 80
6. Storage and Persistence
Using StatefulSets for Stateful Applications
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  serviceName: "db"
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database   # must match spec.selector
    spec:
      containers:
      - name: postgres
        image: postgres:14
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 100Gi
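The serviceName field refers to a headless Service that gives each replica a stable DNS name. The StatefulSet does not create it for you. A minimal sketch, assuming the db name and the app: database label from the manifest above and PostgreSQL's default port 5432:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db                 # must match serviceName in the StatefulSet
spec:
  clusterIP: None          # headless: DNS resolves to individual pod IPs
  selector:
    app: database
  ports:
  - name: postgres
    port: 5432
```

With this in place, the replicas are reachable within the namespace as database-0.db, database-1.db, and database-2.db.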
7. Configuration Management
Using ConfigMaps and Secrets Properly
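apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # Keys double as environment variable names when loaded with envFrom,
  # so avoid dots: keys that are not valid variable names may be skipped.
  DATABASE_HOST: "db.production.svc.cluster.local"
  CACHE_ENABLED: "true"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  # Placeholders only: never commit real values. Secrets are base64-encoded,
  # not encrypted, unless encryption at rest is configured.
  DATABASE_PASSWORD: "change-me"
  API_KEY: "change-me"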
Expose them to the container as environment variables, as in this pod spec fragment, or as mounted volumes:
spec:
  containers:
  - name: app
    envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secrets
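The volume alternative can be sketched as follows. Each key in app-secrets becomes a file under the mount path; the volume name secret-files and the path /etc/app/secrets are assumptions chosen for illustration.

```yaml
spec:
  containers:
  - name: app
    volumeMounts:
    - name: secret-files            # hypothetical volume name
      mountPath: /etc/app/secrets   # assumed path
      readOnly: true
  volumes:
  - name: secret-files
    secret:
      secretName: app-secrets
```

Unlike environment variables, which are fixed when the container starts, mounted Secret files are refreshed after the Secret changes, provided the mount does not use subPath.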
8. Backup and Disaster Recovery
Regular Backup Strategy
- etcd Backups - Automate daily etcd snapshots
- Persistent Volume Backups - Use tools like Velero
- Configuration Backups - Version control all YAML files
Example Velero Backup Schedule
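apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero   # assumes Velero's default install namespace
spec:
  schedule: "0 2 * * *"   # every day at 02:00
  template:
    includedNamespaces:
    - production
    - staging
    snapshotVolumes: true
    ttl: 720h0m0s   # keep each backup for 30 days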
9. Cost Optimization
Resource Optimization Tips
- Use Cluster Autoscaler for node-level scaling
- Implement Pod Priority Classes for critical workloads
- Use Spot/Preemptible Instances for non-critical workloads
- Monitor and Right-Size resources based on actual usage
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "High priority for critical services"
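A PriorityClass has no effect until a pod references it. As a minimal sketch, the pod below opts in through priorityClassName; the pod name critical-service and the image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-service   # hypothetical name
spec:
  priorityClassName: high-priority   # references the PriorityClass above
  containers:
  - name: app
    image: myapp:v2        # illustrative image
```

When the cluster is short on capacity, the scheduler may preempt lower-priority pods to make room for pods like this one.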
10. GitOps and CI/CD
Implementing GitOps with ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/k8s-manifests
    targetRevision: main
    path: production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
Conclusion
Running Kubernetes in production is a journey that requires continuous learning and improvement. By implementing these best practices, you'll build a solid foundation for reliable, secure, and scalable applications.
Key Takeaways
✅ Always set resource requests and limits
✅ Implement comprehensive monitoring and alerting
✅ Use RBAC and network policies for security
✅ Automate scaling with HPA and cluster autoscaler
✅ Schedule regular backups and test disaster recovery
✅ Adopt GitOps for declarative infrastructure
✅ Continuously optimize based on metrics
Remember: Production readiness is not a destination but a continuous process of refinement and adaptation to your organization's needs.
What's your experience with Kubernetes in production? Share your challenges and solutions in the comments below!
