
Kubernetes Production Best Practices - A Comprehensive Guide

Hariprasath Ravichandran
Senior Platform Engineer @ CData

Kubernetes has become the de facto standard for container orchestration in modern cloud-native applications. However, running Kubernetes in production requires careful planning, implementation of best practices, and continuous monitoring. In this comprehensive guide, we'll explore the essential practices that will help you build robust, scalable, and secure Kubernetes clusters.


Disclaimer: Kubernetes®, K8s®, Docker®, and other product names mentioned in this article are trademarks of their respective owners. All logos and trademarks are used for representation purposes only. No prior copyright or trademark authorization has been obtained. This content is for educational purposes only.


1. Resource Management and Limits

One of the most critical aspects of running Kubernetes in production is proper resource management. Without it, you risk cluster instability, performance degradation, and unexpected costs.

Setting Resource Requests and Limits

Always define resource requests and limits for your containers:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
```

Why this matters:

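  • Requests are what the scheduler uses to place pods: a pod only lands on a node with enough unreserved capacity for its requests
  • Limits cap consumption: a container that exceeds its CPU limit is throttled, and one that exceeds its memory limit is OOM-killed, which keeps noisy neighbors in check
  • Requests that reflect real usage improve cluster bin-packing efficiency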
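The following is a minimal Go sketch of the fit check that shows how requests are used. It is not kube-scheduler's actual code. It parses the two quantity formats used above (millicores such as "250m" and binary suffixes such as "64Mi"), then tests whether a pod's requests fit in a node's unreserved allocatable capacity. The real scheduler uses the resource.Quantity type from k8s.io/apimachinery and checks more than CPU and memory. The node figures below are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts "250m" (millicores) or "2" (whole cores) into millicores.
// Only these two forms are handled in this sketch.
func parseCPU(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	cores, err := strconv.ParseInt(q, 10, 64)
	return cores * 1000, err
}

// parseMemory converts Ki/Mi/Gi quantities (or plain bytes) into bytes.
func parseMemory(q string) (int64, error) {
	units := map[string]int64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30}
	for suffix, mult := range units {
		if strings.HasSuffix(q, suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, suffix), 10, 64)
			return n * mult, err
		}
	}
	return strconv.ParseInt(q, 10, 64)
}

// fits reports whether a pod's CPU (millicores) and memory (bytes) requests
// fit into what remains of a node's allocatable capacity after the requests
// of pods already placed there.
func fits(reqCPU, reqMem, allocCPU, allocMem, usedCPU, usedMem int64) bool {
	return usedCPU+reqCPU <= allocCPU && usedMem+reqMem <= allocMem
}

func main() {
	cpu, _ := parseCPU("250m")
	mem, _ := parseMemory("64Mi")
	// Hypothetical node: 2 cores / 4Gi allocatable.
	fmt.Println(fits(cpu, mem, 2000, 4<<30, 1900, 1<<30)) // false: only 100m CPU left
	fmt.Println(fits(cpu, mem, 2000, 4<<30, 1000, 1<<30)) // true
}
```

Usage and limits play no part in this decision. A node can be fully "requested" while mostly idle, which is why right-sizing requests matters for bin-packing.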

Quality of Service (QoS) Classes

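Kubernetes assigns each pod a QoS class based on the CPU and memory requests and limits of its containers:

  1. Guaranteed - Every container sets both CPU and memory requests and limits, and each request equals its limit
  2. Burstable - The pod does not qualify as Guaranteed, but at least one container sets a CPU or memory request or limit
  3. BestEffort - No container sets any CPU or memory requests or limits

The resource-demo pod above is Burstable because its requests are lower than its limits. When a node runs out of resources, Kubernetes evicts BestEffort pods first, then Burstable, and Guaranteed pods last. In production, aim for Guaranteed, or at least Burstable, QoS for critical workloads.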
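The classification rules can be sketched in Go as follows. This is a simplified model, not the kubelet's implementation. Quantities are plain integers (millicores and bytes), only CPU and memory are considered, and init containers are ignored. It does reproduce one detail worth knowing: a container that sets only limits has its requests defaulted to those limits, so it can still be Guaranteed.

```go
package main

import "fmt"

// Resources maps "cpu" (millicores) and "memory" (bytes) to quantities; a
// simplified stand-in for Kubernetes' resource lists.
type Resources map[string]int64

type Container struct {
	Name     string
	Requests Resources
	Limits   Resources
}

// qosClass approximates the QoS rules for CPU and memory.
func qosClass(containers []Container) string {
	anySet, guaranteed := false, true
	for _, c := range containers {
		for _, r := range []string{"cpu", "memory"} {
			req, hasReq := c.Requests[r]
			lim, hasLim := c.Limits[r]
			if hasReq || hasLim {
				anySet = true
			}
			// API defaulting: a limit without a request gets request = limit.
			if hasLim && !hasReq {
				req, hasReq = lim, true
			}
			// Guaranteed needs both values for both resources, and equal.
			if !hasReq || !hasLim || req != lim {
				guaranteed = false
			}
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	// The resource-demo pod: requests below limits.
	demo := []Container{{
		Name:     "app",
		Requests: Resources{"cpu": 250, "memory": 64 << 20},
		Limits:   Resources{"cpu": 500, "memory": 128 << 20},
	}}
	fmt.Println(qosClass(demo)) // Burstable
}
```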

2. High Availability and Scalability

Horizontal Pod Autoscaling (HPA)

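Use a HorizontalPodAutoscaler (HPA) to scale replicas automatically based on observed metrics. CPU utilization is measured as a percentage of the containers' CPU requests, so the target Deployment must set requests. The cluster also needs a metrics source such as metrics-server: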

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
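The HPA controller's core calculation is documented as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Go sketch below applies it to the settings above. It skips changes within the controller's default 10% tolerance and clamps the result to minReplicas/maxReplicas. It deliberately leaves out the scale-down stabilization window, unready pods, and missing metrics, which the real controller also handles.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the controller's default: ratios within 10% of 1.0
// cause no scaling.
const tolerance = 0.1

// desiredReplicas applies desired = ceil(current * currentUtil / targetUtil)
// and clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	ratio := currentUtil / targetUtil
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// app-hpa: target 70% CPU, min 3, max 10.
	fmt.Println(desiredReplicas(3, 140, 70, 3, 10)) // 6
	fmt.Println(desiredReplicas(4, 90, 70, 3, 10))  // 6 (5.14 rounded up)
	fmt.Println(desiredReplicas(5, 73, 70, 3, 10))  // 5 (within tolerance)
	fmt.Println(desiredReplicas(8, 140, 70, 3, 10)) // 10 (capped at maxReplicas)
}
```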

Pod Disruption Budgets (PDB)

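A PodDisruptionBudget limits how many matching pods voluntary disruptions can take down at once. Voluntary disruptions are evictions requested through the Eviction API, such as kubectl drain during node maintenance. A PDB does not protect against involuntary failures like a node crash: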

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app
```
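The budget's arithmetic is simple enough to sketch in Go. For an integer minAvailable, the number of evictions the PDB currently allows is the count of healthy matching pods minus minAvailable, floored at zero. The Eviction API rejects further evictions (HTTP 429) until replacement pods become healthy. evictionsBeforeBlocked is a hypothetical helper that models draining pods one at a time. The sketch ignores percentage values and maxUnavailable.

```go
package main

import "fmt"

// allowedDisruptions: with an integer minAvailable, the PDB permits
// healthy - minAvailable evictions, never fewer than zero.
func allowedDisruptions(healthy, minAvailable int) int {
	if allowed := healthy - minAvailable; allowed > 0 {
		return allowed
	}
	return 0
}

// evictionsBeforeBlocked models evicting pods one at a time and returns how
// many succeed before the budget rejects the next one, assuming no
// replacement pod becomes healthy in the meantime.
func evictionsBeforeBlocked(healthy, minAvailable, toEvict int) int {
	evicted := 0
	for evicted < toEvict && allowedDisruptions(healthy, minAvailable) > 0 {
		healthy--
		evicted++
	}
	return evicted
}

func main() {
	// app-pdb (minAvailable: 2) with 3 healthy pods, all on the node being drained.
	fmt.Println(allowedDisruptions(3, 2))        // 1
	fmt.Println(evictionsBeforeBlocked(3, 2, 3)) // 1
}
```

This is why a drain stalls when all replicas sit on one node. The remaining evictions keep being retried until the rescheduled pods are healthy elsewhere.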

Multi-Zone Deployments

Distribute pods across availability zones using topology spread constraints:

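```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-zone-app
spec:
  replicas: 6
  selector:            # required for apps/v1 Deployments
    matchLabels:
      app: multi-zone-app
  template:
    metadata:
      labels:
        app: multi-zone-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: multi-zone-app
      containers:
      - name: app
        image: myapp:v2   # placeholder image
```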
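maxSkew: 1 bounds how unevenly matching pods may be spread. The scheduler computes skew as the number of matching pods in the candidate zone plus one, minus the minimum count across zones. With DoNotSchedule, it will not place a pod where that value exceeds maxSkew. The Go sketch below applies that rule while placing six replicas into three hypothetical zones. It picks the first valid zone instead of running the scheduler's scoring, and it treats every zone as eligible.

```go
package main

import "fmt"

// canPlace applies the spread rule for one more matching pod in zone:
// skew = podsInZone + 1 - min(pods in any zone) must not exceed maxSkew.
func canPlace(counts map[string]int, zone string, maxSkew int) bool {
	minCount := -1
	for _, c := range counts {
		if minCount == -1 || c < minCount {
			minCount = c
		}
	}
	return counts[zone]+1-minCount <= maxSkew
}

// spread places n pods one by one into the first zone (in the given order)
// that satisfies the constraint.
func spread(zones []string, n, maxSkew int) map[string]int {
	counts := make(map[string]int, len(zones))
	for _, z := range zones {
		counts[z] = 0
	}
	for i := 0; i < n; i++ {
		for _, z := range zones {
			if canPlace(counts, z, maxSkew) {
				counts[z]++
				break
			}
		}
	}
	return counts
}

func main() {
	zones := []string{"zone-a", "zone-b", "zone-c"} // hypothetical zone names
	fmt.Println(spread(zones, 6, 1))                // map[zone-a:2 zone-b:2 zone-c:2]
}
```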

3. Security Best Practices

Network Policies

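Restrict pod-to-pod traffic with NetworkPolicies. Once a policy selects a pod for a direction (Ingress or Egress), only traffic that some policy explicitly allows is permitted in that direction. Policies are enforced by the network (CNI) plugin, and a NetworkPolicy has no effect on a plugin that does not implement them, so confirm that yours does: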

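```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS lookups; without this rule the backend cannot resolve service
  # names. Assumes the cluster DNS pods carry the label k8s-app: kube-dns.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```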
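The Go model below shows how the ingress side of backend-policy is evaluated. Once a pod is selected by a policy that lists Ingress, a connection is allowed only if some rule matches both the peer's labels (matchLabels semantics) and the destination port. The model covers only same-namespace podSelector peers. Real policies also support namespaceSelector and ipBlock, and multiple policies combine as a union of their rules. IngressRule and ingressAllowed are illustrative names, not a Kubernetes API.

```go
package main

import "fmt"

// IngressRule models one ingress entry: an allowed peer selector and ports.
type IngressRule struct {
	FromLabels map[string]string // podSelector.matchLabels of the allowed peer
	Ports      []int             // allowed ports; empty means all ports
}

// matches implements matchLabels: every key must be present with that value.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// ingressAllowed: for a pod isolated for ingress, traffic is denied unless
// at least one rule matches both the peer and the port.
func ingressAllowed(rules []IngressRule, peerLabels map[string]string, port int) bool {
	for _, r := range rules {
		if !matches(r.FromLabels, peerLabels) {
			continue
		}
		if len(r.Ports) == 0 {
			return true
		}
		for _, p := range r.Ports {
			if p == port {
				return true
			}
		}
	}
	return false
}

func main() {
	// The ingress side of backend-policy.
	rules := []IngressRule{{FromLabels: map[string]string{"app": "frontend"}, Ports: []int{8080}}}
	fmt.Println(ingressAllowed(rules, map[string]string{"app": "frontend"}, 8080)) // true
	fmt.Println(ingressAllowed(rules, map[string]string{"app": "frontend"}, 9090)) // false
	fmt.Println(ingressAllowed(rules, map[string]string{"app": "batch"}, 8080))    // false
}
```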

Pod Security Standards

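Harden pods with a securityContext. The settings below meet the Pod Security Standards "restricted" profile. You can enforce that profile for a namespace through Pod Security Admission by labeling the namespace pod-security.kubernetes.io/enforce: restricted: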

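```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    # Placeholder image: use one built to run as a non-root user, pinned to a
    # tag or digest. The stock nginx image expects to start as root and to
    # write to its filesystem, so it will not start under these settings.
    image: myapp:1.0.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
```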
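The Go sketch below checks a few of the restricted-profile requirements that the secure-pod settings satisfy. It is an illustration over a hypothetical SecuritySettings struct, not the Pod Security Admission code. The real profile checks more, such as volume types, host namespaces, and added capabilities. allowPrivilegeEscalation must be explicitly false, which is why the field is a pointer here: leaving it unset counts as a violation.

```go
package main

import "fmt"

// SecuritySettings collects the few fields this sketch inspects; a
// hypothetical stand-in for the merged pod- and container-level securityContext.
type SecuritySettings struct {
	RunAsNonRoot             bool
	AllowPrivilegeEscalation *bool // nil means unset, which the profile rejects
	DropCapabilities         []string
	SeccompProfile           string // "RuntimeDefault", "Localhost", "Unconfined" or ""
}

// restrictedViolations reports which of these restricted-profile checks fail.
func restrictedViolations(s SecuritySettings) []string {
	var v []string
	if !s.RunAsNonRoot {
		v = append(v, "runAsNonRoot must be true")
	}
	if s.AllowPrivilegeEscalation == nil || *s.AllowPrivilegeEscalation {
		v = append(v, "allowPrivilegeEscalation must be set to false")
	}
	dropsAll := false
	for _, c := range s.DropCapabilities {
		if c == "ALL" {
			dropsAll = true
		}
	}
	if !dropsAll {
		v = append(v, "capabilities must drop ALL")
	}
	if s.SeccompProfile != "RuntimeDefault" && s.SeccompProfile != "Localhost" {
		v = append(v, "seccompProfile must be RuntimeDefault or Localhost")
	}
	return v
}

func main() {
	no := false
	securePod := SecuritySettings{
		RunAsNonRoot:             true,
		AllowPrivilegeEscalation: &no,
		DropCapabilities:         []string{"ALL"},
		SeccompProfile:           "RuntimeDefault",
	}
	fmt.Println(len(restrictedViolations(securePod))) // 0
	for _, msg := range restrictedViolations(SecuritySettings{}) {
		fmt.Println("violation:", msg)
	}
}
```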

RBAC Configuration

Implement least-privilege access control:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]          # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-service-account
  namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
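RBAC permissions are purely additive and have no deny rules. A request is allowed only if some rule bound to the subject covers its API group, resource, and verb. The Go sketch below models that check for the pod-reader Role, with "*" as a wildcard. It is a simplification that ignores subresources such as pods/log, resourceNames, and non-resource URLs. To check a real binding, `kubectl auth can-i list pods --as=system:serviceaccount:production:app-service-account -n production` asks the API server directly.

```go
package main

import "fmt"

// PolicyRule mirrors the shape of a Role rule; "*" acts as a wildcard.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want || item == "*" {
			return true
		}
	}
	return false
}

// allowed returns true if any rule covers the group, resource and verb;
// everything else is denied by default.
func allowed(rules []PolicyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	podReader := []PolicyRule{{APIGroups: []string{""}, Resources: []string{"pods"}, Verbs: []string{"get", "list", "watch"}}}
	fmt.Println(allowed(podReader, "", "pods", "list"))   // true
	fmt.Println(allowed(podReader, "", "pods", "delete")) // false
	fmt.Println(allowed(podReader, "", "secrets", "get")) // false
}
```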

4. Monitoring and Observability

Health Checks

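Configure all three probe types. The startup probe gives a slow-starting app up to failureThreshold × periodSeconds = 30 × 10 s = 300 s to come up, and the other probes do not run until it succeeds. The liveness probe restarts the container after three consecutive failures. The readiness probe removes the pod from Service endpoints while it cannot serve traffic: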

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: health-check-demo
spec:
  containers:
  - name: app
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
```
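On the application side, each of the three paths needs a handler that reports the right status. The following is a minimal Go sketch using only net/http (atomic.Bool needs Go 1.19 or later). /healthz always answers 200, while /startup and /ready return 503 until the app flips the hypothetical started and ready flags. The probe helper uses net/http/httptest to exercise the handlers in-process for the demonstration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Flags the application flips as it moves through its lifecycle.
var (
	started atomic.Bool // set once one-time initialization has finished
	ready   atomic.Bool // set while the app can accept traffic
)

// flagHandler returns 200 while the flag is set and 503 otherwise.
func flagHandler(flag *atomic.Bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if flag.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: answering at all means the process is not wedged.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/startup", flagHandler(&started))
	mux.HandleFunc("/ready", flagHandler(&ready))
	return mux
}

// probe performs an in-process GET and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println(probe(mux, "/startup"), probe(mux, "/ready")) // 503 503
	started.Store(true)
	ready.Store(true)
	fmt.Println(probe(mux, "/startup"), probe(mux, "/ready")) // 200 200
	// A real service would serve mux with http.ListenAndServe(":8080", mux).
}
```

Keeping /healthz free of dependency checks is deliberate. If liveness also checked the database, a database outage would restart every pod without fixing anything. Readiness, by contrast, can simply take pods out of rotation.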

Structured Logging

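Log key-value fields, typically as JSON, instead of free-form strings. Your log pipeline can then filter and aggregate on fields such as user_id or duration_ms: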

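```go
// Example in Go, assuming the logrus package (github.com/sirupsen/logrus)
package main

import log "github.com/sirupsen/logrus"

func main() {
	log.SetFormatter(&log.JSONFormatter{}) // one JSON object per line
	userID, duration := "u-123", 42       // placeholder values
	log.WithFields(log.Fields{
		"user_id": userID, "action": "login",
		"status": "success", "duration_ms": duration,
	}).Info("User logged in")
}
```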

5. Deployment Strategies

Rolling Updates with Safety Checks

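Configure safe rolling updates. With maxSurge: 1 and maxUnavailable: 0, the rollout adds one new pod at a time. It removes an old pod only after the new one has been available for minReadySeconds (30 s), so the Deployment never has fewer than 5 available pods. If the rollout makes no progress for progressDeadlineSeconds (600 s), Kubernetes reports a ProgressDeadlineExceeded condition. It does not roll back on its own, so have your pipeline watch for the condition (for example, kubectl rollout status exits non-zero in that case):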

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: safe-deployment
spec:
  replicas: 5
  selector:            # required for apps/v1 Deployments
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 30
  progressDeadlineSeconds: 600
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:v2
```
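The Go sketch below simulates the Deployment controller's bounds to show how maxSurge and maxUnavailable set the rollout's pace. Total pods never exceed replicas + maxSurge, and available pods never drop below replicas - maxUnavailable. The simulation assumes new pods become available instantly, so minReadySeconds and readiness delays are ignored. It takes absolute numbers. Percentage values resolve with maxSurge rounded up and maxUnavailable rounded down, and both default to 25%.

```go
package main

import (
	"errors"
	"fmt"
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// simulateRollout counts scale-up/scale-down rounds until every pod runs the
// new version, assuming new pods become available immediately.
func simulateRollout(replicas, maxSurge, maxUnavailable int) (int, error) {
	if maxSurge == 0 && maxUnavailable == 0 {
		// Rejected by API validation: the rollout could never make progress.
		return 0, errors.New("maxSurge and maxUnavailable cannot both be 0")
	}
	ceiling := replicas + maxSurge     // most pods allowed at once
	floor := replicas - maxUnavailable // fewest available pods allowed
	oldPods, newPods, rounds := replicas, 0, 0
	for oldPods > 0 || newPods < replicas {
		rounds++
		// Scale up the new ReplicaSet as far as the surge ceiling allows.
		newPods += minInt(ceiling-(oldPods+newPods), replicas-newPods)
		// Scale down old pods while staying at or above the availability floor.
		oldPods -= minInt(oldPods, oldPods+newPods-floor)
	}
	return rounds, nil
}

func main() {
	// safe-deployment: 5 replicas, maxSurge 1, maxUnavailable 0.
	rounds, _ := simulateRollout(5, 1, 0)
	fmt.Println(rounds) // 5: one pod replaced per round
	// The 25%/25% defaults on 4 replicas resolve to maxSurge 1, maxUnavailable 1.
	rounds, _ = simulateRollout(4, 1, 1)
	fmt.Println(rounds) // 3
}
```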

Blue-Green Deployments

Use service selectors for blue-green deployments:

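```yaml
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1   # green is identical except version: green and the new image
---
# Service pointing to blue
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue # Switch to 'green' for cutover
  ports:
  - port: 80
```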
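A Service routes to every pod whose labels contain all of its selector's key/value pairs, which is what makes the cutover a one-field change. The Go sketch below models that matching with hypothetical pod names. In a cluster you would apply the edited Service or run something like `kubectl patch service app-service -p '{"spec":{"selector":{"version":"green"}}}'`. The default strategic merge patch updates only the version key.

```go
package main

import (
	"fmt"
	"sort"
)

type Pod struct {
	Name   string
	Labels map[string]string
}

// selects reports whether every key/value in the selector is on the pod.
func selects(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// endpoints returns the sorted names of pods a Service with this selector targets.
func endpoints(selector map[string]string, pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if selects(selector, p.Labels) {
			names = append(names, p.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	pods := []Pod{ // hypothetical pod names
		{"app-blue-1", map[string]string{"app": "myapp", "version": "blue"}},
		{"app-green-1", map[string]string{"app": "myapp", "version": "green"}},
	}
	fmt.Println(endpoints(map[string]string{"app": "myapp", "version": "blue"}, pods))  // [app-blue-1]
	fmt.Println(endpoints(map[string]string{"app": "myapp", "version": "green"}, pods)) // [app-green-1]
	// Dropping the version key would send traffic to both colors at once.
	fmt.Println(endpoints(map[string]string{"app": "myapp"}, pods)) // [app-blue-1 app-green-1]
}
```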

6. Storage and Persistence

Using StatefulSets for Stateful Applications

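A StatefulSet gives each replica a stable name and its own PersistentVolumeClaim. serviceName must refer to a headless Service (clusterIP: None) that you create separately:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  serviceName: "db"
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database          # must match spec.selector
    spec:
      containers:
      - name: postgres
        image: postgres:14
        env:
        # The postgres image will not initialize without a password;
        # assumes the app-secrets Secret from section 7.
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database.password
        # Use a subdirectory: a fresh volume's lost+found directory makes initdb fail.
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 100Gi
```

Note that replicas: 3 starts three independent PostgreSQL servers, each with its own volume. The StatefulSet does not set up replication between them, which requires PostgreSQL replication configuration or a dedicated operator.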
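The main benefit of a StatefulSet is stable identity. The Go sketch below derives the names that the database StatefulSet's replicas keep across restarts and rescheduling. Pod names are <statefulset>-<ordinal>, and claims are <claim template>-<pod name>. Each pod also gets the DNS name <pod>.<serviceName>.<namespace>.svc.cluster.local through the governing Service. The production namespace and the default cluster.local domain are assumptions. By default, the claims are kept when you scale down or delete the StatefulSet, so data survives but storage must be cleaned up deliberately.

```go
package main

import "fmt"

// Identity is what a StatefulSet replica keeps across rescheduling.
type Identity struct {
	Pod, PVC, DNS string
}

// identities derives the stable names for each ordinal. The namespace and
// the cluster.local domain are assumptions about your cluster.
func identities(setName, claimTemplate, serviceName, namespace string, replicas int) []Identity {
	ids := make([]Identity, 0, replicas)
	for i := 0; i < replicas; i++ {
		pod := fmt.Sprintf("%s-%d", setName, i)
		ids = append(ids, Identity{
			Pod: pod,
			PVC: fmt.Sprintf("%s-%s", claimTemplate, pod),
			DNS: fmt.Sprintf("%s.%s.%s.svc.cluster.local", pod, serviceName, namespace),
		})
	}
	return ids
}

func main() {
	for _, id := range identities("database", "data", "db", "production", 3) {
		fmt.Println(id.Pod, id.PVC, id.DNS)
	}
	// First line: database-0 data-database-0 database-0.db.production.svc.cluster.local
}
```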

7. Configuration Management

Using ConfigMaps and Secrets Properly

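Keep non-sensitive settings in a ConfigMap and credentials in a Secret:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database.host: "db.production.svc.cluster.local"
  cache.enabled: "true"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  database.password: "<database-password>"   # placeholder
  api.key: "<api-key>"                       # placeholder
```

stringData is a write-only convenience field. The API server stores each value base64-encoded in the Secret's data field. Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the value. Enable encryption at rest for Secrets, restrict get and list on Secrets through RBAC, and keep real values out of version control.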

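Expose them to containers as environment variables (shown here) or as files in a mounted volume. Environment variables are read once at container start. Mounted ConfigMap and Secret volumes are refreshed when the object changes, except for subPath mounts: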

```yaml
spec:
  containers:
  - name: app
    envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secrets
```
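On the consuming side, the application reads these values from its environment. envFrom uses each key as the variable name unchanged, so the names here contain dots. Go's os.LookupEnv reads them fine, although a shell cannot reference them as $database.host. The sketch below is hypothetical application code. Config and loadConfig are illustrative names, and loadConfig takes a lookup function so tests can supply a map instead of the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config holds the values exposed by the envFrom block.
type Config struct {
	DatabaseHost     string
	DatabasePassword string
	CacheEnabled     bool
}

// loadConfig reads settings through lookup; pass os.LookupEnv in production.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	var c Config
	var ok bool
	if c.DatabaseHost, ok = lookup("database.host"); !ok {
		return c, errors.New("database.host is not set")
	}
	if c.DatabasePassword, ok = lookup("database.password"); !ok {
		return c, errors.New("database.password is not set")
	}
	// cache.enabled is optional and defaults to false.
	if raw, ok := lookup("cache.enabled"); ok {
		v, err := strconv.ParseBool(raw)
		if err != nil {
			return c, fmt.Errorf("cache.enabled: %w", err)
		}
		c.CacheEnabled = v
	}
	return c, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		os.Exit(1)
	}
	fmt.Println("connecting to", cfg.DatabaseHost, "cache:", cfg.CacheEnabled)
}
```

Failing fast on a missing required key makes a misconfigured pod crash at startup, where it is visible, instead of failing later on its first database call.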

8. Backup and Disaster Recovery

Regular Backup Strategy

  1. etcd Backups - Automate daily etcd snapshots on self-managed control planes (managed services such as EKS, GKE, and AKS operate etcd for you)
  2. Persistent Volume Backups - Use tools like Velero
  3. Configuration Backups - Version control all YAML files

Example Velero Backup Schedule

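```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero        # the namespace Velero is installed in
spec:
  schedule: "0 2 * * *"    # cron: every day at 02:00
  template:
    includedNamespaces:
    - production
    - staging
    snapshotVolumes: true
    ttl: 720h0m0s          # keep each backup for 30 days
```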

9. Cost Optimization

Resource Optimization Tips

  1. Use Cluster Autoscaler for node-level scaling
  2. Implement Pod Priority Classes for critical workloads
  3. Use Spot/Preemptible Instances for non-critical workloads
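  4. Monitor and Right-Size resources based on actual usage

Define a PriorityClass for critical services and reference it from their pod specs with priorityClassName: high-priority:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "High priority for critical services"
```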
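When a pending pod cannot fit anywhere, the scheduler may preempt (evict) lower-priority pods to make room for it. The Go sketch below illustrates the idea on a single node. It considers only pods with strictly lower priority, evicts the lowest first, and stops once enough CPU is free. The real scheduler also weighs PodDisruptionBudgets, tries to minimize the number and priority of victims across candidate nodes, and considers all resources. The pods and numbers here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type RunningPod struct {
	Name     string
	Priority int32
	CPUMilli int64
}

// pickVictims chooses lower-priority pods to preempt, lowest priority first,
// until freeCPU covers needCPU. ok is false if even evicting every eligible
// pod would not make room.
func pickVictims(running []RunningPod, freeCPU, needCPU int64, incoming int32) ([]string, bool) {
	candidates := make([]RunningPod, 0, len(running))
	for _, p := range running {
		if p.Priority < incoming { // equal or higher priority is never preempted
			candidates = append(candidates, p)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Priority < candidates[j].Priority
	})
	var victims []string
	for _, p := range candidates {
		if freeCPU >= needCPU {
			break
		}
		freeCPU += p.CPUMilli
		victims = append(victims, p.Name)
	}
	if freeCPU < needCPU {
		return nil, false
	}
	return victims, true
}

func main() {
	node := []RunningPod{ // hypothetical pods on a full node
		{"batch-report", 0, 1000},
		{"cache-warmer", 100, 500},
		{"api", 1000, 500},
	}
	// A high-priority (value 1000) pod needs 1200m CPU; only 200m is free.
	fmt.Println(pickVictims(node, 200, 1200, 1000)) // [batch-report] true
}
```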

10. GitOps and CI/CD

Implementing GitOps with ArgoCD

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/k8s-manifests
    targetRevision: main
    path: production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
```
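With automated sync, Argo CD continuously compares the manifests rendered from the repository with the live objects and applies the difference. prune: true also deletes live resources that were removed from Git, and selfHeal: true reverts changes made directly in the cluster. The Go sketch below models that comparison as a create/update/delete plan. It is a toy diff over hypothetical name-to-spec maps, not Argo CD's implementation, which diffs full object manifests.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists the actions a sync would take to make the cluster match Git.
type Plan struct {
	Create, Update, Delete []string
}

// diff compares desired objects (from Git) with live ones. With prune
// disabled, live objects missing from Git are left in place.
func diff(desired, live map[string]string, prune bool) Plan {
	var p Plan
	for name, spec := range desired {
		liveSpec, exists := live[name]
		switch {
		case !exists:
			p.Create = append(p.Create, name)
		case liveSpec != spec: // a new commit, or drift that selfHeal reverts
			p.Update = append(p.Update, name)
		}
	}
	if prune {
		for name := range live {
			if _, inGit := desired[name]; !inGit {
				p.Delete = append(p.Delete, name)
			}
		}
	}
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	git := map[string]string{"Deployment/app": "image=myapp:v2", "Service/app": "port=80"}
	cluster := map[string]string{"Deployment/app": "image=myapp:v1", "ConfigMap/old": "x"}
	fmt.Printf("%+v\n", diff(git, cluster, true))
	// {Create:[Service/app] Update:[Deployment/app] Delete:[ConfigMap/old]}
}
```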

Conclusion

Running Kubernetes in production is a journey that requires continuous learning and improvement. By implementing these best practices, you'll build a solid foundation for reliable, secure, and scalable applications.

Key Takeaways

✅ Always set resource requests and limits
✅ Implement comprehensive monitoring and alerting
✅ Use RBAC and network policies for security
✅ Automate scaling with HPA and cluster autoscaler
✅ Take regular backups and test disaster recovery
✅ Adopt GitOps for declarative infrastructure
✅ Optimize continuously based on metrics

Remember: Production readiness is not a destination but a continuous process of refinement and adaptation to your organization's needs.


What's your experience with Kubernetes in production? Share your challenges and solutions in the comments below!