Kubernetes Best Practices for Production
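Running Kubernetes in production requires careful planning and adherence to best practices. This guide covers cluster architecture, resource management, security, health checks, deployment strategies, monitoring and logging, backups, and cost optimization for reliable, secure, and scalable clusters.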
Cluster Architecture
High Availability Setup
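Run an odd number of control plane nodes (three is the usual minimum) behind a load balancer, so the API server stays reachable and etcd keeps quorum when one node fails:

    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    # Stable, load-balanced address shared by all control plane nodes
    controlPlaneEndpoint: "lb.example.com:6443"
    etcd:
      local:
        dataDir: /var/lib/etcd
        extraArgs:
          initial-cluster-state: new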
Node Configuration
Separate workloads into node pools using labels and taints:

    apiVersion: v1
    kind: Node
    metadata:
      name: worker-1
      labels:
        node-role: compute
        workload-type: cpu-intensive
    spec:
      taints:
        # Only pods that tolerate this taint are scheduled onto the node
        - key: workload-type
          value: cpu-intensive
          effect: NoSchedule
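A taint only keeps pods away; a workload meant for the pool still has to opt in. The following is a minimal sketch of a pod that both tolerates the taint and selects the pool by label. The pod name `batch-worker` is a hypothetical placeholder, and the image is reused from the other examples in this guide.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical name
spec:
  # Steer the pod to nodes carrying the pool label
  nodeSelector:
    workload-type: cpu-intensive
  # Allow scheduling onto nodes that carry the matching taint
  tolerations:
    - key: workload-type
      operator: Equal
      value: cpu-intensive
      effect: NoSchedule
  containers:
    - name: app
      image: myapp:latest
```

The toleration alone would permit the pod on the tainted nodes but would not prevent it from landing elsewhere, which is why the node selector is included as well.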
Resource Management
Resource Requests and Limits
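Set resource requests and limits on every container, so the scheduler can place pods accurately and a single container cannot starve its node:

    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest  # in production, pin a version tag or digest
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"
              cpu: "1000m"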
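Per-pod settings are easy to forget, so a namespace-level default can act as a safety net. The sketch below uses a LimitRange to apply default requests and limits to containers that declare none. The object name and the specific values are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: production
spec:
  limits:
    - type: Container
      # Applied when a container omits resources.requests
      defaultRequest:
        cpu: 250m
        memory: 128Mi
      # Applied when a container omits resources.limits
      default:
        cpu: 500m
        memory: 256Mi
```

Defaults are applied at admission time, so they affect only pods created after the LimitRange exists.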
Quality of Service Classes
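Kubernetes assigns each pod a QoS class based on its requests and limits. The class determines which pods are evicted first when a node runs short of resources:
- Guaranteed: every container sets CPU and memory requests equal to its limits. Evicted last.
- Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria.
- BestEffort: no container sets any requests or limits. Evicted first.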
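As an illustration of the Guaranteed class, the sketch below sets requests equal to limits for both CPU and memory on the pod's only container. The pod name and the values are assumptions chosen for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-guaranteed        # hypothetical name
spec:
  containers:
    - name: app
      image: myapp:latest
      resources:
        # Requests equal limits for every resource, so the pod is Guaranteed
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "512Mi"
          cpu: "500m"
```

The assigned class can be read back from the pod status with `kubectl get pod myapp-guaranteed -o jsonpath='{.status.qosClass}'`.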
Security Best Practices
RBAC Configuration
Implement least-privilege access:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: pod-reader
    rules:
      - apiGroups: [""]        # "" is the core API group
        resources: ["pods"]
        verbs: ["get", "list"]
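A Role grants nothing until it is bound to a subject. The sketch below binds `pod-reader` to a service account. The service account name `monitoring-agent` is hypothetical, and the `default` namespace is an assumption that matches a Role applied without an explicit namespace.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods               # hypothetical name
  namespace: default            # must be the namespace the Role lives in
subjects:
  - kind: ServiceAccount
    name: monitoring-agent      # hypothetical service account
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

One way to check the result is `kubectl auth can-i list pods --as=system:serviceaccount:default:monitoring-agent`.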
Network Policies
Isolate pods with network policies:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: api-allow
    spec:
      podSelector:
        matchLabels:
          app: api
      policyTypes:
        - Ingress
      ingress:
        # Only frontend pods may reach the api pods, and only on TCP 8080
        - from:
            - podSelector:
                matchLabels:
                  app: frontend
          ports:
            - protocol: TCP
              port: 8080
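Allow rules like the one above are most useful on top of a default-deny baseline, because pods not selected by any policy accept all traffic. A minimal sketch of such a baseline for one namespace follows; the policy name is an assumption, and enforcement depends on a network plugin that supports NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress    # hypothetical name
spec:
  # An empty selector matches every pod in the namespace
  podSelector: {}
  # Ingress is listed with no ingress rules, so all inbound traffic is denied
  policyTypes:
    - Ingress
```

Policies are additive, so traffic permitted by `api-allow` still gets through once this baseline is in place.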
Pod Security Standards
Enforce security standards:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: production
      labels:
        # enforce rejects violating pods; audit and warn only report them
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/warn: restricted
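Once a namespace enforces the restricted profile, pods need an explicit security context to be admitted. The sketch below shows the settings that profile expects from a simple single-container pod. It assumes the image is built to run as a non-root user, which the manifest itself cannot guarantee.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true          # assumes the image defines a non-root user
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:latest
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

Trying the manifest first in a namespace that carries only the `warn` label is a low-risk way to surface violations before enforcement is switched on.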
Health Checks
Liveness and Readiness Probes
Implement proper health checks:
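
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          # Restart the container if it stops responding
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          # Withhold traffic until the container reports ready
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5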
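For applications with long or variable startup times, a fixed `initialDelaySeconds` is either too short or wastefully long. A startup probe is one alternative: the liveness probe is held off until the startup probe succeeds. The sketch below assumes the same `/healthz` endpoint; the pod name and thresholds are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-slow-start        # hypothetical name
spec:
  containers:
    - name: app
      image: myapp:latest
      # Allows up to 30 x 10s = 300s for startup before the container is restarted
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 30
      # Takes over only after the startup probe has succeeded
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```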
Deployment Strategies
Rolling Updates
Configure safe rolling updates:
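
    apiVersion: apps/v1
    kind: Deployment
    # metadata, selector, and pod template omitted for brevity
    spec:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 25%         # extra pods allowed above the desired count
          maxUnavailable: 25%   # pods allowed to be unavailable during the update
      # A new pod must stay ready this long before it counts as available
      minReadySeconds: 10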
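For services that cannot tolerate reduced capacity during a rollout, a more conservative variant is to surge by one pod and allow none to be unavailable. The following complete manifest is a sketch under that assumption; the replica count is illustrative, and the readiness endpoint is taken from the health check section.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4                   # illustrative value
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1               # add one new pod at a time
      maxUnavailable: 0         # never drop below the desired replica count
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          # The rollout waits on readiness, so a readiness probe is essential here
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
```

The trade-off is a slower rollout and a need for spare capacity for the one surge pod.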
Blue-Green Deployments
Use services for blue-green deployments:

    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector:
        app: myapp
        version: blue  # Switch to green when ready
      ports:
        - port: 8080   # example port; match your application
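The Service selector only works if two parallel sets of pods carry the matching `version` labels. A sketch of the green side is shown below; the blue Deployment would be identical apart from its name and `version: blue`. The Deployment name and replica count are assumptions for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green             # hypothetical name
spec:
  replicas: 3                   # illustrative value
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green          # the label the Service selector switches to
    spec:
      containers:
        - name: app
          image: myapp:latest   # replace with the tag of the new release
```

Keeping the blue Deployment running after the switch makes rollback a matter of changing the selector back.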
Monitoring and Logging
Prometheus Integration
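With the Prometheus Operator installed, a ServiceMonitor tells Prometheus which Services to scrape:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp
    spec:
      selector:
        matchLabels:
          app: myapp
      endpoints:
        - port: metrics    # name of a port on the selected Service
          interval: 30s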
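A ServiceMonitor selects Services by label and refers to a Service port by name, so a matching Service has to exist. The sketch below shows one that would be picked up by the ServiceMonitor above when both live in the same namespace. The Service name and port number are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics           # hypothetical name
  labels:
    app: myapp                  # matched by the ServiceMonitor's selector
spec:
  selector:
    app: myapp
  ports:
    - name: metrics             # referenced by the ServiceMonitor's endpoint port
      port: 9090                # hypothetical metrics port
      targetPort: 9090
```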
Centralized Logging
Use EFK or Loki for logging:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluent-bit-config
    data:
      fluent-bit.conf: |
        [OUTPUT]
            Name  es
            Match *
            Host  elasticsearch
            Port  9200
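For the Loki option, Fluent Bit ships a `loki` output plugin that can take the place of the Elasticsearch output. The sketch below assumes a Loki service reachable in-cluster under the hostname `loki` on its default port; both the hostname and the label value are assumptions.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [OUTPUT]
        Name   loki
        Match  *
        # Hypothetical in-cluster Loki service and its default HTTP port
        Host   loki
        Port   3100
        # Static label attached to every log stream
        Labels job=fluent-bit
```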
Backup and Disaster Recovery
etcd Backups
Take regular etcd snapshots and copy them off the control plane node:

    ETCDCTL_API=3 etcdctl snapshot save backup.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key
Velero for Application Backups
Use Velero for application-level backups:

    velero backup create myapp-backup \
      --include-namespaces myapp \
      --storage-location default
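One-off backups are easy to forget, so a recurring schedule is usually the safer default. The sketch below expresses the same backup as a Velero Schedule resource. The schedule name, cron expression, and retention period are illustrative assumptions, and the `velero` namespace assumes a default installation.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: myapp-daily             # hypothetical name
  namespace: velero             # assumes the default install namespace
spec:
  schedule: "0 2 * * *"         # every day at 02:00
  template:
    includedNamespaces:
      - myapp
    storageLocation: default
    ttl: 720h0m0s               # keep each backup for 30 days
```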
Cost Optimization
Resource Optimization
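- Use Horizontal Pod Autoscaling (HPA) to match replica counts to load
- Use Vertical Pod Autoscaling (VPA) to right-size requests, but avoid letting HPA and VPA act on the same CPU or memory metric
- Use cluster autoscaling to add and remove nodes with demand
- Implement resource quotas to cap what each namespace can consume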
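To make the resource quota item concrete, here is a minimal sketch of a ResourceQuota for a single namespace. The quota name and all values are illustrative assumptions that should be sized from observed usage.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota           # hypothetical name
  namespace: production
spec:
  hard:
    # Caps on the sum of requests and limits across all pods in the namespace
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    # Cap on the number of pods
    pods: "50"
```

Once a quota covers CPU or memory, pods that omit the corresponding requests or limits are rejected in that namespace, so quotas pair naturally with per-container resource settings.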
Horizontal Pod Autoscaler
Scale a Deployment on average CPU utilization, measured relative to the pods' CPU requests:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
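With bursty traffic, an autoscaler can remove replicas too eagerly and then have to add them back. The `behavior` field in `autoscaling/v2` allows scale-down to be slowed. The sketch below repeats the HorizontalPodAutoscaler above with an added scale-down policy; the window and rate are illustrative assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      # Act on the highest recommendation seen over the last 5 minutes
      stabilizationWindowSeconds: 300
      policies:
        # Remove at most one pod per minute
        - type: Pods
          value: 1
          periodSeconds: 60
```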
Conclusion
Running Kubernetes in production is complex but manageable with the right practices. Focus on security, reliability, and observability from day one.
Remember: Start simple, monitor everything, and iterate based on your actual needs.