Question 22 · Section 14

How Do You Perform a Rolling Update in Kubernetes?

Junior Level

Simple Definition

Rolling Update is a deployment strategy in Kubernetes that replaces old Pods with new ones gradually, without application downtime. Instead of killing all old Pods and creating new ones at once, Kubernetes does this one by one (or in small batches), waiting for each new Pod to be ready before removing an old one.

In short: Rolling Update is the default deployment strategy for Deployments in K8s — old Pods are gradually replaced with new ones, with no application downtime.

Analogy

Imagine changing tires on a car. Rolling Update is like changing tires one at a time while the car moves slowly. The car never fully stops. The alternative (Recreate) — remove all 4 tires at once, and the car sits until new ones are on.

YAML Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # maximum 1 extra Pod
      maxUnavailable: 0  # zero Pods unavailable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:2.0  # new image
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 5

kubectl Example

# Update image
kubectl set image deployment/my-app app=my-app:2.0

# Track rollout progress
kubectl rollout status deployment/my-app

# View revision history
kubectl rollout history deployment/my-app

# Rollback to previous version
kubectl rollout undo deployment/my-app

# Rollback to specific revision
kubectl rollout undo deployment/my-app --to-revision=3

# Pause the rollout
kubectl rollout pause deployment/my-app

# Resume the rollout
kubectl rollout resume deployment/my-app

When to Use

  • Application version updates without downtime
  • Canary deployment (gradual traffic switching)
  • When the application maintains backward API compatibility
  • For stateless services (APIs, web applications)

Middle Level

How it Works

When you change the container image in a Deployment, Kubernetes:

  1. Creates a new ReplicaSet with the new image (old ReplicaSet is preserved)
  2. Starts creating Pods in the new ReplicaSet, guided by maxSurge
  3. Waits for each new Pod to pass Readiness Probe
  4. After success, removes a Pod from the old ReplicaSet, guided by maxUnavailable
  5. Repeats until all Pods are in the new ReplicaSet
  6. Old ReplicaSet is preserved (with replicas=0) for rollback

Parameters maxSurge and maxUnavailable:

  • maxSurge: 25% — with 4 replicas, can create 1 extra Pod (4 + 1 = 5)
  • maxUnavailable: 25% — with 4 replicas, at least 3 must be available (4 - 1 = 3)

Can be specified as percentages or absolute numbers: maxSurge: 1, maxUnavailable: 0.

# maxUnavailable=1 — at most 1 Pod may be unavailable during the update.
# maxSurge=1 — at most 1 extra Pod may be created above the desired count.
# With replicas=3: 3(old) → 3(old)+1(new) → 2(old)+2(new) → 1(old)+3(new) → 3(new)
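The rounding rules and the wave mechanics can be sketched in Python. This is a simplified model under loud assumptions: every created Pod is assumed to pass its readiness probe before the next wave, whereas the real controller interleaves scale-up, scale-down, and readiness waiting; the helper names are illustrative, not Kubernetes APIs:

```python
import math

def resolve_bounds(replicas, max_surge, max_unavailable):
    """Resolve absolute or percentage values the way the Deployment
    controller does: maxSurge rounds up, maxUnavailable rounds down."""
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            frac = int(value[:-1]) / 100
            return math.ceil(replicas * frac) if round_up else math.floor(replicas * frac)
        return int(value)
    return resolve(max_surge, True), resolve(max_unavailable, False)

def rolling_update_trace(replicas, max_surge, max_unavailable):
    """Record (old, new) Pod counts per wave, assuming every created Pod
    becomes Ready before the next wave starts."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    trace = [(old, new)]
    while old > 0 or new < replicas:
        # scale the new ReplicaSet up, never exceeding replicas + maxSurge total
        new += min(replicas + max_surge - (old + new), replicas - new)
        # scale the old ReplicaSet down, never dropping below replicas - maxUnavailable
        old -= min(old, old + new - (replicas - max_unavailable))
        trace.append((old, new))
    return trace
```

For example, resolve_bounds(4, "25%", "25%") gives (1, 1), and rolling_update_trace(1, 1, 0) reproduces the zero-downtime single-replica case: the new Pod is created before the old one is removed.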

Practical Scenarios

Scenario 1: Zero Downtime for 1 replica

replicas: 1
strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

Kubernetes creates a new Pod (total 2), waits for Readiness, then removes the old one.

Scenario 2: Fast deployment for 100 replicas

replicas: 100
strategy:
  rollingUpdate:
    maxSurge: 25%     # +25 Pods at once
    maxUnavailable: 25%  # -25 Pods at once

All 100 Pods update in 4 waves (25 at a time). Fast, but with temporary availability reduction.

Scenario 3: Pause/Resume for manual verification

# Deploy first 2 Pods and pause
kubectl set image deployment/my-app app=my-app:2.0
kubectl rollout pause deployment/my-app
# Check logs, metrics, tests
kubectl rollout resume deployment/my-app

Common Mistakes Table

| Mistake | Consequence | Solution |
|---|---|---|
| Missing Readiness Probe | Kubernetes kills old Pods while new ones are not ready → downtime | Always add a readinessProbe |
| Using the :latest tag | Kubernetes doesn't see the image change, so the Rolling Update never starts | Use specific tags (:1.0, :2.0) or a SHA256 digest |
| maxUnavailable: 0 with many replicas | Update takes very long (one Pod at a time) | Use maxUnavailable: 10-25% for a speed/availability balance |
| maxSurge too large | Resource quota temporarily exceeded, OOMKill on nodes | Limit maxSurge to available node resources |
| No graceful shutdown | Active requests are interrupted when Pods are removed | Handle SIGTERM, use a preStop hook |
| No API backward compatibility | New Pods can't work with old data / DB schema | Use backward-compatible migrations or Blue-Green |
| Recreate strategy instead of RollingUpdate | Full downtime during the update | Check strategy.type: RollingUpdate |

Deployment Strategies Comparison

| Characteristic | Rolling Update | Recreate | Blue-Green | Canary |
|---|---|---|---|---|
| Downtime | Zero | Yes | Zero | Zero |
| Resources | +maxSurge | None extra | x2 (full second set) | +canary% |
| Speed | Medium | Fast | Fast | Slow |
| Rollback | Fast (rollout undo) | Fast | Instant (traffic switch) | Instant |
| Complexity | Low (default) | Low | High (2 Deployments + Ingress) | High (Ingress weights) |
| Risk | Medium (two versions coexist) | High | Low | Low |
| When to use | Stateless APIs, microservices | Dev, tests, stateful with migration | Critical services, compliance | Production with monitoring |

When NOT to Use

  • Stateful applications with DB migration — Rolling Update runs old and new Pods simultaneously. If the DB schema is incompatible, use Recreate + migration, or an Operator. Databases themselves are not a fit either — new Pods have no data; use StatefulSet with ordered updates
  • Applications without backward compatibility — the new version breaks the API for old clients. Use Blue-Green
  • Very large Deployments with limited resources — maxSurge may require more CPU/RAM than the nodes have
  • Test environments — use Recreate for speed

Senior Level

Deep Mechanics: Deployment Controller, ReplicaSet, and Reconciliation

Deployment Controller Architecture: Deployment Controller (in kube-controller-manager) works by standard reconciliation loop:

  1. Watch: Subscribes to Deployment, ReplicaSet, Pod events via informers
  2. Sync: On Deployment change (new image), the controller:
    • Creates a new ReplicaSet with new PodTemplateSpec
    • Computes desired replica count for new and old RS
    • Scales new RS up, old RS down
  3. Scaling Logic:
    desiredReplicas = spec.replicas
    maxSurge = ceil(desiredReplicas * maxSurge%)
    maxUnavailable = floor(desiredReplicas * maxUnavailable%)
    
    For each wave:
      newRS_replicas = min(desiredReplicas + maxSurge - current_total, desiredReplicas)
      oldRS_replicas = max(current_total - newRS_replicas - maxUnavailable, 0)
    
  4. Pod Creation/Deletion: New ReplicaSet creates Pods through its own reconciliation loop. Pods go through scheduling → kubelet → container creation → Readiness Probe
  5. RS Scaling: When new Pods are Ready, Deployment Controller scales down the old RS replicas

Revision History: Each ReplicaSet retains annotation deployment.kubernetes.io/revision. Kubernetes stores the last N revisions (default 10, configurable via revisionHistoryLimit). This allows rollout undo — simply scale the old RS back up.
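The pruning arithmetic behind revisionHistoryLimit is simple: once the limit is exceeded, the oldest inactive ReplicaSets are deleted. A sketch (the function name is illustrative, not controller code):

```python
def prune_revisions(revisions, history_limit=10):
    """Given revision numbers of inactive (replicas=0) ReplicaSets, return
    the ones that survive pruning: the newest `history_limit` revisions."""
    return sorted(revisions)[-history_limit:]
```

With 14 accumulated revisions and the default limit of 10, revisions 1-4 are deleted and rollout undo can only reach revisions 5-14.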

Readiness Probe Integration: Deployment Controller doesn’t kill old Pods until new ones transition to Ready: True. Pod status is updated by kubelet → API Server → Deployment Controller informer → reconciliation.

Progress Deadline: spec.progressDeadlineSeconds (default 600 seconds). If Rolling Update doesn’t complete within this time (e.g., new Pods don’t pass Readiness), Deployment enters ProgressDeadlineExceeded status. This is not a rollback — Deployment stays in intermediate state, requires manual intervention.

Trade-offs

| Aspect | Trade-off |
|---|---|
| High vs low maxSurge | High = faster update, but more resources and higher risk from two-version coexistence. Low = slower but safer |
| High vs low maxUnavailable | High = faster, but temporary availability reduction. Low = zero downtime, but slower |
| Rolling Update vs Blue-Green | Rolling = fewer resources, but two versions coexist. Blue-Green = instant switch, but x2 resources |
| High vs low revisionHistoryLimit | High = more rollback history, but more etcd storage and old ReplicaSets. Low = savings, but limited rollback |
| progressDeadlineSeconds | Long = more time to debug, but slower problem detection. Short = fast failure detection, but false positives |

Edge Cases (6+)

Edge Case 1: Two code versions coexist During Rolling Update, old (v1) and new (v2) versions run simultaneously. If v2 writes data in a format v1 doesn’t understand, corruption or errors are possible. Solution: backward-compatible API, feature flags, or Blue-Green deployment.

Edge Case 2: Database migration incompatibility Rolling Update starts Pods with new code that require new DB schema. But old Pods (v1) are still running and don’t know the new schema. Solution: run DB migrations BEFORE Rolling Update, in a backward-compatible manner (add column, don’t remove old one).

Edge Case 3: Resource quota exceeded during maxSurge Deployment with 100 replicas, maxSurge: 25%. During update, 125 Pods are needed. Namespace ResourceQuota limits to 110 Pods. New Pods stuck in Pending, Rolling Update hangs for 10 minutes (progressDeadlineSeconds). Solution: increase quota, or reduce maxSurge.

Edge Case 4: Pod Disruption Budget (PDB) interaction A PDB sets minAvailable: 80%. Contrary to common belief, a PDB does not limit the Deployment Controller: a Rolling Update with maxUnavailable: 25% proceeds regardless, because PDBs gate evictions (kubectl drain), not controller scale-downs. However, Pods made unavailable by the rollout do count against the budget, so a drain running at the same time may block until the rollout completes; with minAvailable close to 100%, drains can hang.
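The PDB budget arithmetic can be sketched as follows, assuming a percentage minAvailable (the helper name and parameters are illustrative, not the real Eviction API):

```python
import math

def eviction_budget(replicas, available_pods, min_available_pct):
    """How many Pods the Eviction API (e.g. kubectl drain) may remove right
    now, given a PDB with a percentage minAvailable. minAvailable rounds up."""
    min_available = math.ceil(replicas * min_available_pct / 100)
    return max(available_pods - min_available, 0)
```

During a rollout that has 25 of 100 Pods unavailable, eviction_budget(100, 75, 80) is 0: a concurrent drain is blocked until the rollout restores availability.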

Edge Case 5: Node drain during Rolling Update Admin runs kubectl drain node-X. Pods on this node relocate. Simultaneously, a Rolling Update is in progress. Deployment Controller scales the new RS, but new Pods may be scheduled on a draining node → eviction → re-schedule. This slows the update and creates unnecessary churn.

Edge Case 6: Image Pull Backoff The new image my-app:2.0 has a typo in the registry URL or doesn't exist. New Pods enter ImagePullBackOff. The Deployment Controller waits for Readiness, but the Pods never even start. After progressDeadlineSeconds (10 minutes), the Deployment enters ProgressDeadlineExceeded. Solution: validate in CI that the image exists and is pullable before deploying.

Edge Case 7: HPA + Rolling Update race condition HPA sees CPU growth during Rolling Update (new Pods not yet Ready, old ones handle more traffic). HPA scales Deployment to 120 replicas. Rolling Update now updates 120 Pods instead of 100. This may worsen resource contention. Solution: behavior.stabilizationWindowSeconds in HPA.

Edge Case 8: Sidecar container not updated Deployment has 2 containers: app and sidecar. Only app image is updated. Kubernetes creates a new ReplicaSet with new app image and old sidecar image. If sidecar needs updating for compatibility with new app, errors are possible. Solution: update all containers in one Deployment update.

Performance Numbers

| Metric | Value |
|---|---|
| ReplicaSet creation latency | 10-50 ms (API Server) |
| Pod scheduling latency | 100 ms - 2 s (depends on cluster size) |
| Container startup (Java/Spring Boot) | 30-180 s |
| Readiness Probe success latency | 5-30 s (after startup) |
| Rolling Update, 100 replicas (maxSurge 25%) | ~5-15 min (depends on startup time) |
| Rolling Update, 100 replicas (maxSurge 100%) | ~2-5 min (faster, but more resources) |
| rollout undo latency | 30 s - 5 min (depends on startup) |
| etcd storage per ReplicaSet | ~5-10 KB (annotations + spec) |
| Practical revisionHistoryLimit ceiling | ~50+ (limited by etcd storage) |

Security

  • Image signing and verification — Rolling Update loads new images. Ensure images are signed (cosign, Notary) and verified via Admission Webhook
  • RBAC on rollout undo — Rollback may load an old, potentially vulnerable version. Restrict rollout undo via RBAC
  • ImagePullSecrets — Updated images must be accessible from private registry. Ensure ImagePullSecrets are current in all namespaces
  • Secret rotation — Rolling Update is a good time for Secret rotation. New Pods get updated Secrets from ConfigMap/Secret volume mounts
  • NetworkPolicy — During Rolling Update, two application versions communicate with the same services. NetworkPolicy must allow traffic for both versions

Production War Story

Situation: E-commerce platform, 200 API replicas, Rolling Update with maxSurge: 25%, maxUnavailable: 0. Deploying v2.3 with new DB schema (added column user_preferences).

What happened:

  1. DB migration executed: added column user_preferences (NOT NULL without default)
  2. Rolling Update started: first 50 Pods (v2.3) started successfully
  3. Remaining 150 Pods (v2.2) started crashing — ORM couldn’t map new schema (NOT NULL column, but old entities don’t know about it)
  4. Liveness Probe restarted v2.2 Pods endlessly
  5. 50 v2.3 Pods handled 100% of traffic → latency grew from 200ms to 5 seconds
  6. HPA scaled to 300 replicas, but new Pods were also v2.3 (image already updated in RS)
  7. Database connection pool exhausted → complete outage for 40 minutes

Post-mortem and fix:

  1. DB migrations always backward-compatible: add column with DEFAULT NULL, not NOT NULL. Old code ignores new column, new code uses it
  2. Canary deployment instead of Rolling Update for DB-dependent changes: 10% traffic on v2.3, monitor errors, then full rollout
  3. PDB with minAvailable: 70% — protects against concurrent voluntary disruptions (node drains) during rollouts; note that a PDB does not limit the Deployment Controller itself, so maxUnavailable remains the lever for rollout aggressiveness
  4. Alert on restart rate: rate(kube_pod_container_status_restarts_total[5m]) > 0.1 — would have fired within 2 minutes
  5. Rollback playbook: kubectl rollout undo should be automated, not manual

Monitoring after fix:

# Alert: Deployment rollout stuck
kube_deployment_status_condition{condition="Progressing", status="false"} == 1

# Alert: Restart rate
rate(kube_pod_container_status_restarts_total[5m]) > 0.1

# Alert: HPA scaling anomaly
kube_horizontalpodautoscaler_status_current_replicas - kube_horizontalpodautoscaler_spec_max_replicas > 0

# Alert: DB connection pool
db_connection_pool_active / db_connection_pool_max > 0.8

Monitoring (Prometheus/Grafana)

Key metrics:

# Deployment rollout status
kube_deployment_status_observed_generation
kube_deployment_status_replicas_available
kube_deployment_status_replicas_updated

# Rollout progress
kube_deployment_status_condition{condition="Progressing"}

# Pod restart rate
rate(kube_pod_container_status_restarts_total[5m]) by (deployment)

# Rollout not yet reconciled (spec generation ahead of what the controller observed)
kube_deployment_metadata_generation - kube_deployment_status_observed_generation > 0

# HPA scaling events
kube_horizontalpodautoscaler_status_current_replicas
kube_horizontalpodautoscaler_status_desired_replicas

# Pod readiness ratio (kube_pod_status_ready{condition="true"} is 1/0 per Pod)
avg(kube_pod_status_ready{condition="true"})

Grafana Dashboard panels:

  1. Deployment rollout progress: available vs updated replicas over time
  2. Rolling Update duration — from start to completion
  3. Pod restart rate — crash loop detection
  4. HPA scaling — correlation with Rolling Update
  5. Error rate (5xx) — correlation with deployments
  6. Latency p50/p99 — degradation detection during update
  7. DB connection pool usage — exhaustion detection

Highload Best Practices

  1. Always use maxUnavailable: 0 for critical services — Zero Downtime is mandatory
  2. maxSurge: 25-50% for speed/resources balance — faster than one at a time, but doesn’t overload nodes
  3. Readiness Probe — mandatory — without it, Kubernetes doesn’t know when new Pods are ready
  4. Graceful Shutdown + preStop hook:
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]
    terminationGracePeriodSeconds: 60
    
  5. Database migrations backward-compatible — add columns with DEFAULT, never drop/rename
  6. Canary via Ingress annotations — e.g. nginx.ingress.kubernetes.io/canary-weight: "10" for the first 10% of traffic
  7. PDB to protect availability during voluntary disruptions (note: a PDB limits evictions such as node drains, not the rollout itself — tune maxUnavailable for that):
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-app-pdb
    spec:
      minAvailable: 70%
      selector:
        matchLabels:
          app: my-app
    
  8. Monitor rollout duration — alert if Rolling Update > 15 minutes
  9. Automated rollback — on error rate > 5%, automatically rollout undo
  10. ProgressDeadlineSeconds: 600 — 10 minutes is enough for Java apps with long startup, but not too long for problem detection
  11. Image digest instead of tags — my-app@sha256:abc123 instead of my-app:2.3, for reproducibility
  12. HPA stabilizationWindowSeconds: 300 — prevents race condition between Rolling Update and HPA

Interview Cheat Sheet

Must know:

  • Rolling Update — gradual Pod replacement without downtime (default strategy in Deployment)
  • maxSurge — how many extra Pods to create, maxUnavailable — how many to remove
  • On image change, a new ReplicaSet is created; old one preserved for rollback
  • Readiness Probe mandatory — K8s doesn’t remove old Pods until new ones are ready
  • kubectl rollout undo — quick rollback; rollout pause/resume — manual check
  • Database migrations must be backward-compatible (add column with DEFAULT, not drop)
  • PDB limits evictions (kubectl drain), not the Deployment Controller's rollout — but Pods unavailable during a rollout count against the budget, so concurrent drains can hang

Common follow-up questions:

  • “Why is :latest tag bad for Rolling Update?” — K8s doesn’t see image changes, update doesn’t start
  • “Two code versions run simultaneously — is this a problem?” — Yes, if there’s no API backward compatibility
  • “Rolling Update for stateful applications?” — No, use StatefulSet with ordered updates
  • “Rolling Update stuck — what to do?” — Check: ImagePullBackOff, ResourceQuota, Readiness Probe, ProgressDeadlineExceeded events

Red flags (DO NOT say):

  • “Rolling Update = Recreate” (Recreate = downtime, Rolling = zero downtime)
  • “I use :latest for convenience” (K8s won’t detect image change)
  • “DB migrations after deployment” (migrations BEFORE, backward-compatible)
  • “maxUnavailable: 0 with maxSurge: 1 for 100 replicas” (update will take very long)

Related topics:

  • [[What is ReplicaSet]] — update mechanism
  • [[What is readiness probe]] — blocks old Pod removal
  • [[What is StatefulSet and when to use it]] — rolling update for stateful