Question 22 · Section 14

How Do You Perform a Rolling Update in Kubernetes?

Junior Level

Simple Definition

Rolling Update is a deployment strategy in Kubernetes that replaces old Pods with new ones gradually, without application downtime. Instead of killing all old Pods and creating new ones at once, Kubernetes does this one by one (or in small batches), waiting for each new Pod to be ready before removing an old one.

In short: Rolling Update is the default deployment strategy for Deployments in K8s — old Pods are gradually replaced with new ones, with no application downtime.

Analogy

Imagine changing tires on a car. Rolling Update is like changing tires one at a time while the car moves slowly. The car never fully stops. The alternative (Recreate) — remove all 4 tires at once, and the car sits until new ones are on.

YAML Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # maximum 1 extra Pod
      maxUnavailable: 0  # zero Pods unavailable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:2.0  # new image
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 5

kubectl Example

# Update image
kubectl set image deployment/my-app app=my-app:2.0

# Track rollout progress
kubectl rollout status deployment/my-app

# View revision history
kubectl rollout history deployment/my-app

# Rollback to previous version
kubectl rollout undo deployment/my-app

# Rollback to specific revision
kubectl rollout undo deployment/my-app --to-revision=3

# Pause the rollout
kubectl rollout pause deployment/my-app

# Resume the rollout
kubectl rollout resume deployment/my-app

When to Use

  • Application version updates without downtime
  • Canary deployment (gradual traffic switching)
  • When the application maintains backward API compatibility
  • For stateless services (APIs, web applications)

Middle Level

How it Works

When you change the container image in a Deployment, Kubernetes:

  1. Creates a new ReplicaSet with the new image (old ReplicaSet is preserved)
  2. Starts creating Pods in the new ReplicaSet, guided by maxSurge
  3. Waits for each new Pod to pass Readiness Probe
  4. After success, removes a Pod from the old ReplicaSet, guided by maxUnavailable
  5. Repeats until all Pods are in the new ReplicaSet
  6. Old ReplicaSet is preserved (with replicas=0) for rollback

Parameters maxSurge and maxUnavailable:

  • maxSurge: 25% — with 4 replicas, can create 1 extra Pod (4 + 1 = 5)
  • maxUnavailable: 25% — with 4 replicas, at least 3 must be available (4 - 1 = 3)

Can be specified as percentages or absolute numbers: maxSurge: 1, maxUnavailable: 0.

# maxUnavailable=1 — at most 1 Pod may be unavailable during the update.
# maxSurge=1 — at most 1 extra Pod may be created above the desired count.
# With replicas=3: 3(old) → 3(old)+1(new) → 2(old)+2(new) → 1(old)+3(new) → 3(new)
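The rounding rules and the wave mechanics can be sketched in Python. This is a simplified model under loud assumptions: every created Pod is assumed to pass its readiness probe before the next wave, whereas the real controller interleaves scale-up, scale-down, and readiness waiting; the helper names are illustrative, not Kubernetes APIs:

```python
import math

def resolve_bounds(replicas, max_surge, max_unavailable):
    """Resolve absolute or percentage values the way the Deployment
    controller does: maxSurge rounds up, maxUnavailable rounds down."""
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            frac = int(value[:-1]) / 100
            return math.ceil(replicas * frac) if round_up else math.floor(replicas * frac)
        return int(value)
    return resolve(max_surge, True), resolve(max_unavailable, False)

def rolling_update_trace(replicas, max_surge, max_unavailable):
    """Record (old, new) Pod counts per wave, assuming every created Pod
    becomes Ready before the next wave starts."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    trace = [(old, new)]
    while old > 0 or new < replicas:
        # scale the new ReplicaSet up, never exceeding replicas + maxSurge total
        new += min(replicas + max_surge - (old + new), replicas - new)
        # scale the old ReplicaSet down, never dropping below replicas - maxUnavailable
        old -= min(old, old + new - (replicas - max_unavailable))
        trace.append((old, new))
    return trace
```

For example, resolve_bounds(4, "25%", "25%") gives (1, 1), and rolling_update_trace(1, 1, 0) reproduces the zero-downtime single-replica case: the new Pod is created before the old one is removed.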

Practical Scenarios

Scenario 1: Zero Downtime for 1 replica

replicas: 1
strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

Kubernetes creates a new Pod (total 2), waits for Readiness, then removes the old one.

Scenario 2: Fast deployment for 100 replicas

replicas: 100
strategy:
  rollingUpdate:
    maxSurge: 25%     # +25 Pods at once
    maxUnavailable: 25%  # -25 Pods at once

All 100 Pods update in 4 waves (25 at a time). Fast, but with temporary availability reduction.

Scenario 3: Pause/Resume for manual verification

# Deploy first 2 Pods and pause
kubectl set image deployment/my-app app=my-app:2.0
kubectl rollout pause deployment/my-app
# Check logs, metrics, tests
kubectl rollout resume deployment/my-app

Common Mistakes Table

| Mistake | Consequence | Solution |
|---|---|---|
| Missing Readiness Probe | Kubernetes kills old Pods while new ones are not ready → downtime | Always add a readinessProbe |
| Using the :latest tag | Kubernetes doesn't see the image change, so the Rolling Update never starts | Use specific tags (:1.0, :2.0) or a SHA256 digest |
| maxUnavailable: 0 with many replicas | Update takes very long (one Pod at a time) | Use maxUnavailable: 10-25% for a speed/availability balance |
| maxSurge too large | Resource quota temporarily exceeded, OOMKill on nodes | Limit maxSurge to available node resources |
| No graceful shutdown | Active requests are interrupted when Pods are removed | Handle SIGTERM, use a preStop hook |
| No API backward compatibility | New Pods can't work with old data / DB schema | Use backward-compatible migrations or Blue-Green |
| Recreate strategy instead of RollingUpdate | Full downtime during the update | Check strategy.type: RollingUpdate |

Deployment Strategies Comparison

| Characteristic | Rolling Update | Recreate | Blue-Green | Canary |
|---|---|---|---|---|
| Downtime | Zero | Yes | Zero | Zero |
| Resources | +maxSurge | None extra | x2 (full second set) | +canary% |
| Speed | Medium | Fast | Fast | Slow |
| Rollback | Fast (rollout undo) | Fast | Instant (traffic switch) | Instant |
| Complexity | Low (default) | Low | High (2 Deployments + Ingress) | High (Ingress weights) |
| Risk | Medium (two versions coexist) | High | Low | Low |
| When to use | Stateless APIs, microservices | Dev, tests, stateful with migration | Critical services, compliance | Production with monitoring |

When NOT to Use

  • Stateful applications with DB migration — Rolling Update runs old and new Pods simultaneously. If the DB schema is incompatible, use Recreate + migration, or an Operator. Databases themselves are not a fit either — new Pods have no data; use StatefulSet with ordered updates
  • Applications without backward compatibility — the new version breaks the API for old clients. Use Blue-Green
  • Very large Deployments with limited resources — maxSurge may require more CPU/RAM than the nodes have
  • Test environments — use Recreate for speed

Senior Level

Deep Mechanics: Deployment Controller, ReplicaSet, and Reconciliation

Deployment Controller Architecture: Deployment Controller (in kube-controller-manager) works by standard reconciliation loop:

  1. Watch: Subscribes to Deployment, ReplicaSet, Pod events via informers
  2. Sync: On Deployment change (new image), the controller:
    • Creates a new ReplicaSet with new PodTemplateSpec
    • Computes desired replica count for new and old RS
    • Scales new RS up, old RS down
  3. Scaling Logic:
    desiredReplicas = spec.replicas
    maxSurge = ceil(desiredReplicas * maxSurge%)
    maxUnavailable = floor(desiredReplicas * maxUnavailable%)
    
    For each wave:
      newRS_replicas = min(desiredReplicas + maxSurge - current_total, desiredReplicas)
      oldRS_replicas = max(current_total - newRS_replicas - maxUnavailable, 0)
    
  4. Pod Creation/Deletion: New ReplicaSet creates Pods through its own reconciliation loop. Pods go through scheduling → kubelet → container creation → Readiness Probe
  5. RS Scaling: When new Pods are Ready, Deployment Controller scales down the old RS replicas

Revision History: Each ReplicaSet retains annotation deployment.kubernetes.io/revision. Kubernetes stores the last N revisions (default 10, configurable via revisionHistoryLimit). This allows rollout undo — simply scale the old RS back up.
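The pruning arithmetic behind revisionHistoryLimit is simple: once the limit is exceeded, the oldest inactive ReplicaSets are deleted. A sketch (the function name is illustrative, not controller code):

```python
def prune_revisions(revisions, history_limit=10):
    """Given revision numbers of inactive (replicas=0) ReplicaSets, return
    the ones that survive pruning: the newest `history_limit` revisions."""
    return sorted(revisions)[-history_limit:]
```

With 14 accumulated revisions and the default limit of 10, revisions 1-4 are deleted and rollout undo can only reach revisions 5-14.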

Readiness Probe Integration: Deployment Controller doesn’t kill old Pods until new ones transition to Ready: True. Pod status is updated by kubelet → API Server → Deployment Controller informer → reconciliation.

Progress Deadline: spec.progressDeadlineSeconds (default 600 seconds). If Rolling Update doesn’t complete within this time (e.g., new Pods don’t pass Readiness), Deployment enters ProgressDeadlineExceeded status. This is not a rollback — Deployment stays in intermediate state, requires manual intervention.

Trade-offs

| Aspect | Trade-off |
|---|---|
| High vs low maxSurge | High = faster update, but more resources and higher risk from two-version coexistence. Low = slower but safer |
| High vs low maxUnavailable | High = faster, but temporary availability reduction. Low = zero downtime, but slower |
| Rolling Update vs Blue-Green | Rolling = fewer resources, but two versions coexist. Blue-Green = instant switch, but x2 resources |
| High vs low revisionHistoryLimit | High = more rollback history, but more etcd storage and old ReplicaSets. Low = savings, but limited rollback |
| progressDeadlineSeconds | Long = more time to debug, but slower problem detection. Short = fast failure detection, but false positives |

Edge Cases (6+)

Edge Case 1: Two code versions coexist During Rolling Update, old (v1) and new (v2) versions run simultaneously. If v2 writes data in a format v1 doesn’t understand, corruption or errors are possible. Solution: backward-compatible API, feature flags, or Blue-Green deployment.

Edge Case 2: Database migration incompatibility Rolling Update starts Pods with new code that require new DB schema. But old Pods (v1) are still running and don’t know the new schema. Solution: run DB migrations BEFORE Rolling Update, in a backward-compatible manner (add column, don’t remove old one).

Edge Case 3: Resource quota exceeded during maxSurge Deployment with 100 replicas, maxSurge: 25%. During update, 125 Pods are needed. Namespace ResourceQuota limits to 110 Pods. New Pods stuck in Pending, Rolling Update hangs for 10 minutes (progressDeadlineSeconds). Solution: increase quota, or reduce maxSurge.

Edge Case 4: Pod Disruption Budget (PDB) interaction A PDB sets minAvailable: 80%. Contrary to common belief, a PDB does not limit the Deployment Controller: a Rolling Update with maxUnavailable: 25% proceeds regardless, because PDBs gate evictions (kubectl drain), not controller scale-downs. However, Pods made unavailable by the rollout do count against the budget, so a drain running at the same time may block until the rollout completes; with minAvailable close to 100%, drains can hang.
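The PDB budget arithmetic can be sketched as follows, assuming a percentage minAvailable (the helper name and parameters are illustrative, not the real Eviction API):

```python
import math

def eviction_budget(replicas, available_pods, min_available_pct):
    """How many Pods the Eviction API (e.g. kubectl drain) may remove right
    now, given a PDB with a percentage minAvailable. minAvailable rounds up."""
    min_available = math.ceil(replicas * min_available_pct / 100)
    return max(available_pods - min_available, 0)
```

During a rollout that has 25 of 100 Pods unavailable, eviction_budget(100, 75, 80) is 0: a concurrent drain is blocked until the rollout restores availability.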

Edge Case 5: Node drain during Rolling Update Admin runs kubectl drain node-X. Pods on this node relocate. Simultaneously, a Rolling Update is in progress. Deployment Controller scales the new RS, but new Pods may be scheduled on a draining node → eviction → re-schedule. This slows the update and creates unnecessary churn.

Edge Case 6: Image Pull Backoff The new image my-app:2.0 has a typo in the registry URL or doesn't exist. New Pods enter ImagePullBackOff. The Deployment Controller waits for Readiness, but the Pods never even start. After progressDeadlineSeconds (10 minutes), the Deployment enters ProgressDeadlineExceeded. Solution: validate in CI that the image exists and is pullable before deploying.

Edge Case 7: HPA + Rolling Update race condition HPA sees CPU growth during Rolling Update (new Pods not yet Ready, old ones handle more traffic). HPA scales Deployment to 120 replicas. Rolling Update now updates 120 Pods instead of 100. This may worsen resource contention. Solution: behavior.stabilizationWindowSeconds in HPA.

Edge Case 8: Sidecar container not updated Deployment has 2 containers: app and sidecar. Only app image is updated. Kubernetes creates a new ReplicaSet with new app image and old sidecar image. If sidecar needs updating for compatibility with new app, errors are possible. Solution: update all containers in one Deployment update.

Performance Numbers

| Metric | Value |
|---|---|
| ReplicaSet creation latency | 10-50 ms (API Server) |
| Pod scheduling latency | 100 ms - 2 s (depends on cluster size) |
| Container startup (Java/Spring Boot) | 30-180 s |
| Readiness Probe success latency | 5-30 s (after startup) |
| Rolling Update, 100 replicas (maxSurge 25%) | ~5-15 min (depends on startup time) |
| Rolling Update, 100 replicas (maxSurge 100%) | ~2-5 min (faster, but more resources) |
| rollout undo latency | 30 s - 5 min (depends on startup) |
| etcd storage per ReplicaSet | ~5-10 KB (annotations + spec) |
| Practical revisionHistoryLimit ceiling | ~50+ (limited by etcd storage) |

Security

  • Image signing and verification — Rolling Update loads new images. Ensure images are signed (cosign, Notary) and verified via Admission Webhook
  • RBAC on rollout undo — Rollback may load an old, potentially vulnerable version. Restrict rollout undo via RBAC
  • ImagePullSecrets — Updated images must be accessible from private registry. Ensure ImagePullSecrets are current in all namespaces
  • Secret rotation — Rolling Update is a good time for Secret rotation. New Pods get updated Secrets from ConfigMap/Secret volume mounts
  • NetworkPolicy — During Rolling Update, two application versions communicate with the same services. NetworkPolicy must allow traffic for both versions

Production War Story

Situation: E-commerce platform, 200 API replicas, Rolling Update with maxSurge: 25%, maxUnavailable: 0. Deploying v2.3 with new DB schema (added column user_preferences).

What happened:

  1. DB migration executed: added column user_preferences (NOT NULL without default)
  2. Rolling Update started: first 50 Pods (v2.3) started successfully
  3. Remaining 150 Pods (v2.2) started crashing — ORM couldn’t map new schema (NOT NULL column, but old entities don’t know about it)
  4. Liveness Probe restarted v2.2 Pods endlessly
  5. 50 v2.3 Pods handled 100% of traffic → latency grew from 200ms to 5 seconds
  6. HPA scaled to 300 replicas, but new Pods were also v2.3 (image already updated in RS)
  7. Database connection pool exhausted → complete outage for 40 minutes

Post-mortem and fix:

  1. DB migrations always backward-compatible: add column with DEFAULT NULL, not NOT NULL. Old code ignores new column, new code uses it
  2. Canary deployment instead of Rolling Update for DB-dependent changes: 10% traffic on v2.3, monitor errors, then full rollout
  3. PDB with minAvailable: 70% — protects against concurrent voluntary disruptions (node drains) during rollouts; note that a PDB does not limit the Deployment Controller itself, so maxUnavailable remains the lever for rollout aggressiveness
  4. Alert on restart rate: rate(kube_pod_container_status_restarts_total[5m]) > 0.1 — would have fired within 2 minutes
  5. Rollback playbook: kubectl rollout undo should be automated, not manual

Monitoring after fix:

# Alert: Deployment rollout stuck
kube_deployment_status_condition{condition="Progressing", status="false"} == 1

# Alert: Restart rate
rate(kube_pod_container_status_restarts_total[5m]) > 0.1

# Alert: HPA scaling anomaly
kube_horizontalpodautoscaler_status_current_replicas - kube_horizontalpodautoscaler_spec_max_replicas > 0

# Alert: DB connection pool
db_connection_pool_active / db_connection_pool_max > 0.8

Monitoring (Prometheus/Grafana)

Key metrics:

# Deployment rollout status
kube_deployment_status_observed_generation
kube_deployment_status_replicas_available
kube_deployment_status_replicas_updated

# Rollout progress
kube_deployment_status_condition{condition="Progressing"}

# Pod restart rate
rate(kube_pod_container_status_restarts_total[5m]) by (deployment)

# Rollout not yet reconciled (spec generation ahead of what the controller observed)
kube_deployment_metadata_generation - kube_deployment_status_observed_generation > 0

# HPA scaling events
kube_horizontalpodautoscaler_status_current_replicas
kube_horizontalpodautoscaler_status_desired_replicas

# Pod readiness ratio (kube_pod_status_ready{condition="true"} is 1/0 per Pod)
avg(kube_pod_status_ready{condition="true"})

Grafana Dashboard panels:

  1. Deployment rollout progress: available vs updated replicas over time
  2. Rolling Update duration — from start to completion
  3. Pod restart rate — crash loop detection
  4. HPA scaling — correlation with Rolling Update
  5. Error rate (5xx) — correlation with deployments
  6. Latency p50/p99 — degradation detection during update
  7. DB connection pool usage — exhaustion detection

Highload Best Practices

  1. Always use maxUnavailable: 0 for critical services — Zero Downtime is mandatory
  2. maxSurge: 25-50% for speed/resources balance — faster than one at a time, but doesn’t overload nodes
  3. Readiness Probe — mandatory — without it, Kubernetes doesn’t know when new Pods are ready
  4. Graceful Shutdown + preStop hook:
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]
    terminationGracePeriodSeconds: 60
    
  5. Database migrations backward-compatible — add columns with DEFAULT, never drop/rename
  6. Canary via Ingress annotations — e.g. nginx.ingress.kubernetes.io/canary-weight: "10" for the first 10% of traffic
  7. PDB to protect availability during voluntary disruptions (note: a PDB limits evictions such as node drains, not the rollout itself — tune maxUnavailable for that):
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-app-pdb
    spec:
      minAvailable: 70%
      selector:
        matchLabels:
          app: my-app
    
  8. Monitor rollout duration — alert if Rolling Update > 15 minutes
  9. Automated rollback — on error rate > 5%, automatically rollout undo
  10. ProgressDeadlineSeconds: 600 — 10 minutes is enough for Java apps with long startup, but not too long for problem detection
  11. Image digest instead of tags — my-app@sha256:abc123 instead of my-app:2.3, for reproducibility
  12. HPA stabilizationWindowSeconds: 300 — prevents race condition between Rolling Update and HPA

Interview Cheat Sheet

Must know:

  • Rolling Update — gradual Pod replacement without downtime (default strategy in Deployment)
  • maxSurge — how many extra Pods to create, maxUnavailable — how many to remove
  • On image change, a new ReplicaSet is created; old one preserved for rollback
  • Readiness Probe mandatory — K8s doesn’t remove old Pods until new ones are ready
  • kubectl rollout undo — quick rollback; rollout pause/resume — manual check
  • Database migrations must be backward-compatible (add column with DEFAULT, not drop)
  • PDB limits evictions (kubectl drain), not the Deployment Controller's rollout — but Pods unavailable during a rollout count against the budget, so concurrent drains can hang

Common follow-up questions:

  • “Why is :latest tag bad for Rolling Update?” — K8s doesn’t see image changes, update doesn’t start
  • “Two code versions run simultaneously — is this a problem?” — Yes, if there’s no API backward compatibility
  • “Rolling Update for stateful applications?” — No, use StatefulSet with ordered updates
  • “Rolling Update stuck — what to do?” — Check: ImagePullBackOff, ResourceQuota, Readiness Probe, ProgressDeadlineExceeded events

Red flags (DO NOT say):

  • “Rolling Update = Recreate” (Recreate = downtime, Rolling = zero downtime)
  • “I use :latest for convenience” (K8s won’t detect image change)
  • “DB migrations after deployment” (migrations BEFORE, backward-compatible)
  • “maxUnavailable: 0 with maxSurge: 1 for 100 replicas” (update will take very long)

Related topics:

  • [[What is ReplicaSet]] — update mechanism
  • [[What is readiness probe]] — blocks old Pod removal
  • [[What is StatefulSet and when to use it]] — rolling update for stateful