Question 19 · Section 14

Why are Health Checks Needed in Kubernetes?

Junior Level

Simple Definition

Health Checks are Kubernetes mechanisms that allow the orchestrator to understand the state of an application: is it running, is it ready to accept traffic, and is it stuck. Based on these checks, Kubernetes automatically makes decisions: restart a container, remove it from load balancing, or give it more time to start.

Health checks in K8s – a combination of liveness, readiness and startup probes. Each solves its own task: liveness = is it alive, readiness = is it ready, startup = has it started.

Analogy

Imagine you manage a coffee shop chain. Health Checks are daily reports from each shop:

  • Liveness: “We’re open, lights are on” (container is alive)
  • Readiness: “Registers work, barista is in place, we can serve customers” (ready for traffic)
  • Startup: “We’re still renovating, opening soon” (give us time to start)

YAML Example (all three probes)

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:1.0
      startupProbe:
        httpGet:
          path: /health/startup
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: /health/liveness
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /health/readiness
          port: 8080
        periodSeconds: 5
        failureThreshold: 3

kubectl Example

# Check Pod status
kubectl get pods
# READY: 1/1 — all probes passed, 0/1 — readiness failed

# Detailed probe information
kubectl describe pod my-app

# Check events (restarts due to liveness)
kubectl get events --sort-by='.lastTimestamp'

When to Use

  • Always in production — without health checks Kubernetes cannot automatically recover the application
  • For any microservices, APIs, web applications
  • For Java applications with long startup (JVM, Spring Boot) — startupProbe is mandatory

When NOT to use liveness probe

Do NOT use liveness probe for stateful applications (databases) – restart may worsen the problem. Use readiness probe + monitoring instead.


Middle Level

How it Works

The Kubernetes kubelet performs three types of checks independently of each other:

  1. Liveness Probe — kubelet polls the endpoint. On failure (failureThreshold consecutive times), the kubelet kills the container and the container runtime restarts it. Restart count increases.

  2. Readiness Probe — kubelet polls the endpoint. On failure, the Pod is removed from Endpoints of all related Services. The container continues running.

  3. Startup Probe — kubelet polls until first success. While startupProbe hasn’t succeeded, liveness and readiness are disabled. After the first success, startupProbe is no longer executed.

Each probe runs in a separate kubelet goroutine with a timeout. If the probe doesn’t respond within timeoutSeconds, it is considered failed.
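
The consecutive-failure accounting described above can be modeled in a few lines. A minimal Python sketch (not kubelet code; the function name and result sequences are illustrative):

```python
def probe_outcome(results, failure_threshold=3):
    """Simulate kubelet failure accounting: a probe triggers its action
    (restart for liveness, removal from Endpoints for readiness) only
    after failure_threshold *consecutive* failures."""
    consecutive = 0
    for ok in results:
        if ok:
            consecutive = 0          # any success resets the counter
        else:
            consecutive += 1
            if consecutive >= failure_threshold:
                return "action"      # liveness: kill; readiness: unready
    return "healthy"

# A transient blip does not trigger the action...
print(probe_outcome([True, False, True, False, True]))   # healthy
# ...but three failures in a row do.
print(probe_outcome([True, False, False, False]))        # action
```

This is why `failureThreshold` matters more than a single slow response: one timeout costs nothing if the next probe succeeds.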

Practical Scenarios

Scenario 1: Java application with long startup (3 minutes)

startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  failureThreshold: 30     # 30 attempts
  periodSeconds: 10        # every 10 seconds = 5 minutes max
# Only after startupProbe success:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10        # check every 10 seconds

Without startupProbe, you’d have to set initialDelaySeconds: 180 for liveness, meaning 3 minutes with no hang detection after startup.
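
The time budget behind that trade-off, as a quick sanity check (values taken from the startupProbe above):

```python
# Maximum time a container gets to start before startupProbe gives up:
# failureThreshold * periodSeconds.
failure_threshold = 30   # from the startupProbe above
period_seconds = 10

max_startup_window = failure_threshold * period_seconds
print(max_startup_window)   # 300 seconds = 5 minutes of grace
```

A 3-minute startup fits comfortably, and once the probe succeeds, the fast liveness cadence takes over immediately.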

Scenario 2: Temporary overload

The application receives too many requests and its queue overflows. The readiness probe starts returning 503, the Pod is removed from load balancing, works through the accumulated tasks, and then returns to service.
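
An overload-aware readiness handler can be sketched like this (a minimal Python model; `QUEUE_HIGH_WATERMARK` and the endpoint logic are illustrative):

```python
# Load-shedding readiness: report 503 while the internal work queue is
# too deep, so Kubernetes stops routing traffic to this Pod until the
# backlog drains. The threshold is an assumption for illustration.
QUEUE_HIGH_WATERMARK = 1000

def readiness(queue_depth: int) -> int:
    """Return the HTTP status /health/readiness would emit."""
    if queue_depth > QUEUE_HIGH_WATERMARK:
        return 503   # overloaded: remove me from Service Endpoints
    return 200       # ready: route traffic to me

print(readiness(10))     # 200
print(readiness(5000))   # 503
```

Note that the container is never killed here: readiness only controls traffic, so the Pod recovers on its own.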

Scenario 3: Deadlock detection

The liveness probe checks not only that HTTP responds but also internal state. When a deadlock is detected, the probe fails and the container is restarted.
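
A common way to implement such a check is a watchdog: worker threads refresh a heartbeat, and the liveness handler fails once it goes stale. A minimal Python sketch (the 30-second staleness limit is an assumption):

```python
import time

# Watchdog-style liveness: workers call beat() on every loop iteration;
# the /health/liveness handler fails if no thread has made progress
# recently, which surfaces deadlocks even while HTTP still answers.
STALENESS_LIMIT = 30.0
last_heartbeat = time.monotonic()

def beat():
    """Called by worker threads whenever they complete a unit of work."""
    global last_heartbeat
    last_heartbeat = time.monotonic()

def liveness(now=None) -> int:
    now = time.monotonic() if now is None else now
    if now - last_heartbeat > STALENESS_LIMIT:
        return 500   # no progress: let the kubelet restart us
    return 200

beat()
print(liveness())                        # 200: heartbeat is fresh
print(liveness(time.monotonic() + 60))   # 500: heartbeat went stale
```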

Common Mistakes Table

| Mistake | Consequence | Solution |
| --- | --- | --- |
| All three probes on the same endpoint | Cannot distinguish “frozen” from “not ready” | Separate /health/liveness, /health/readiness, /health/startup |
| Liveness checks an external DB | When the DB fails, all Pods restart endlessly | Liveness checks only internal state, no external dependencies |
| Too aggressive failureThreshold | Unnecessary restarts on short-term issues | Set 3-5 for liveness, 15-30 for startup |
| Missing health checks | Frozen Pods receive traffic, users see 502 | Always add all three probes in production |
| timeoutSeconds too small (1 s) | Failures on GC pauses or slow requests | Set 3-10 seconds, sized for worst-case app latency |
| Using the :latest tag | Kubernetes doesn’t see the change and doesn’t start a Rolling Update | Always use specific tags or a SHA256 digest |

Comparison of All Three Probes

| Characteristic | Startup Probe | Liveness Probe | Readiness Probe |
| --- | --- | --- | --- |
| Question | “Have I started?” | “Am I alive?” | “Am I ready for traffic?” |
| Action on failure | Retry; container killed after failureThreshold | Container restart | Removal from Service Endpoints |
| When disabled | After the first success, forever | Never; runs for the container’s lifetime once startup succeeds | Never; runs for the container’s lifetime once startup succeeds |
| Interaction | Blocks liveness and readiness | Restarts the container | Removes the Pod from load balancing |
| Typical periodSeconds | 5-10 | 10-30 | 3-10 |
| Typical failureThreshold | 15-30 | 3-5 | 3 |
| Checks dependencies? | No | No | Yes |

When NOT to Use

  • Short-lived Jobs/CronJobs — if the Pod starts, performs a task, and exits, health checks are not needed (or only for debugging)
  • Sidecar containers without a network interface — if the container only ships logs or collects metrics, a liveness probe may be useful, but a readiness probe is pointless since the container never receives traffic
  • Init containers — they run before main containers and exit, health checks don’t apply to them

Senior Level

Deep Mechanics: kubelet probe manager

Health Checks are managed by the kubelet, specifically the prober subsystem in the kubelet source code (pkg/kubelet/prober/).

Architecture:

  1. worker — goroutine per probe, performs HTTP/TCP/exec checks
  2. manager — coordinates workers, stores results in results.Manager
  3. statusManager — passes results to PodStatus, which is sent to API Server

The kubelet executes probes asynchronously. Each worker has its own queue (workqueue) and timeout. On timeout, the probe is considered failed even if there’s no response.

Startup Probe + Liveness/Readiness interaction:

Pod created → startupProbe active, liveness/readiness disabled
→ kubelet executes startupProbe every periodSeconds
→ On success: startupProbe disabled forever, liveness/readiness activated
→ On failure (failureThreshold consecutive): container killed (since K8s 1.20+)

Before K8s 1.20, failure in startupProbe didn’t kill the container — this was fixed in PR #95190.

Trade-offs

| Aspect | Trade-off |
| --- | --- |
| startupProbe vs initialDelaySeconds | startupProbe is more flexible: plenty of startup time, then fast liveness. initialDelaySeconds is either too long or too aggressive |
| Liveness frequency | Frequent checks = faster deadlock detection, but higher CPU overhead and false positives on GC pauses |
| HTTP vs exec | HTTP is faster and gives a status code; exec can check deeper (files, processes) but is slower and spawns extra processes |
| Dependencies in readiness | Checking dependencies = honest status, but risk of cascading failure. Not checking = the Pod gets traffic it can’t process |
| Single vs multiple endpoints | A single endpoint is simpler but less informative; separate endpoints are more accurate but need more code and maintenance |

Edge Cases (6+)

Edge Case 1: JVM GC Pause and Liveness Probe

The JVM can stop all threads for 5-30 seconds during a Full GC (especially without ZGC/Shenandoah). With timeoutSeconds: 3 and failureThreshold: 3, three failed probes will kill the container. Solution: ZGC/G1GC with MaxGCPauseMillis, timeoutSeconds: 10, or a separate native sidecar that answers health checks outside the JVM.

Edge Case 2: Cascading Failure Through Readiness

All microservices check the same DB in readinessProbe. The DB slows down → all Pods disconnect from traffic simultaneously → complete outage. Solution: readiness checks only internal state, and DB availability is handled via a Circuit Breaker at the business-logic level.
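
That fix can be sketched as a tiny circuit breaker: readiness stays green while DB calls fail fast into a degraded mode instead of failing the probe. A minimal Python model (thresholds and function names are illustrative):

```python
# Circuit Breaker at the business-logic level: after max_failures
# consecutive errors the breaker opens and calls go straight to the
# fallback, so a slow DB degrades responses instead of unreadying Pods.
class CircuitBreaker:
    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def call(self, fn, fallback):
        if self.open:
            return fallback()        # fail fast: degraded mode
        try:
            result = fn()
            self.failures = 0        # success closes the breaker
            return result
        except Exception:
            self.failures += 1
            return fallback()

breaker = CircuitBreaker()

def flaky_db():
    raise TimeoutError("db is slow")

def cached_answer():
    return "stale-but-served"

for _ in range(6):
    print(breaker.call(flaky_db, cached_answer))
print(breaker.open)   # True: the DB is no longer being hammered
# Readiness meanwhile keeps returning 200, so the Pod stays in rotation.
```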

Edge Case 3: kubelet Restart Resets Probe State

On a kubelet restart, the consecutive-failure counter for startupProbe is lost. A Pod close to failureThreshold gets a “clean slate”, so an application that never starts can keep trying indefinitely.

Edge Case 4: containerd/CRI-O Timeout

If the kubelet can’t execute an exec probe due to CRI issues (containerd/CRI-O hung), the probe is considered failed. This isn’t an application problem, but the container is restarted anyway. Monitor kubelet_pod_worker_duration_seconds.

Edge Case 5: Readiness + HPA Race Condition

HPA watches CPU/RPS. Readiness removes the Pod from traffic, HPA sees RPS drop on the Pod and creates new replicas, but the new ones may also fail readiness. Result: a Pod explosion with no real benefit. Solution: behavior.stabilizationWindowSeconds in HPA plus a readiness probe without external dependencies.
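
What stabilizationWindowSeconds buys can be modeled in a few lines: for scale-down, HPA acts on the maximum recommendation seen within the window, so a short unreadiness dip cannot trigger an immediate scale-down. A simplified Python sketch (not the actual HPA algorithm):

```python
# Simplified scale-down stabilization: keep recent replica
# recommendations and act on the maximum within the window, so a brief
# dip (e.g. Pods going unready) is ignored. Values are illustrative.
def stabilized_replicas(recommendations, window, now):
    """recommendations: list of (timestamp, desired_replicas)."""
    recent = [r for t, r in recommendations if now - t <= window]
    return max(recent)

history = [(0, 10), (60, 10), (120, 3)]   # dip at t=120

print(stabilized_replicas(history, window=300, now=120))   # 10, dip ignored
print(stabilized_replicas(history, window=30, now=120))    # 3, dip wins
```

With a wide window, the transient dip at t=120 is outvoted by the earlier recommendations; with a narrow window, it is acted on immediately.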

Edge Case 6: Liveness Probe + Graceful Shutdown Conflict

The liveness probe fails and the kubelet kills the container. The application receives SIGTERM and begins graceful shutdown (finishing requests for up to 30 s), but the kubelet has already started a new container. Two instances of the same application run in parallel, with potential conflicts over resources (files, ports, DB connections).

Edge Case 7: Multi-container Pod with Different Probes

A Pod has two containers: app and a sidecar (envoy). Liveness passes for app but fails for the sidecar. The kubelet restarts the sidecar while app keeps running; if the sidecar is critical (e.g., a service-mesh proxy), app runs without its proxy. Solution: use shareProcessNamespace and check both containers.

Performance Numbers

| Metric | Value |
| --- | --- |
| kubelet probe overhead (HTTP) | ~1-5 ms CPU, ~10-50 KB RAM per probe |
| kubelet probe overhead (exec) | ~10-50 ms CPU, plus spawning a new process |
| Probe timeout behavior | On timeout the probe is failed, but the goroutine waits out the timeout before cleanup |
| API Server PodStatus update latency | 10-100 ms |
| EndpointSlice propagation delay | 500 ms - 2 s |
| kubelet restart | Complete loss of the consecutive-failure counter |
| JVM Full GC pause | 100 ms - 2 s (G1GC), 1-10 ms (ZGC) |
| Max probes per Pod | Not limited, but each probe = a goroutine with a ~2 KB stack |

Security

  • Health endpoints must not be public — they may expose internal application structure
  • exec probes run as the container user — if the container runs as root, exec probe is also root. Not an escalation, but worth noting
  • Liveness Probe must not have side-effects — an attacker with Pod access can call the liveness endpoint and trigger a restart (DoS)
  • mTLS for health endpoints — if sidecar proxy (Istio) intercepts all traffic, health endpoints should be excluded from mTLS or use a separate port
  • NetworkPolicy — restrict access to health endpoints only from kubelet (via kube-system namespace)

Production War Story

Situation: Fintech company, 500-node cluster, 5000 Pods. All services had a single health endpoint /health that checked DB, Redis, and Kafka. During an incident, Redis started slowing down (99th percentile latency increased from 5ms to 500ms). Liveness Probe started timing out because Redis checks took 10+ seconds with timeoutSeconds: 5.

Domino effect:

  1. Liveness Probe failed → kubelet killed 2000 Pods in 2 minutes
  2. New Pods started, but startupProbe checked Redis → also timed out
  3. HPA saw replica drop → created 3000 more Pods
  4. Cluster hit Pod-per-node limit → new Pods in Pending
  5. Complete outage for 45 minutes

Post-mortem and fix:

  1. Endpoints separated: /health/live (no external dependencies), /health/ready (with Circuit Breaker)
  2. startupProbe removed Redis check — only JVM startup
  3. Added Redis Circuit Breaker with fallback to degraded mode
  4. HPA limited with maxReplicas and stabilizationWindowSeconds: 300
  5. Chaos Engineering introduced — regular failure scenario testing

Monitoring after fix:

# Alert: Liveness restarts increasing
rate(kube_pod_container_status_restarts_total[5m]) > 0.1

# Alert: Readiness failures
sum(kube_pod_status_ready{condition="false"}) by (namespace) > 10

# Alert: HPA scaling too aggressive
kube_horizontalpodautoscaler_status_desired_replicas - kube_horizontalpodautoscaler_spec_min_replicas > 5

Monitoring (Prometheus/Grafana)

Key metrics:

# Container restarts (liveness failures)
rate(kube_pod_container_status_restarts_total[5m])

# Pods not in Ready state
sum(kube_pod_status_ready{condition="false"}) by (namespace)

# kubelet probe latency (should be < periodSeconds)
histogram_quantile(0.99, kubelet_pod_worker_duration_seconds_bucket)

# Probe timeout events (via kubelet logs)
kubelet_pod_probe_failed_total

# HPA scaling events
kube_horizontalpodautoscaler_status_current_replicas

# EndpointSlice availability
sum(kube_endpoint_address_available) by (service)

Grafana Dashboard panels:

  1. Pod Restarts Rate (by namespace) — red line at > 0.1/sec
  2. Liveness vs Readiness failure rate — correlation
  3. Probe latency p50/p99 — should be stable
  4. HPA current vs desired replicas
  5. Correlate with infrastructure metrics: node CPU, memory, disk I/O

Highload Best Practices

  1. Always separate liveness and readiness endpoints — liveness = “process alive”, readiness = “can serve traffic”
  2. Use startupProbe for JVM applications — a lifesaver for Spring Boot with 2-3 minute startup
  3. Liveness without external dependencies — check only deadlock, thread starvation, critical memory shortage
  4. Readiness with Circuit Breaker — if dependency is unavailable, return Ready in degraded mode, don’t remove Pod
  5. timeoutSeconds >= p99 latency of your endpoint — account for worst-case with GC pause
  6. Do not use :latest — Kubernetes won’t detect the change and won’t start Rolling Update
  7. Graceful Shutdown + preStop hook — give time to complete requests:
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]
    terminationGracePeriodSeconds: 60
    
  8. Monitor restart rate — more than 1 restart per minute per Pod = problem with liveness config or application
  9. Use Prometheus Operator + kube-state-metrics — for automatic health check metrics collection
  10. Chaos Engineering — regularly test scenarios: DB failure, network partition, node drain — and check how health checks handle them

Interview Cheat Sheet

Must know:

  • 3 probe types: Liveness (restart), Readiness (remove from traffic), Startup (startup protection)
  • Startup Probe blocks liveness and readiness until first success — lifesaver for JVM
  • Liveness WITHOUT external dependencies, Readiness CAN check dependencies (with Circuit Breaker)
  • Death Spiral — aggressive liveness settings → mass Pod kill → outage
  • For Java: timeout > max GC pause, startupProbe for JIT warmup (failureThreshold: 30)
  • Separate endpoints: /health/live, /health/ready, /health/startup
  • Without health checks K8s cannot automatically recover the application

Common follow-up questions:

  • “Why all 3 probes?” — Startup gives time to start, Liveness catches hangs, Readiness — traffic readiness
  • “Liveness for stateful applications?” — No, restart may worsen the problem; use readiness + monitoring
  • “Does kubelet restart reset probe state?” — Yes, loses consecutive failure counter for startupProbe
  • “Exec vs HTTP probe?” — HTTP is faster, exec checks deeper; exec creates additional processes

Red flags (DO NOT say):

  • “All probes on one endpoint” (can’t distinguish “frozen” from “not ready”)
  • “Liveness checks DB, cache, external APIs” (external issues → mass kill)
  • “Health checks not needed in dev” (config bugs will surface only in prod)
  • “Startup Probe = Liveness with large delay” (no, startup blocks liveness/readiness)

Related topics:

  • [[What is liveness probe]] — liveness in detail
  • [[What is readiness probe]] — readiness in detail
  • [[How to organize rolling update in Kubernetes]] — health checks during deployment