Why are Health Checks Needed in Kubernetes?
Junior Level
Simple Definition
Health Checks are Kubernetes mechanisms that allow the orchestrator to understand the state of an application: is it running, is it ready to accept traffic, and is it stuck. Based on these checks, Kubernetes automatically makes decisions: restart a container, remove it from load balancing, or give it more time to start.
Health checks in K8s – a combination of liveness, readiness and startup probes. Each solves its own task: liveness = is it alive, readiness = is it ready, startup = has it started.
Analogy
Imagine you manage a coffee shop chain. Health Checks are daily reports from each shop:
- Liveness: “We’re open, lights are on” (container is alive)
- Readiness: “Registers work, barista is in place, we can serve customers” (ready for traffic)
- Startup: “We’re still renovating, opening soon” (give us time to start)
YAML Example (all three probes)
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:1.0
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health/liveness
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health/readiness
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
kubectl Example
# Check Pod status
kubectl get pods
# READY: 1/1 — all probes passed, 0/1 — readiness failed
# Detailed probe information
kubectl describe pod my-app
# Check events (restarts due to liveness)
kubectl get events --sort-by='.lastTimestamp'
When to Use
- Always in production — without health checks Kubernetes cannot automatically recover the application
- For any microservices, APIs, web applications
- For Java applications with long startup (JVM, Spring Boot) — startupProbe is mandatory
When NOT to use liveness probe
Do NOT use liveness probe for stateful applications (databases) – restart may worsen the problem. Use readiness probe + monitoring instead.
Middle Level
How it Works
The Kubernetes kubelet performs three types of checks independently of each other:
- Liveness Probe — the kubelet polls the endpoint. On failure (failureThreshold consecutive times), the kubelet kills the container and the container runtime restarts it. The restart count increases.
- Readiness Probe — the kubelet polls the endpoint. On failure, the Pod is removed from the Endpoints of all related Services. The container keeps running.
- Startup Probe — the kubelet polls until the first success. While the startupProbe hasn't succeeded, liveness and readiness are disabled. After the first success, the startupProbe is never executed again.
Each probe runs in a separate kubelet goroutine with a timeout. If the probe doesn’t respond within timeoutSeconds, it is considered failed.
Practical Scenarios
Scenario 1: Java application with long startup (3 minutes)
startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  failureThreshold: 30   # 30 attempts
  periodSeconds: 10      # every 10 seconds = 5 minutes max

# Only after startupProbe success:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10      # check every 10 seconds
Without startupProbe, you’d have to set initialDelaySeconds: 180 for liveness, meaning 3 minutes with no hang detection after startup.
Scenario 2: Temporary overload
The application receives too many requests and its internal queue overflows. The Readiness Probe starts returning 503, the Pod is removed from load balancing, processes the accumulated tasks, and then returns to service.
Scenario 3: Deadlock detection
The Liveness Probe checks not only HTTP reachability but also internal state. When a deadlock is detected, the probe fails and the container is restarted.
Common Mistakes Table
| Mistake | Consequence | Solution |
|---|---|---|
| All three probes on the same endpoint | Cannot distinguish “frozen” from “not ready” | Separate /health/liveness, /health/readiness, /health/startup |
| Liveness checks external DB | When DB fails, all Pods restart endlessly | Liveness checks only internal state, no external dependencies |
| Too aggressive failureThreshold | Unnecessary restarts on short-term issues | Set 3-5 for liveness, 15-30 for startup |
| Missing health checks | Frozen Pods receive traffic, users see 502 | Always add all three probes in production |
| timeoutSeconds too small (1 sec) | Failures on GC pause or slow requests | Set 3-10 seconds, considering worst-case app latency |
| Using :latest tag | Kubernetes doesn't see changes and doesn't start a Rolling Update | Always use specific tags or a SHA256 digest |
Comparison of All Three Probes
| Characteristic | Startup Probe | Liveness Probe | Readiness Probe |
|---|---|---|---|
| Question | “Have I started?” | “Am I alive?” | “Am I ready for traffic?” |
| Action on failure | Retry (up to failureThreshold) | Container restart | Removal from Service Endpoints |
| When disabled | After first success — forever | Never (starts after startupProbe succeeds) | Never (starts after startupProbe succeeds) |
| Interaction | Blocks liveness and readiness | Restarts | Removes from load balancing |
| Typical periodSeconds | 5-10 | 10-30 | 3-10 |
| Typical failureThreshold | 15-30 | 3-5 | 3 |
| Checks dependencies? | No | No | Yes |
When NOT to Use
- Short-lived Jobs/CronJobs — if the Pod starts, performs a task, and exits, health checks are not needed (or only for debugging)
- Sidecar containers without a network interface — if the container only writes logs or collects metrics, a liveness probe may be enough; a readiness probe is unnecessary
- Init containers — they run before main containers and exit, health checks don’t apply to them
Senior Level
Deep Mechanics: kubelet probe manager
Health Checks are managed by the kubelet, specifically the prober subsystem in the kubelet source code (pkg/kubelet/prober/).
Architecture:
- worker — goroutine per probe, performs HTTP/TCP/exec checks
- manager — coordinates workers, stores results in results.Manager
- statusManager — passes results to PodStatus, which is sent to the API Server
The kubelet executes probes asynchronously. Each worker has its own queue (workqueue) and timeout. On timeout, the probe is considered failed even if there’s no response.
Startup Probe + Liveness/Readiness interaction:
Pod created → startupProbe active, liveness/readiness disabled
→ kubelet executes startupProbe every periodSeconds
→ On success: startupProbe disabled forever, liveness/readiness activated
→ On failure (failureThreshold consecutive): container killed (since K8s 1.20+)
Before K8s 1.20, failure in startupProbe didn’t kill the container — this was fixed in PR #95190.
Trade-offs
| Aspect | Trade-off |
|---|---|
| Startup vs initialDelaySeconds | startupProbe is more flexible: gives plenty of startup time, then fast liveness. initialDelaySeconds — either too long or too aggressive |
| Liveness frequency | Frequent checks = faster deadlock detection, but higher CPU overhead and false positives on GC pause |
| HTTP vs exec | HTTP is faster and gives status code, but exec can check deeper (files, processes). exec is slower and creates additional processes |
| Dependencies in Readiness | Checking dependencies = honest status, but risk of cascading failure. Without checking = Pod gets traffic but can’t process it |
| Single vs Multi endpoint | Single endpoint is simpler but less informative. Separate endpoints are more accurate but require more code and maintenance |
Edge Cases (6+)
Edge Case 1: JVM GC Pause and Liveness Probe
JVM can stop all threads for 5-30 seconds during Full GC (especially without ZGC/Shenandoah). If timeoutSeconds: 3 and failureThreshold: 3, three failed probes will kill the container. Solution: ZGC/G1GC with MaxGCPauseMillis, timeoutSeconds: 10, or a separate native sidecar for health checks outside the JVM.
Edge Case 2: Cascading Failure Through Readiness All microservices check the same DB in readinessProbe. DB slows down → all Pods simultaneously disconnect from traffic → complete outage. Solution: Readiness checks only internal state, and DB availability is handled via Circuit Breaker at the business logic level.
Edge Case 3: kubelet Restart Resets Probe State On kubelet restart, the consecutive failure counter for startupProbe is lost. A Pod close to failureThreshold gets a “clean slate”. This can cause a never-starting application to keep trying indefinitely.
Edge Case 4: Containerd/CRI-O Timeout
If the kubelet can’t execute an exec probe due to CRI issues (containerd/CRI-O hung), the probe is considered failed. This isn’t an application problem, but the container will still be restarted. Monitor kubelet_pod_worker_duration_seconds.
Edge Case 5: Readiness + HPA Race Condition
HPA checks CPU/RPS. Readiness removes the Pod from traffic. HPA sees RPS drop on the Pod and creates new replicas. But new ones may also fail Readiness. Result: Pod explosion without real benefit. Solution: behavior.stabilizationWindowSeconds in HPA + proper Readiness without external dependencies.
Edge Case 6: Liveness Probe + Graceful Shutdown Conflict Liveness Probe fails, kubelet kills the container. Simultaneously, the application receives SIGTERM and begins graceful shutdown (completes requests in 30 sec). But the kubelet has already started a new container. Two instances of the same application run in parallel, potential resource conflict (files, ports, DB connections).
Edge Case 7: Multi-container Pod with Different Probes Pod has 2 containers: app and sidecar (envoy). Liveness passes for app, fails for sidecar. kubelet restarts sidecar, but app keeps running. If sidecar is critical (e.g., service mesh proxy), app runs without proxy. Solution: use ShareProcessNamespace and check both containers.
Performance Numbers
| Metric | Value |
|---|---|
| kubelet probe overhead (HTTP) | ~1-5ms CPU, ~10-50KB RAM per probe |
| kubelet probe overhead (exec) | ~10-50ms CPU, new process creation |
| Probe timeout effect | On timeout probe is failed, but goroutine waits for timeout before cleanup |
| API Server PodStatus update latency | 10-100ms |
| EndpointSlice propagation delay | 500ms-2s |
| kubelet restart probe state loss | Complete loss of consecutive failure counter |
| JVM Full GC pause (G1GC) | 100ms-2s, (ZGC) 1-10ms |
| Max probes per Pod | Not limited, but each probe = goroutine, ~2KB stack |
Security
- Health endpoints must not be public — they may expose internal application structure
- exec probes run as the container user — if the container runs as root, exec probe is also root. Not an escalation, but worth noting
- Liveness Probe must not have side-effects — an attacker with Pod access can call the liveness endpoint and trigger a restart (DoS)
- mTLS for health endpoints — if sidecar proxy (Istio) intercepts all traffic, health endpoints should be excluded from mTLS or use a separate port
- NetworkPolicy — restrict access to health endpoints so that only the kubelet can reach them (e.g., via the kube-system namespace)
Production War Story
Situation: Fintech company, 500-node cluster, 5000 Pods. All services had a single health endpoint /health that checked DB, Redis, and Kafka. During an incident, Redis started slowing down (99th percentile latency increased from 5ms to 500ms). Liveness Probe started timing out because Redis checks took 10+ seconds with timeoutSeconds: 5.
Domino effect:
- Liveness Probe failed → kubelet killed 2000 Pods in 2 minutes
- New Pods started, but startupProbe checked Redis → also timed out
- HPA saw replica drop → created 3000 more Pods
- Cluster hit Pod-per-node limit → new Pods in Pending
- Complete outage for 45 minutes
Post-mortem and fix:
- Endpoints separated: /health/live (no external dependencies) and /health/ready (with Circuit Breaker)
- Redis check removed from startupProbe — it now verifies only JVM startup
- Added a Redis Circuit Breaker with fallback to degraded mode
- HPA limited with maxReplicas and stabilizationWindowSeconds: 300
- Chaos Engineering introduced — regular testing of failure scenarios
Monitoring after fix:
# Alert: Liveness restarts increasing
rate(kube_pod_container_status_restarts_total[5m]) > 0.1
# Alert: Readiness failures
sum(kube_pod_status_ready{condition="false"}) by (namespace) > 10
# Alert: HPA scaling too aggressive
kube_horizontalpodautoscaler_status_desired_replicas - kube_horizontalpodautoscaler_spec_min_replicas > 5
Monitoring (Prometheus/Grafana)
Key metrics:
# Container restarts (liveness failures)
rate(kube_pod_container_status_restarts_total[5m])
# Pods not in Ready state
sum(kube_pod_status_ready{condition="false"}) by (namespace)
# kubelet pod worker duration p99 (spikes here often correlate with probe problems)
histogram_quantile(0.99, kubelet_pod_worker_duration_seconds_bucket)
# Failed probes by type (kubelet prober metric)
prober_probe_total{result="failed"}
# HPA scaling events
kube_horizontalpodautoscaler_status_current_replicas
# EndpointSlice availability
sum(kube_endpoint_address_available) by (service)
Grafana Dashboard panels:
- Pod Restarts Rate (by namespace) — red line at > 0.1/sec
- Liveness vs Readiness failure rate — correlation
- Probe latency p50/p99 — should be stable
- HPA current vs desired replicas
- Correlate with infrastructure metrics: node CPU, memory, disk I/O
Highload Best Practices
- Always separate liveness and readiness endpoints — liveness = “process alive”, readiness = “can serve traffic”
- Use startupProbe for JVM applications — a lifesaver for Spring Boot with 2-3 minute startup
- Liveness without external dependencies — check only deadlock, thread starvation, critical memory shortage
- Readiness with Circuit Breaker — if dependency is unavailable, return Ready in degraded mode, don’t remove Pod
- timeoutSeconds >= p99 latency of your endpoint — account for worst-case with GC pause
- Do not use :latest — Kubernetes won't detect the change and won't start a Rolling Update
- Graceful Shutdown + preStop hook — give the application time to complete in-flight requests:

  lifecycle:
    preStop:
      exec:
        command: ["sh", "-c", "sleep 10"]
  terminationGracePeriodSeconds: 60
lifecycle: preStop: exec: command: ["sh", "-c", "sleep 10"] terminationGracePeriodSeconds: 60 - Monitor restart rate — more than 1 restart per minute per Pod = problem with liveness config or application
- Use Prometheus Operator + kube-state-metrics — for automatic health check metrics collection
- Chaos Engineering — regularly test scenarios: DB failure, network partition, node drain — and check how health checks handle them
Interview Cheat Sheet
Must know:
- 3 probe types: Liveness (restart), Readiness (remove from traffic), Startup (startup protection)
- Startup Probe blocks liveness and readiness until first success — lifesaver for JVM
- Liveness WITHOUT external dependencies, Readiness CAN check dependencies (with Circuit Breaker)
- Death Spiral — aggressive liveness settings → mass Pod kill → outage
- For Java: timeout > max GC pause, startupProbe for JIT warmup (failureThreshold: 30)
- Separate endpoints: /health/live, /health/ready, /health/startup
- Without health checks, K8s cannot automatically recover the application
Common follow-up questions:
- “Why all 3 probes?” — Startup gives time to start, Liveness catches hangs, Readiness — traffic readiness
- “Liveness for stateful applications?” — No, restart may worsen the problem; use readiness + monitoring
- “Does kubelet restart reset probe state?” — Yes, loses consecutive failure counter for startupProbe
- “Exec vs HTTP probe?” — HTTP is faster, exec checks deeper; exec creates additional processes
Red flags (DO NOT say):
- “All probes on one endpoint” (can’t distinguish “frozen” from “not ready”)
- “Liveness checks DB, cache, external APIs” (external issues → mass kill)
- “Health checks not needed in dev” (config bugs will surface only in prod)
- “Startup Probe = Liveness with large delay” (no, startup blocks liveness/readiness)
Related topics:
- [[What is liveness probe]] — liveness in detail
- [[What is readiness probe]] — readiness in detail
- [[How to organize rolling update in Kubernetes]] — health checks during deployment