What is Liveness Probe?
Liveness probe: Kubernetes (K8s) periodically (every N seconds) checks whether the container is alive. If the probe keeps failing, the container is restarted.
Junior Level
Simple Explanation
Liveness Probe is a check that Kubernetes performs to make sure the application inside a container is running normally. If the check fails — K8s restarts the container.
Simple Analogy
Liveness Probe is like an alarm clock that checks if you’re awake. If you don’t respond — someone comes and wakes you up again (restarts).
Why is it Needed?
An application can “freeze” — the process is running but not responding to requests. Without Liveness Probe, K8s will consider the Pod healthy even though it is useless.
Types of Checks
1. HTTP GET: K8s requests a URL and treats any status code from 200 to 399 as success:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
2. TCP Socket — checks if the port is open:
livenessProbe:
  tcpSocket:
    port: 8080
3. Command (exec): runs a command inside the container; exit code 0 means success:
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
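To make the HTTP GET check concrete, here is a minimal sketch of what the application side could look like, using only the HTTP server built into the JDK. The class name `HealthServer` is hypothetical; the `/health` path and port 8080 simply mirror the YAML above. The `isProbeSuccess` helper restates the 200 to 399 success range and is not a Kubernetes API.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class HealthServer {
    /** Starts an HTTP server exposing /health; passing port 0 picks a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            byte[] body = "OK".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Restates the httpGet success rule: status codes 200-399 count as success. */
    static boolean isProbeSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

A real service would normally expose this endpoint through its web framework; the point is only that the handler answers quickly and touches nothing outside the process.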
Key Parameters
| Parameter | What it does |
|---|---|
| initialDelaySeconds | How long to wait before the first check |
| periodSeconds | How often to check |
| failureThreshold | How many consecutive failures before restart |
| timeoutSeconds | How long to wait for a response |
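To see how these parameters combine, the sketch below estimates how long a hung container can go unnoticed before it is restarted. It assumes that every probe against a hung process runs into `timeoutSeconds`, that `timeoutSeconds` is shorter than `periodSeconds`, and it ignores kubelet scheduling jitter, so the results are rough bounds. The class and method names are illustrative only.

```java
/** Rough bounds on the time between an application hang and the container restart. */
final class ProbeTiming {
    /** Hang right after a successful probe: a full period passes before each failing probe. */
    static int worstCaseSeconds(int periodSeconds, int failureThreshold, int timeoutSeconds) {
        return periodSeconds * failureThreshold + timeoutSeconds;
    }

    /** Hang right before a probe: the first failing probe starts immediately. */
    static int bestCaseSeconds(int periodSeconds, int failureThreshold, int timeoutSeconds) {
        return periodSeconds * (failureThreshold - 1) + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Example values: periodSeconds=10, failureThreshold=3, timeoutSeconds=5
        int best = bestCaseSeconds(10, 3, 5);
        int worst = worstCaseSeconds(10, 3, 5);
        System.out.println(best + " to " + worst + " seconds"); // prints: 25 to 35 seconds
    }
}
```

Raising `failureThreshold` or `periodSeconds` makes false restarts less likely, at the cost of a longer window in which a truly hung container keeps running.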
Example for a Java Application
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60 # Java takes long to start
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5
What a Junior Developer Should Remember
- Liveness Probe checks if the application is alive
- On failure — container is restarted
- Do not check external dependencies (DB) in liveness
- For Java, use a large initialDelaySeconds
- Spring Boot Actuator: /actuator/health/liveness
Middle Level
When is Liveness Probe Needed?
- Deadlock detection — application is frozen due to thread lock
- Resource leak — application has degraded and won’t recover
- Internal errors — critical error after which the application doesn’t work
Types of Checks: When to Use What
| Type | When to use |
|---|---|
| httpGet | Web applications with HTTP endpoint |
| tcpSocket | When no HTTP (databases, brokers) |
| exec | Specific checks inside container |
| grpc | gRPC services (K8s 1.24+) |
Dangers of Misconfiguration
Death Spiral:
- Application slows down under high load
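- Liveness Probe has a short timeout, so slow responses count as failures
- K8s restarts the containers in overloaded Pods, and the remaining Pods absorb their traffic
- Restarted containers are still warming up, can’t respond in time, and get killed again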
- System crashes completely
Solution:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 15
  failureThreshold: 5 # More attempts
  timeoutSeconds: 10 # More time
Aggressive settings such as initialDelaySeconds=5, periodSeconds=3, failureThreshold=1 may not give the application time to start, which leads to infinite restarts. For Java applications, use initialDelaySeconds=60-120 and periodSeconds=10.
What NOT to Check in Liveness
- Database — if the DB is down, liveness will kill all Pods (even though the app itself is fine)
- External APIs — temporary issues should not kill Pods
- Cache — cache may be temporarily unavailable
Liveness should only check the internal state of the process.
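One way to check only internal state is a heartbeat: the main work loop records when it last made progress, and the liveness endpoint reports failure if that timestamp is too old. The sketch below is an assumption about how such a check could be built, not a standard API. `HeartbeatLiveness` and its methods are hypothetical names, and time is passed in explicitly so the logic is easy to test.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

/** Tracks the last time the main work loop made progress. */
final class HeartbeatLiveness {
    private final long maxSilenceNanos;
    private final AtomicLong lastBeatNanos;

    HeartbeatLiveness(Duration maxSilence, long nowNanos) {
        this.maxSilenceNanos = maxSilence.toNanos();
        this.lastBeatNanos = new AtomicLong(nowNanos);
    }

    /** Called by the worker loop after each iteration, including idle ones. */
    void beat(long nowNanos) {
        lastBeatNanos.set(nowNanos);
    }

    /** Alive while the last heartbeat is recent; no DB, cache, or external API is consulted. */
    boolean isAlive(long nowNanos) {
        return nowNanos - lastBeatNanos.get() <= maxSilenceNanos;
    }

    public static void main(String[] args) {
        HeartbeatLiveness liveness =
                new HeartbeatLiveness(Duration.ofSeconds(30), System.nanoTime());
        liveness.beat(System.nanoTime());
        System.out.println(liveness.isAlive(System.nanoTime())); // true right after a beat
    }
}
```

A liveness endpoint would return 200 while `isAlive` is true and 500 otherwise. The worker must also beat while idle, or a quiet period would look like a hang.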
Spring Boot Actuator
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
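Spring Boot (2.3+) tracks liveness and readiness states and exposes them as health groups automatically when it detects that it is running on Kubernetes; elsewhere they can be enabled with management.endpoint.health.probes.enabled=true.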
What a Middle Developer Should Remember
- Liveness Probe — for deadlocks, not for dependencies
- Death Spiral — real danger with aggressive settings
- Check only internal state, not external services
- Spring Boot: /actuator/health/liveness is the standard
- failureThreshold and timeoutSeconds are a buffer against false positives
Senior Level
Liveness Probe as a Self-Healing Mechanism
Liveness Probe is not just a “ping” — it is an application lifecycle management tool that can either save the system or kill it.
Architecture: How K8s Executes Probe
kubelet (on Node):
  wait initialDelaySeconds after container start
  every periodSeconds:
    1. Execute probe (HTTP/TCP/exec/gRPC), bounded by timeoutSeconds
    2. If success → reset failure count
    3. If fail → increment failure count
    4. If failures >= failureThreshold → kill container
    5. Container is restarted according to restartPolicy
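The failure-count bookkeeping in the loop above can be sketched in a few lines. This is a simplified model for reasoning about thresholds, not kubelet source code, and `ProbeCounter` is a hypothetical name.

```java
/** Counts consecutive probe failures and signals when the threshold is reached. */
final class ProbeCounter {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    ProbeCounter(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Records one probe result; returns true when the container should be restarted. */
    boolean record(boolean success) {
        if (success) {
            consecutiveFailures = 0; // a single success resets the count
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            consecutiveFailures = 0; // the restarted container starts with a clean count
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProbeCounter counter = new ProbeCounter(3);
        boolean[] results = {false, false, true, false, false, false};
        for (int i = 0; i < results.length; i++) {
            if (counter.record(results[i])) {
                System.out.println("restart after probe #" + (i + 1)); // prints: restart after probe #6
            }
        }
    }
}
```

Because one success clears the counter, a container that answers only occasionally under load is never restarted. The restart fires only on an unbroken run of failures.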
Death Spiral: Detailed Analysis
Timeline:
T+0: Load increased 10x
T+5: Pods slowed down, response time > timeout
T+10: Liveness probe timeout → failure 1
T+20: Liveness probe timeout → failure 2
T+30: Liveness probe timeout → failure 3 → KILL
T+31: K8s restarts the container
T+32: Container not warmed up yet (JIT warmup)
T+42: Liveness probes fail again during warmup until the threshold is reached → KILL
... infinite loop
Prevention:
- Startup Probe for warmup:
```yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30 # 5 minutes to start
```
- Conservative thresholds:
```yaml
livenessProbe:
  initialDelaySeconds: 120
  periodSeconds: 20
  failureThreshold: 6 # 2 minutes before killing
  timeoutSeconds: 15
```
- Separate liveness from readiness:
```yaml
# Liveness: only internal state
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
# Readiness: dependencies
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
```
### Java Specifics
**JVM Warmup:**
- JIT compilation happens "on the fly"
- First requests are slower
- Liveness Probe may kill Pod before warmup
**Solution:**
- Startup Probe with a generous startup budget (failureThreshold × periodSeconds)
- initialDelaySeconds >= warmup time (2-3 min for Spring Boot)
**G1 GC and Stop-the-World:**
- During a full GC, the JVM doesn't respond
- Liveness Probe may interpret the pause as a dead process
- timeoutSeconds should be > max GC pause
**Thread Deadlock Detection:**
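```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Custom health endpoint
@RestController
public class LivenessController {
    @GetMapping("/health/liveness")
    public ResponseEntity<String> liveness() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked; the call can be expensive
        long[] deadlockedThreads = bean.findDeadlockedThreads();
        if (deadlockedThreads != null) {
            return ResponseEntity.status(500).body("Deadlock detected");
        }
        return ResponseEntity.ok("OK");
    }
}
```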
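To check that `findDeadlockedThreads` behaves as the endpoint above expects, the following framework-free sketch deliberately deadlocks two daemon threads and then polls the JVM. The class name `DeadlockDemo` and the polling limits are arbitrary choices for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockDemo {
    /** Deadlocks two daemon threads on purpose, then asks the JVM whether it sees the deadlock. */
    static boolean detectsDeadlock() throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        startDaemon(lockA, lockB, bothHoldFirstLock);
        startDaemon(lockB, lockA, bothHoldFirstLock);

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            if (bean.findDeadlockedThreads() != null) {
                return true;
            }
            Thread.sleep(100); // give both threads time to block on their second lock
        }
        return false;
    }

    private static void startDaemon(Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: each thread waits for the lock the other one holds
                }
            }
        });
        t.setDaemon(true); // daemon threads do not keep the JVM alive after main returns
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlock detected: " + detectsDeadlock());
    }
}
```

Note that this only finds true lock cycles. A process that is merely slow, or stuck waiting on I/O, will not show up here, which is why the timeout and threshold settings still matter.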
When NOT to Use Liveness Probe
- Stateful applications with data corruption risk — restart may make it worse
- Applications with long graceful shutdown — K8s waits for terminationGracePeriodSeconds
- If restart is more expensive than downtime — sometimes manual intervention is better
Liveness probe does NOT tell whether the Pod is ready to accept traffic. It only tells whether the process is alive. For readiness – use readiness probe.
gRPC Health Checking (K8s 1.24+)
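livenessProbe:
  grpc:
    port: 9090
    service: "grpc.health.v1.Health" # optional; sent as the service name in the health check request
The kubelet calls the standard gRPC Health Checking Protocol, so the server must implement it and report a status for the requested service name.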
Anti-patterns
BAD:
# Checking DB in liveness
livenessProbe:
  httpGet:
    path: /health?check=db,cache,api # X
GOOD:
# Liveness: only process
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
# Readiness: dependencies
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
Summary for Senior
- Liveness Probe — a tool for fighting deadlocks, not for dependencies.
- Death Spiral — real threat with aggressive settings.
- Startup Probe for Java: protects during warmup.
- Timeout > max GC pause, otherwise full GC = false positive kill.
- Separate liveness (internal state) and readiness (dependencies).
- Custom health endpoint for deadlock detection in Java.
- Liveness Probe can kill the system — configure with caution.
Interview Cheat Sheet
Must know:
- Liveness Probe checks “is the application alive”; on failure — container restart
- Types: HTTP GET, TCP Socket, exec Command, gRPC (K8s 1.24+)
- Death Spiral — real threat with aggressive settings (short timeout → kill → warmup → kill)
- Liveness must NOT check external dependencies (DB, API) — only internal state
- For Java: startupProbe protects during JIT warmup, timeout > max GC pause
- Spring Boot Actuator: /actuator/health/liveness is the standard endpoint
- Separate liveness (process alive) and readiness (ready for traffic)
Common follow-up questions:
- “Why shouldn’t you check DB in liveness?” If the DB is down, liveness restarts all Pods, and restarting them won’t bring the DB back
- “Death Spiral — what is it?” — Aggressive timeouts → kill → warmup → kill again → infinite loop
- “Why startupProbe for Java?” — JVM takes long to start; without startupProbe you need a huge initialDelaySeconds
- “G1 GC and liveness?” — Full GC causes stop-the-world; timeout must be > max GC pause
Red flags (DO NOT say):
- “Liveness checks DB and cache” (external dependencies → mass Pod killing)
- “initialDelaySeconds=5 for a Java application” (won’t have time to start)
- “Liveness = readiness” (different goals: liveness → restart, readiness → remove from traffic)
- “Liveness is not needed — K8s will restart anyway” (K8s only restarts a crashed container, not a frozen one)
Related topics:
- [[What is readiness probe]] — readiness check for traffic
- [[Why are health checks needed]] — all three probes together
- [[How to organize rolling update in Kubernetes]] — health checks during deployment