Junior Level

Simple Explanation

A liveness probe is a check that Kubernetes runs periodically (every N seconds) to make sure the application inside a container is still working. If the check keeps failing, K8s restarts the container.

Simple Analogy

Liveness Probe is like an alarm clock that checks if you’re awake. If you don’t respond — someone comes and wakes you up again (restarts).

Why is it Needed?

An application can “freeze” — the process is running but not responding to requests. Without Liveness Probe, K8s will consider the Pod healthy even though it is useless.

Types of Checks

1. HTTP GET: K8s requests a URL and treats any status code from 200 to 399 as success:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

2. TCP Socket: the check passes if K8s can open a TCP connection to the port:

livenessProbe:
  tcpSocket:
    port: 8080

3. Command (exec): runs a command inside the container and treats exit code 0 as success:

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy

Key Parameters

Parameter	What it does
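`initialDelaySeconds`	Seconds to wait after the container starts before the first check (default 0)
`periodSeconds`	Interval between checks, in seconds (default 10)
`failureThreshold`	Consecutive failures before the container is restarted (default 3)
`timeoutSeconds`	Seconds to wait for a response before the check counts as failed (default 1)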

Example for a Java Application

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60    # Java takes long to start
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5
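
To get a feel for what these numbers mean in practice, the values above can be turned into a rough timing budget. The sketch below is an approximation: it assumes every failed probe fails by timing out and that probes run exactly on schedule. `ProbeBudget` and its method names are made up for this illustration and are not part of any Kubernetes API.

```java
// Rough timing budget for a liveness probe (an approximation, not the kubelet's exact behaviour).
final class ProbeBudget {

    // Shortest time between the app hanging and the restart decision:
    // the hang happens right before a probe is sent.
    static int bestCaseDetectionSeconds(int periodSeconds, int failureThreshold, int timeoutSeconds) {
        return (failureThreshold - 1) * periodSeconds + timeoutSeconds;
    }

    // Longest time: the hang happens right after a successful probe.
    static int worstCaseDetectionSeconds(int periodSeconds, int failureThreshold, int timeoutSeconds) {
        return failureThreshold * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Values from the example above: periodSeconds=10, failureThreshold=3, timeoutSeconds=5
        System.out.println(bestCaseDetectionSeconds(10, 3, 5));  // prints 25
        System.out.println(worstCaseDetectionSeconds(10, 3, 5)); // prints 35
    }
}
```

Under these assumptions a hung JVM is restarted roughly 25 to 35 seconds after it stops responding, and the first probe is not sent until initialDelaySeconds (60 s) has passed.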

What a Junior Developer Should Remember

Liveness Probe checks if the application is alive
On failure — container is restarted
Do not check external dependencies (DB) in liveness
For Java, use a large initialDelaySeconds
Spring Boot Actuator: /actuator/health/liveness

Middle Level

When is Liveness Probe Needed?

Deadlock detection: the application is frozen because its threads are deadlocked
Resource leak: the application has degraded and will not recover on its own
Internal errors: a critical error after which the application no longer works

Types of Checks: When to Use What

Type	When to use
httpGet	Web applications with HTTP endpoint
tcpSocket	Services without an HTTP endpoint (databases, brokers)
exec	Specific checks inside container
grpc	gRPC services (K8s 1.24+)

Dangers of Misconfiguration

Death Spiral:

Application slows down under high load
Liveness Probe has a short timeout
K8s kills overloaded Pods
New Pods can’t respond in time and also get killed
System crashes completely

Solution:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 15
  failureThreshold: 5      # More attempts
  timeoutSeconds: 10       # More time

Aggressive settings such as initialDelaySeconds=5, periodSeconds=3, failureThreshold=1 may not leave the application enough time to start, which results in endless restarts. For Java applications, a reasonable starting point is initialDelaySeconds=60-120 and periodSeconds=10.
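
The effect of these settings can be illustrated with a simplified model in which every probe fails until the application has finished starting. This is only a sketch: the 40-second startup time is a hypothetical figure, `StartupModel` is an invented name, and the real kubelet adds jitter that the model ignores.

```java
// Simplified startup model: a probe fails until the app has finished starting.
final class StartupModel {

    // Number of probes sent before the app is ready, with probes at
    // initialDelaySeconds, initialDelaySeconds + periodSeconds, and so on.
    static int failingProbesBeforeReady(int startupSeconds, int initialDelaySeconds, int periodSeconds) {
        if (initialDelaySeconds >= startupSeconds) {
            return 0;
        }
        int gap = startupSeconds - initialDelaySeconds;
        return (gap + periodSeconds - 1) / periodSeconds; // ceiling division
    }

    // True when the failure threshold is reached before the app becomes ready.
    static boolean killedDuringStartup(int startupSeconds, int initialDelaySeconds,
                                       int periodSeconds, int failureThreshold) {
        return failingProbesBeforeReady(startupSeconds, initialDelaySeconds, periodSeconds)
                >= failureThreshold;
    }

    public static void main(String[] args) {
        int startupSeconds = 40; // hypothetical JVM startup time
        System.out.println(killedDuringStartup(startupSeconds, 5, 3, 1));   // prints true
        System.out.println(killedDuringStartup(startupSeconds, 60, 10, 3)); // prints false
    }
}
```

In this model the aggressive configuration kills the container on the very first probe, and because the next start takes just as long, the cycle repeats indefinitely.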

What NOT to Check in Liveness

Database — if the DB is down, liveness will kill all Pods (even though the app itself is fine)
External APIs — temporary issues should not kill Pods
Cache — cache may be temporarily unavailable

Liveness should only check the internal state of the process.
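
One possible way to check internal state only is a heartbeat: the application's main work loop records a timestamp, and the liveness endpoint reports failure when that timestamp is too old. This is a minimal sketch of the idea, not something prescribed by Kubernetes or Spring Boot. `Heartbeat`, `beat` and `isAlive` are hypothetical names, and the clock value is passed in explicitly to keep the example easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

// Tracks the last time the application's main loop made progress.
final class Heartbeat {
    private final AtomicLong lastBeatMillis;

    Heartbeat(long startMillis) {
        this.lastBeatMillis = new AtomicLong(startMillis);
    }

    // Called by the worker loop whenever it completes a unit of work.
    void beat(long nowMillis) {
        lastBeatMillis.set(nowMillis);
    }

    // Called by the liveness endpoint; touches no database, cache, or external API.
    boolean isAlive(long nowMillis, long maxSilenceMillis) {
        return nowMillis - lastBeatMillis.get() <= maxSilenceMillis;
    }

    public static void main(String[] args) {
        Heartbeat heartbeat = new Heartbeat(0);
        heartbeat.beat(1_000);
        System.out.println(heartbeat.isAlive(5_000, 10_000));  // prints true
        System.out.println(heartbeat.isAlive(20_000, 10_000)); // prints false
    }
}
```

Because the check reads only an in-process value, a database or cache outage cannot make it fail.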

Spring Boot Actuator

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60

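Spring Boot (2.3+) tracks the application's liveness and readiness states itself. It exposes these health groups automatically when it detects that it is running on Kubernetes; elsewhere they can be enabled with management.endpoint.health.probes.enabled=true.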

What a Middle Developer Should Remember

Liveness Probe — for deadlocks, not for dependencies
Death Spiral — real danger with aggressive settings
Check only internal state, not external services
Spring Boot: /actuator/health/liveness — the standard
failureThreshold and timeoutSeconds — buffer against false positives

Senior Level

Liveness Probe as a Self-Healing Mechanism

Liveness Probe is not just a “ping” — it is an application lifecycle management tool that can either save the system or kill it.

Architecture: How K8s Executes Probe

kubelet (on Node):
  Wait initialDelaySeconds after the container starts
  every periodSeconds:
    Execute probe (HTTP/TCP/exec/gRPC)
    If success → reset failure count
    If fail → increment failure count
    If failures >= failureThreshold → kill container
  The container is then restarted according to restartPolicy
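
The counting logic above can be sketched as a small state machine. This is a simplified model written for illustration, not kubelet code; `LivenessCounter` and `record` are invented names.

```java
// Minimal model of the consecutive-failure counter described above.
final class LivenessCounter {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    LivenessCounter(int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
    }

    // Records one probe result; returns true when the container should be killed.
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0; // a single success resets the count
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            consecutiveFailures = 0; // the restarted container starts with a clean count
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LivenessCounter counter = new LivenessCounter(3);
        boolean[] results = {false, false, true, false, false, false};
        for (boolean ok : results) {
            System.out.println(counter.record(ok));
        }
        // prints false five times, then true
    }
}
```

Note that the two early failures do not count towards the later kill: only an unbroken run of failures reaches the threshold.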

Death Spiral: Detailed Analysis

Timeline:
T+0:  Load increased 10x
T+5:  Pods slowed down, response time > timeout
T+10: Liveness probe timeout → failure 1
T+20: Liveness probe timeout → failure 2
T+30: Liveness probe timeout → failure 3 → KILL
T+31: K8s restarts Pod
T+32: Pod not warmed up yet (JIT warmup)
T+42: Liveness probe fails again → KILL
... infinite loop

Prevention:

Startup Probe for warmup:

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # 5 minutes to start

Conservative thresholds:

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 120
  periodSeconds: 20
  failureThreshold: 6     # 2 minutes before killing
  timeoutSeconds: 15

Separate liveness from readiness:

# Liveness: only internal state
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080

# Readiness: dependencies
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080

Java Specifics

JVM Warmup:

JIT compilation happens at run time
First requests are slower
A liveness probe with tight limits may kill the Pod before warmup finishes

Solution:

Startup Probe with a generous budget (failureThreshold × periodSeconds)
initialDelaySeconds >= warmup time (for example, 2-3 min for a Spring Boot application)

G1 GC and Stop-the-World:

During a full GC the JVM does not respond
Liveness Probe may interpret the pause as death
timeoutSeconds should be > max GC pause
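Thread Deadlock Detection:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Custom health endpoint (the class name is illustrative)
@RestController
public class LivenessController {
    @GetMapping("/health/liveness")
    public ResponseEntity<String> liveness() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no threads are deadlocked
        long[] deadlockedThreads = bean.findDeadlockedThreads();
        if (deadlockedThreads != null) {
            return ResponseEntity.status(500).body("Deadlock detected");
        }
        return ResponseEntity.ok("OK");
    }
}
```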
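
To see `findDeadlockedThreads()` at work outside a web framework, the sketch below deliberately deadlocks two daemon threads and then polls the `ThreadMXBean` for them. `DeadlockDemo` and its helper are names invented for this example, and the five-second polling budget is an arbitrary assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Creates a deliberate two-thread deadlock and detects it with ThreadMXBean.
final class DeadlockDemo {

    // Returns the number of deadlocked threads found, or 0 if none appear in time.
    static int detectDeadlockedThreads() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // The two threads take the same locks in opposite order.
        startDaemon(lockA, lockB, bothHoldFirstLock);
        startDaemon(lockB, lockA, bothHoldFirstLock);

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Poll for up to about 5 seconds: the threads need a moment to block.
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = bean.findDeadlockedThreads(); // null when there is no deadlock
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    // Daemon threads, so the stuck threads do not keep the JVM from exiting.
    private static void startDaemon(Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: each thread waits for the lock the other one holds
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + detectDeadlockedThreads());
    }
}
```

On a typical HotSpot JVM this should report the two threads created here. The deadlocked threads stay stuck for the lifetime of the process, which is exactly why a restart triggered by the liveness probe is the usual remedy.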

When NOT to Use Liveness Probe

Stateful applications with data corruption risk — restart may make it worse
Applications with long graceful shutdown — K8s waits for terminationGracePeriodSeconds
If restart is more expensive than downtime — sometimes manual intervention is better

Liveness probe does NOT tell whether the Pod is ready to accept traffic. It only tells whether the process is alive. For readiness – use readiness probe.

gRPC Health Checking (K8s 1.24+)

livenessProbe:
  grpc:
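    port: 9090

The kubelet calls the standard gRPC Health Checking Protocol (grpc.health.v1.Health), which the application must implement. The optional service field sets the service name sent in the health check request; it is not the name of the health service itself, so it is omitted here.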

Anti-patterns

BAD:

# Checking DB in liveness
livenessProbe:
  httpGet:
    path: /health?check=db,cache,api   # X

GOOD:

# Liveness: only process
livenessProbe:
  httpGet:
    path: /actuator/health/liveness

# Readiness: dependencies
readinessProbe:
  httpGet:
    path: /actuator/health/readiness

Summary for Senior

Liveness Probe — a tool for fighting deadlocks, not for dependencies.
Death Spiral — real threat with aggressive settings.
Startup Probe for Java: protects during warmup.
Timeout > max GC pause, otherwise full GC = false positive kill.
Separate liveness (internal state) and readiness (dependencies).
Custom health endpoint for deadlock detection in Java.
Liveness Probe can kill the system — configure with caution.

Interview Cheat Sheet

Must know:

Liveness Probe checks “is the application alive”; on failure — container restart
Types: HTTP GET, TCP Socket, exec Command, gRPC (K8s 1.24+)
Death Spiral — real threat with aggressive settings (short timeout → kill → warmup → kill)
Liveness must NOT check external dependencies (DB, API) — only internal state
For Java: startupProbe protects during JIT warmup, timeout > max GC pause
Spring Boot Actuator: /actuator/health/liveness — standard endpoint
Separate liveness (process alive) and readiness (ready for traffic)

Common follow-up questions:

“Why shouldn’t you check DB in liveness?” — If DB is down, liveness kills all Pods; DB won’t recover
“Death Spiral — what is it?” — Aggressive timeouts → kill → warmup → kill again → infinite loop
“Why startupProbe for Java?” — JVM takes long to start; without startupProbe you need a huge initialDelaySeconds
“G1 GC and liveness?” — Full GC causes stop-the-world; timeout must be > max GC pause

Red flags (DO NOT say):

“Liveness checks DB and cache” (external dependencies → mass Pod killing)
initialDelaySeconds=5 for Java application (won’t have time to start)
“Liveness = readiness” (different goals: liveness → restart, readiness → remove from traffic)
“Liveness is not needed — K8s will restart anyway” (K8s only restarts a crashed container, not a frozen one)

Related topics:

[[What is readiness probe]] — readiness check for traffic
[[Why are health checks needed]] — all three probes together
[[How to organize rolling update in Kubernetes]] — health checks during deployment