Question 17 · Section 14

What is a Liveness Probe?

Liveness probe: K8s periodically (every periodSeconds) checks whether the container is alive. If the probe fails failureThreshold times in a row, the container is restarted.

Junior Level

Simple Explanation

Liveness Probe is a check that Kubernetes performs to make sure the application inside a container is running normally. If the check fails — K8s restarts the container.

Simple Analogy

Liveness Probe is like an alarm clock that checks if you’re awake. If you don’t respond — someone comes and wakes you up again (restarts).

Why is it Needed?

An application can “freeze” — the process is running but not responding to requests. Without Liveness Probe, K8s will consider the Pod healthy even though it is useless.

Types of Checks

1. HTTP GET, where K8s requests a URL and treats any status code from 200 to 399 as success:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

2. TCP Socket — checks if the port is open:

livenessProbe:
  tcpSocket:
    port: 8080

3. Command (exec) — runs a command inside the container:

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
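
To watch an exec probe fail on purpose, a minimal Pod sketch can create the file and then delete it after a while. The Pod name, image, and timings below are assumptions chosen for illustration and are not part of the original example.

```yaml
# Hypothetical demo Pod: the names, image, and timings are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    args:
    - /bin/sh
    - -c
    # Healthy for 30 s, then the file disappears and `cat` starts failing.
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```

Once the file is gone, the probe should start failing, and with the default failureThreshold of 3 the kubelet restarts the container after three failed checks. Running kubectl describe pod liveness-exec-demo should show the failed probe events.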

Key Parameters

Parameter            What it does
initialDelaySeconds  How long to wait before the first check
periodSeconds        How often to check
failureThreshold     How many consecutive failures before restart
timeoutSeconds       How long to wait for a response
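
As a rough sketch, the same HTTP probe with every timing parameter written out looks like this. The values shown are the Kubernetes defaults (the /health path and port 8080 are carried over from the HTTP GET example above), and the detection time in the closing comment is an approximation, not a guarantee.

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 0   # default: start probing right after the container starts
  periodSeconds: 10        # default: probe every 10 s
  timeoutSeconds: 1        # default: a probe slower than 1 s counts as a failure
  failureThreshold: 3      # default: restart after 3 consecutive failures
  successThreshold: 1      # default; must be 1 for liveness probes
# A hung container is restarted after roughly
# failureThreshold x periodSeconds = 3 x 10 = 30 s of continuous failures.
```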

Example for a Java Application

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60    # Java takes long to start
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5

What a Junior Developer Should Remember

  • Liveness Probe checks if the application is alive
  • On failure — container is restarted
  • Do not check external dependencies (DB) in liveness
  • For Java, use a large initialDelaySeconds
  • Spring Boot Actuator: /actuator/health/liveness

Middle Level

When is Liveness Probe Needed?

  1. Deadlock detection — application is frozen due to thread lock
  2. Resource leak — application has degraded and won’t recover
  3. Internal errors — critical error after which the application doesn’t work

Types of Checks: When to Use What

Type       When to use
httpGet    Web applications with an HTTP endpoint
tcpSocket  Services without an HTTP endpoint (databases, brokers)
exec       Custom checks run inside the container
grpc       gRPC services (K8s 1.24+)

Dangers of Misconfiguration

Death Spiral:

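  • Application slows down under high load
  • Liveness Probe has a short timeout, so slow responses count as failures
  • K8s restarts the overloaded containers, and the remaining Pods take on their traffic
  • Restarted containers are not warmed up, can’t respond in time, and get killed again
  • The whole service goes down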

Solution:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 15
  failureThreshold: 5      # More attempts
  timeoutSeconds: 10       # More time

Aggressive settings (initialDelaySeconds=5, periodSeconds=3, failureThreshold=1) may not give the application enough time to start, which leads to endless restarts. For Java applications, use initialDelaySeconds=60-120 and periodSeconds=10.

What NOT to Check in Liveness

  • Database — if the DB is down, liveness will kill all Pods (even though the app itself is fine)
  • External APIs — temporary issues should not kill Pods
  • Cache — cache may be temporarily unavailable

Liveness should only check the internal state of the process.

Spring Boot Actuator

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60

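Since version 2.3, Spring Boot tracks liveness and readiness as separate availability states and exposes them as the liveness and readiness health groups. The probe endpoints are enabled automatically when the application detects that it is running on Kubernetes.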
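
The probe endpoints can also be switched on and tuned explicitly. A minimal application.yml sketch, assuming Spring Boot 2.3 or later with Actuator on the classpath; the db entry is an assumption for illustration and only works if a DataSource health indicator is actually present:

```yaml
# application.yml (sketch; assumes Spring Boot 2.3+ with spring-boot-starter-actuator)
management:
  endpoint:
    health:
      probes:
        enabled: true                   # expose the liveness/readiness groups outside Kubernetes too
      group:
        liveness:
          include: livenessState        # internal state only
        readiness:
          include: readinessState,db    # "db" assumes a DataSource health indicator exists
```

Keeping dependency checks in the readiness group means a database outage takes the Pod out of traffic without restarting it.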

What a Middle Developer Should Remember

  • Liveness Probe — for deadlocks, not for dependencies
  • Death Spiral — real danger with aggressive settings
  • Check only internal state, not external services
  • Spring Boot: /actuator/health/liveness — the standard
  • failureThreshold and timeoutSeconds — buffer against false positives

Senior Level

Liveness Probe as a Self-Healing Mechanism

Liveness Probe is not just a “ping” — it is an application lifecycle management tool that can either save the system or kill it.

Architecture: How K8s Executes Probe

kubelet (on Node):
  1. Wait initialDelaySeconds after the container starts
  2. Every periodSeconds:
     a. Execute probe (HTTP/TCP/exec/gRPC), waiting up to timeoutSeconds
     b. If success → reset failure count
     c. If fail → increment failure count
     d. If failures >= failureThreshold → kill container
  3. Container is restarted according to restartPolicy

Death Spiral: Detailed Analysis

Timeline:
T+0:  Load increased 10x
T+5:  Pods slowed down, response time > timeout
T+10: Liveness probe timeout → failure 1
T+20: Liveness probe timeout → failure 2
T+30: Liveness probe timeout → failure 3 → KILL
T+31: K8s restarts Pod
T+32: Pod not warmed up yet (JIT warmup)
T+62: Liveness probe has failed 3 more times → KILL
... infinite loop

Prevention:

  1. Startup Probe for warmup:
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # 5 minutes to start
    
  2. Conservative thresholds:
    livenessProbe:
      initialDelaySeconds: 120
      periodSeconds: 20
      failureThreshold: 6     # 2 minutes before killing
      timeoutSeconds: 15
    
  3. Separate liveness from readiness:
    # Liveness: only internal state
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080

    # Readiness: dependencies
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
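
Put together, the three prevention steps might look like the following container fragment. This is a sketch: the container name, image, and exact thresholds are assumptions, and the endpoints are the Spring Boot Actuator paths used throughout this section.

```yaml
# Fragment of a Pod template; name and image are hypothetical.
containers:
- name: app
  image: example/app:1.0
  ports:
  - containerPort: 8080
  startupProbe:                 # runs first; the other probes wait until it succeeds
    httpGet:
      path: /actuator/health/liveness
      port: 8080
    periodSeconds: 10
    failureThreshold: 30        # up to 30 x 10 s = 5 minutes to start
  livenessProbe:                # internal state only; failure restarts the container
    httpGet:
      path: /actuator/health/liveness
      port: 8080
    periodSeconds: 20
    timeoutSeconds: 15
    failureThreshold: 6
  readinessProbe:               # dependencies; failure only removes the Pod from traffic
    httpGet:
      path: /actuator/health/readiness
      port: 8080
    periodSeconds: 10
```

With a startup probe in place, initialDelaySeconds on the liveness probe is usually unnecessary, because liveness checks do not begin until the startup probe has succeeded.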


Java Specifics

JVM Warmup:

  • JIT compilation happens "on the fly"
  • First requests are slower
  • Liveness Probe may kill the Pod before warmup completes

Solution:

  • Startup Probe with a large failureThreshold × periodSeconds budget
  • initialDelaySeconds >= warmup time (2-3 min for Spring Boot)

G1 GC and Stop-the-World:

  • During a full GC, the JVM doesn't respond
  • Liveness Probe may interpret the pause as a dead process
  • timeoutSeconds should be > max GC pause
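
One way to keep the probe timeout and the GC pause target aligned is to declare both in the same container spec. The sketch below assumes G1 and uses the JAVA_TOOL_OPTIONS environment variable to pass -XX:MaxGCPauseMillis. The values are illustrative, and the flag is a soft goal for the collector rather than a hard limit, so failureThreshold still provides the real safety margin.

```yaml
# Container fragment (sketch); the numbers are illustrative assumptions.
env:
- name: JAVA_TOOL_OPTIONS
  value: "-XX:+UseG1GC -XX:MaxGCPauseMillis=500"   # pause-time goal, not a guarantee
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  timeoutSeconds: 5        # well above the 500 ms pause goal
  periodSeconds: 10
  failureThreshold: 3      # one long pause alone does not trigger a restart
```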

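Thread Deadlock Detection:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Custom health endpoint (class name is illustrative): fails when the JVM reports deadlocked threads
@RestController
public class LivenessController {
    @GetMapping("/health/liveness")
    public ResponseEntity<String> liveness() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no threads are deadlocked
        long[] deadlockedThreads = bean.findDeadlockedThreads();
        if (deadlockedThreads != null) {
            return ResponseEntity.status(500).body("Deadlock detected");
        }
        return ResponseEntity.ok("OK");
    }
}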
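
If such a custom endpoint is used instead of the Actuator one, the probe simply points at its path. A sketch, assuming the application listens on port 8080 as in the earlier examples:

```yaml
livenessProbe:
  httpGet:
    path: /health/liveness   # path served by the custom endpoint above
    port: 8080               # assumed application port
  periodSeconds: 10
  failureThreshold: 3
```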

When NOT to Use Liveness Probe

  1. Stateful applications with data corruption risk — restart may make it worse
  2. Applications with long graceful shutdown — K8s waits for terminationGracePeriodSeconds
  3. If restart is more expensive than downtime — sometimes manual intervention is better

Liveness probe does NOT tell whether the Pod is ready to accept traffic. It only tells whether the process is alive. For readiness – use readiness probe.

gRPC Health Checking (K8s 1.24+)

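livenessProbe:
  grpc:
    port: 9090
    # service is optional: it is the name sent in the HealthCheckRequest.
    # Omit it to query the overall server status.

The kubelet calls the standard gRPC Health Checking Protocol (grpc.health.v1.Health/Check), which the server must implement.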

Anti-patterns

BAD:

# Checking DB in liveness
livenessProbe:
  httpGet:
    path: /health?check=db,cache,api   # X

GOOD:

# Liveness: only process
livenessProbe:
  httpGet:
    path: /actuator/health/liveness

# Readiness: dependencies
readinessProbe:
  httpGet:
    path: /actuator/health/readiness

Summary for Senior

  • Liveness Probe — a tool for fighting deadlocks, not for dependencies.
  • Death Spiral — real threat with aggressive settings.
  • Startup Probe for Java: protects during warmup.
  • Timeout > max GC pause, otherwise a long full GC pause counts as a failed probe and can lead to a false-positive restart.
  • Separate liveness (internal state) and readiness (dependencies).
  • Custom health endpoint for deadlock detection in Java.
  • Liveness Probe can kill the system — configure with caution.

Interview Cheat Sheet

Must know:

  • Liveness Probe checks “is the application alive”; on failure — container restart
  • Types: HTTP GET, TCP Socket, exec Command, gRPC (K8s 1.24+)
  • Death Spiral — real threat with aggressive settings (short timeout → kill → warmup → kill)
  • Liveness must NOT check external dependencies (DB, API) — only internal state
  • For Java: startupProbe protects during JIT warmup, timeout > max GC pause
  • Spring Boot Actuator: /actuator/health/liveness — standard endpoint
  • Separate liveness (process alive) and readiness (ready for traffic)

Common follow-up questions:

  • “Why shouldn’t you check DB in liveness?” — If DB is down, liveness kills all Pods; DB won’t recover
  • “Death Spiral — what is it?” — Aggressive timeouts → kill → warmup → kill again → infinite loop
  • “Why startupProbe for Java?” — JVM takes long to start; without startupProbe you need a huge initialDelaySeconds
  • “G1 GC and liveness?” — Full GC causes stop-the-world; timeout must be > max GC pause

Red flags (DO NOT say):

  • “Liveness checks DB and cache” (external dependencies → mass Pod killing)
  • initialDelaySeconds=5 for Java application (won’t have time to start)
  • “Liveness = readiness” (different goals: liveness → restart, readiness → remove from traffic)
  • “Liveness is not needed — K8s will restart anyway” (K8s only restarts a crashed container, not a frozen one)

Related topics:

  • [[What is readiness probe]] — readiness check for traffic
  • [[Why are health checks needed]] — all three probes together
  • [[How to organize rolling update in Kubernetes]] — health checks during deployment