Question 5 · Section 17

What is Circuit Breaker pattern

Cascade failure (chain failure): service A waits for service B, service C waits for A — all threads hang, and the entire system crashes.


Junior Level

Circuit Breaker — a pattern that protects a service from calling a non-functioning dependent service.

Real-life analogy: an electrical circuit breaker cuts off current during overload to prevent wiring from burning out.

Three states:

  1. CLOSED — normal mode, calls go through
  2. OPEN — calls are blocked (service is “broken”)
  3. HALF-OPEN — probe call (check if the service is fixed)
Service A -> Service B (working)
              ↓ (starts lagging)
Circuit Breaker -> OPEN (blocks calls)
              ↓ (waited, check)
Circuit Breaker -> HALF-OPEN -> one call to check
              ↓ (success!)
Circuit Breaker -> CLOSED (working again)

Middle Level

How it works

// Resilience4j (io.github.resilience4j:resilience4j-circuitbreaker + io.vavr:vavr)
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.vavr.control.Try;
import java.time.Duration;
import java.util.function.Supplier;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)  // 50% — compromise: don't open from a single random error
    .slidingWindowSize(10)     // rate is computed over the last 10 calls
    .minimumNumberOfCalls(5)   // don't evaluate the rate until at least 5 calls are recorded
    .waitDurationInOpenState(Duration.ofSeconds(10))  // stay OPEN for 10 sec before probing
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of("backend", config);

// Usage
Supplier<String> decorated = CircuitBreaker
    .decorateSupplier(circuitBreaker, () -> backendService.call());

Try.ofSupplier(decorated)
    .onSuccess(result -> process(result))
    .onFailure(error -> fallback());

When to use

Circuit Breaker is needed when:

  • An external service may be unavailable
  • Network problems (timeout, connection refused)
  • Need to protect your service from cascade failure


Circuit Breaker is not needed when:

  • The call is always fast and reliable
  • Internal calls within the same process

Common mistakes

  1. Too aggressive a threshold:
     failureRateThreshold = 10 -> OPEN after a single error out of 10 calls.
     Result: many false positives on random, transient errors.
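
The arithmetic behind the false positive can be sketched in a couple of lines (the class name is ours, purely illustrative):

```java
class ThresholdMath {
    // Failure rate in percent over a COUNT-based sliding window.
    static double failureRate(int failures, int windowSize) {
        return 100.0 * failures / windowSize;
    }
}
```

With a 10-call window each call weighs 10%, so one transient error already reaches a 10% threshold, while a 50% threshold tolerates up to four such errors before opening.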
    

Senior Level

Internal Implementation

State transitions:

CLOSED -> OPEN: failure rate exceeds the threshold
OPEN -> HALF-OPEN: wait duration elapsed
HALF-OPEN -> CLOSED: probe call succeeds
HALF-OPEN -> OPEN: probe call fails
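
These transitions can be sketched as a minimal single-threaded state machine with a COUNT-based window (an educational toy, not the Resilience4j implementation; all names are ours):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private final int windowSize;          // COUNT-based sliding window
    private final double failureThreshold; // e.g. 0.5 = 50%
    private final Duration waitInOpen;
    private Instant openedAt;

    SimpleCircuitBreaker(int windowSize, double failureThreshold, Duration waitInOpen) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
        this.waitInOpen = waitInOpen;
    }

    synchronized <T> T call(Supplier<T> backend, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(waitInOpen))) {
                state = State.HALF_OPEN;   // OPEN -> HALF-OPEN: wait duration elapsed
            } else {
                return fallback.get();     // fail fast: the call never reaches the backend
            }
        }
        try {
            T result = backend.get();
            record(false);
            if (state == State.HALF_OPEN) state = State.CLOSED; // probe succeeded
            return result;
        } catch (RuntimeException e) {
            record(true);
            if (state == State.HALF_OPEN || failureRate() >= failureThreshold) {
                state = State.OPEN;        // CLOSED -> OPEN or HALF-OPEN -> OPEN
                openedAt = Instant.now();
            }
            return fallback.get();
        }
    }

    private void record(boolean failed) {
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
    }

    private double failureRate() {
        if (window.size() < windowSize) return 0.0; // minimum number of calls not reached
        long failures = window.stream().filter(f -> f).count();
        return (double) failures / window.size();
    }

    State state() { return state; }
}
```

Note the `failureRate()` guard: the rate is not evaluated until the window is full, which is exactly what `minimumNumberOfCalls` does in Resilience4j.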

Sliding window:

COUNT-based: last N calls
TIME-based: calls in the last T seconds

Resilience4j supports both types
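
A TIME-based window can be sketched as a deque of timestamped samples pruned on read (illustrative only, not Resilience4j's actual data structure; all names are ours):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

class TimeWindow {
    private record Sample(Instant at, boolean failed) {}

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final Duration span; // only calls from the last `span` count

    TimeWindow(Duration span) { this.span = span; }

    void record(boolean failed) {
        samples.addLast(new Sample(Instant.now(), failed));
    }

    double failureRate() {
        Instant cutoff = Instant.now().minus(span);
        while (!samples.isEmpty() && samples.peekFirst().at().isBefore(cutoff)) {
            samples.removeFirst(); // drop calls older than the window
        }
        if (samples.isEmpty()) return 0.0;
        long failures = samples.stream().filter(Sample::failed).count();
        return 100.0 * failures / samples.size();
    }
}
```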

Architectural Trade-offs

| Approach | Pros | Cons |
| --- | --- | --- |
| Circuit Breaker | Fast fallback: returns a stub immediately instead of waiting out a timeout from a broken service | More complex configuration |
| Retry | Recovers from transient errors | Can worsen the problem under load |
| Timeout | Protects from hanging | Does not protect from errors |

Production Experience

Spring Boot + Resilience4j:

@CircuitBreaker(name = "backend", fallbackMethod = "fallback")
@Retry(name = "backend")
@TimeLimiter(name = "backend")
public CompletableFuture<String> callBackend(String input) {
    return backendService.callAsync(input);
}

// Same signature as the protected method, plus the exception as the last parameter.
// Throwable also covers CallNotPermittedException, thrown while the breaker is OPEN.
public CompletableFuture<String> fallback(String input, Throwable ex) {
    return CompletableFuture.completedFuture("fallback: " + input);
}

Configuration:

resilience4j:
  circuitbreaker:
    instances:
      backend:
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 10s
        permittedNumberOfCallsInHalfOpenState: 3
        minimumNumberOfCalls: 5
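
The @Retry and @TimeLimiter annotations above need their own configuration sections as well; a sketch with illustrative values (the instance name must match the one in the annotations):

```yaml
resilience4j:
  retry:
    instances:
      backend:
        maxAttempts: 3
        waitDuration: 500ms
  timelimiter:
    instances:
      backend:
        timeoutDuration: 2s
```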

Best Practices

✅ Configure threshold based on metrics
✅ Use fallback for graceful degradation
✅ Monitor Circuit Breaker state
✅ Combine with Retry and Timeout

❌ Don't use too low a threshold
❌ Don't ignore HALF-OPEN state
❌ Don't use without fallback

Interview Cheat Sheet

Must know:

  • Circuit Breaker protects from cascade failure by blocking calls to a non-functioning service
  • Three states: CLOSED (normal), OPEN (blocking), HALF-OPEN (probe)
  • Transition CLOSED->OPEN when failure rate exceeds threshold (usually 50%)
  • OPEN->HALF-OPEN after wait duration (10-60 sec), HALF-OPEN->CLOSED on success
  • Sliding window (10-100 calls) for a statistically significant sample
  • Always combine with fallback for graceful degradation
  • Resilience4j — the de facto standard fault-tolerance library for Java/Spring (Netflix Hystrix is in maintenance mode)

Common follow-up questions:

  • Why 50% threshold? Compromise: don’t open from a single random error, but don’t wait until everything crashes.
  • Why HALF-OPEN? To check if the service is fixed before restoring full traffic.
  • How is Circuit Breaker better than a simple timeout? Timeout protects from hanging, but not from cascade failure — CB returns a fallback immediately without wasting time.
  • How to calculate failure rate? Sliding window: last N calls or calls within T seconds.

Red flags (DO NOT say):

  • “Circuit Breaker = Retry” — no, CB blocks, Retry tries again
  • “Threshold 10% — reliability” — no, there will be many false positives
  • “CB is not needed for internal services” — cascade failure is possible everywhere
  • “HALF-OPEN can be skipped” — without it you won’t know if the service is fixed

Related topics:

  • [[6. How does Circuit Breaker work and what states does it have]]
  • [[17. How to ensure fault tolerance of microservices]]
  • [[18. What is Bulkhead pattern]]
  • [[19. What is Retry pattern and how to use it correctly]]
  • [[15. How to organize communication between microservices]]