What is Circuit Breaker pattern
Junior Level
Circuit Breaker — a pattern that protects a service from calling a non-functioning dependent service.
Real-life analogy: an electrical circuit breaker cuts off current during overload to prevent wiring from burning out.
Three states:
- CLOSED — normal mode, calls go through
- OPEN — calls are blocked (service is “broken”)
- HALF-OPEN — probe call (check if the service is fixed)
Service A -> Service B (working)
↓ (starts lagging)
Circuit Breaker -> OPEN (blocks calls)
↓ (waited, check)
Circuit Breaker -> HALF-OPEN -> one call to check
↓ (success!)
Circuit Breaker -> CLOSED (working again)
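The state machine above can be sketched in plain Java. This is an illustrative toy, not the Resilience4j implementation: it opens after a fixed number of consecutive failures instead of a failure-rate window, but the three states and their transitions are the same.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Toy circuit breaker illustrating the three states:
// CLOSED -> OPEN after maxFailures consecutive failures,
// OPEN -> HALF_OPEN once openDuration has elapsed,
// HALF_OPEN -> CLOSED on a successful probe, back to OPEN on a failed one.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private Instant openedAt;
    private final int maxFailures;
    private final Duration openDuration;

    SimpleCircuitBreaker(int maxFailures, Duration openDuration) {
        this.maxFailures = maxFailures;
        this.openDuration = openDuration;
    }

    State state() { return state; }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (Duration.between(openedAt, Instant.now()).compareTo(openDuration) >= 0) {
                state = State.HALF_OPEN;   // wait elapsed: let one probe call through
            } else {
                return fallback.get();     // still OPEN: fail fast, don't touch the service
            }
        }
        try {
            T result = action.get();
            state = State.CLOSED;          // success (including the probe) closes the breaker
            failures = 0;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= maxFailures) {
                state = State.OPEN;        // probe failed or threshold reached
                openedAt = Instant.now();
            }
            return fallback.get();
        }
    }
}
```

With `maxFailures = 2`, two failing calls move the breaker CLOSED -> OPEN; once the wait elapses, the next call is the HALF-OPEN probe that either closes or reopens it.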
Middle Level
How it works
// Resilience4j
import java.time.Duration;
import java.util.function.Supplier;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.vavr.control.Try;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    // 50% is a compromise: don't open on a single random error,
    // but don't wait until everything is failing either.
    .failureRateThreshold(50)
    .slidingWindowSize(10)          // evaluate the last 10 calls
    .minimumNumberOfCalls(5)        // need at least 5 recorded calls; if 50%+
                                    // of the window failed, the breaker opens
    .waitDurationInOpenState(Duration.ofSeconds(10)) // stay OPEN for 10 seconds
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of("backend", config);

// Usage
Supplier<String> decorated = CircuitBreaker
    .decorateSupplier(circuitBreaker, () -> backendService.call());

Try.ofSupplier(decorated)   // Try.of expects a CheckedFunction0, so a Supplier needs ofSupplier
    .onSuccess(result -> process(result))
    .onFailure(error -> fallback());
When to use
Circuit Breaker is needed when:
- An external service may be unavailable
- Network problems occur (timeouts, connection refused)
- Need to protect your service from cascade failure
Cascade failure (chain failure): service A waits for service B, service C waits for A — all threads hang, and the entire system crashes.
Circuit Breaker is not needed when:
- The call is always fast and reliable
- Internal calls within the same process
Common mistakes
- Too aggressive threshold:
failureRateThreshold = 10 -> OPEN after a single error in a 10-call window; too many false positives
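A less trigger-happy configuration guards against this with `minimumNumberOfCalls`, so the failure rate is never evaluated over just a handful of calls (illustrative values):

```yaml
resilience4j:
  circuitbreaker:
    instances:
      backend:
        failureRateThreshold: 50   # open only when half the window has failed
        minimumNumberOfCalls: 10   # never evaluate the rate on fewer than 10 calls
```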
Senior Level
Internal Implementation
State transitions:
CLOSED -> OPEN: failure rate > threshold
OPEN -> HALF-OPEN: wait duration elapsed
HALF-OPEN -> CLOSED: success call
HALF-OPEN -> OPEN: failure call
Sliding window:
COUNT-based: last N calls
TIME-based: calls in the last T seconds
Resilience4j supports both types
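A COUNT-based window can be sketched as a bounded queue of call outcomes. This is a hand-rolled illustration, not Resilience4j's internal ring buffer:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// COUNT-based sliding window sketch: remembers the outcome of the last
// `size` calls and computes the failure rate over exactly that window.
class CountSlidingWindow {
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private final int size;

    CountSlidingWindow(int size) { this.size = size; }

    void record(boolean failed) {
        if (outcomes.size() == size) {
            outcomes.removeFirst(); // evict the oldest call once the window is full
        }
        outcomes.addLast(failed);
    }

    double failureRatePercent() {
        if (outcomes.isEmpty()) return 0.0;
        long failures = outcomes.stream().filter(f -> f).count();
        return 100.0 * failures / outcomes.size();
    }
}
```

A TIME-based window works the same way, except entries carry timestamps and are evicted by age rather than by count; in Resilience4j the choice is made via `slidingWindowType` (COUNT_BASED or TIME_BASED).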
Architectural Trade-offs
| Approach | Pros | Cons |
|---|---|---|
| Circuit Breaker | Fast fallback: returns a stub immediately instead of waiting out a broken service's timeout | Complex configuration |
| Retry | Tries again | Can worsen the problem |
| Timeout | Protects from hanging | Does not protect from errors |
Production Experience
Spring Boot + Resilience4j:
@CircuitBreaker(name = "backend", fallbackMethod = "fallback")
@Retry(name = "backend")
@TimeLimiter(name = "backend")
public CompletableFuture<String> callBackend(String input) {
    return backendService.callAsync(input);
}

// The fallback must match the original signature plus an exception parameter;
// Throwable also covers the checked TimeoutException thrown by the TimeLimiter.
public CompletableFuture<String> fallback(String input, Throwable ex) {
    return CompletableFuture.completedFuture("fallback: " + input);
}
Configuration:
resilience4j:
circuitbreaker:
instances:
backend:
slidingWindowSize: 10
failureRateThreshold: 50
waitDurationInOpenState: 10s
permittedNumberOfCallsInHalfOpenState: 3
minimumNumberOfCalls: 5
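The @Retry and @TimeLimiter annotations above reference their own named instances, which need separate configuration blocks (illustrative values; the keys are standard Resilience4j Spring Boot properties):

```yaml
resilience4j:
  retry:
    instances:
      backend:
        maxAttempts: 3
        waitDuration: 500ms
  timelimiter:
    instances:
      backend:
        timeoutDuration: 2s
```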
Best Practices
✅ Configure threshold based on metrics
✅ Use fallback for graceful degradation
✅ Monitor Circuit Breaker state
✅ Combine with Retry and Timeout
❌ Don't use too low a threshold
❌ Don't ignore HALF-OPEN state
❌ Don't use without fallback
Interview Cheat Sheet
Must know:
- Circuit Breaker protects from cascade failure by blocking calls to a non-functioning service
- Three states: CLOSED (normal), OPEN (blocking), HALF-OPEN (probe)
- Transition CLOSED->OPEN when failure rate exceeds threshold (usually 50%)
- OPEN->HALF-OPEN after wait duration (10-60 sec), HALF-OPEN->CLOSED on success
- Sliding window (10-100 calls) for a statistically significant sample
- Always combine with fallback for graceful degradation
- Resilience4j — standard library for Java/Spring
Common follow-up questions:
- Why 50% threshold? Compromise: don’t open from a single random error, but don’t wait until everything crashes.
- Why HALF-OPEN? To check if the service is fixed before restoring full traffic.
- How is Circuit Breaker better than a simple timeout? Timeout protects from hanging, but not from cascade failure — CB returns a fallback immediately without wasting time.
- How to calculate failure rate? Sliding window: last N calls or calls within T seconds.
Red flags (DO NOT say):
- “Circuit Breaker = Retry” — no, CB blocks, Retry tries again
- “Threshold 10% — reliability” — no, there will be many false positives
- “CB is not needed for internal services” — cascade failure is possible everywhere
- “HALF-OPEN can be skipped” — without it you won’t know if the service is fixed
Related topics:
- [[6. How does Circuit Breaker work and what states does it have]]
- [[17. How to ensure fault tolerance of microservices]]
- [[18. What is Bulkhead pattern]]
- [[19. What is Retry pattern and how to use it correctly]]
- [[15. How to organize communication between microservices]]