How does Circuit Breaker work and what states does it have
Junior Level
A Circuit Breaker has three states:
- CLOSED: everything is normal, calls go through
- OPEN: the service is considered broken, calls are blocked and a fallback is returned immediately
- HALF-OPEN: a limited number of probe calls go through to check whether the service has recovered
Start: CLOSED (calls go through)
↓ too many errors
OPEN (block calls for 10 seconds)
↓ time elapsed
HALF-OPEN (one probe call)
↓ success -> CLOSED
↓ error -> OPEN (wait again)
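The diagram above can be sketched as a small state machine. This is a minimal single-threaded illustration, not a library implementation: the class name `SimpleBreaker`, the consecutive-failure rule for "too many errors", and the injectable clock are all assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal three-state breaker. Single-threaded sketch, not production code. */
class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int maxFailures;    // assumption: consecutive failures that trip the breaker
    private final long openMillis;    // how long calls stay blocked, e.g. 10_000 for 10 seconds
    private final LongSupplier clock; // injectable time source in milliseconds

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleBreaker(int maxFailures, long openMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Current state; OPEN turns into HALF_OPEN once the wait time has elapsed. */
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the action unless the breaker is OPEN; returns the fallback on block or failure. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();          // blocked: the broken service is not called
        }
        try {
            T result = action.get();
            failures = 0;
            state = State.CLOSED;           // success, including a successful probe, closes the breaker
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= maxFailures) {
                state = State.OPEN;         // trip, or re-open after a failed probe
                openedAt = clock.getAsLong();
                failures = 0;
            }
            return fallback.get();
        }
    }
}
```

In real code the clock would typically be `System::currentTimeMillis`, for example `new SimpleBreaker(3, 10_000, System::currentTimeMillis)`; injecting it only makes the wait period easy to simulate.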
Middle Level
Transition details
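CLOSED -> OPEN:
failureRate >= threshold (e.g. 50%)
OR slowCallRate >= threshold (e.g. 50% of calls slower than the configured duration)
Both rates are evaluated only after minimumNumberOfCalls have been recorded.
OPEN -> HALF-OPEN:
waitDurationInOpenState elapsed (e.g. 10 seconds)
HALF-OPEN -> CLOSED:
permittedNumberOfCallsInHalfOpenState probe calls completed (e.g. 3)
and their failure rate is below the threshold
HALF-OPEN -> OPEN:
Failure rate of the probe calls >= threshold.
In Resilience4j, whose parameter names are used here, one failed probe re-opens the breaker only if it pushes the rate to the threshold. Simpler single-probe breakers re-open on any error.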
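The transition rules can be written as one pure function. This is a sketch under stated assumptions: the class name, constants, and method signature are illustrative, and the HALF_OPEN branch follows the Resilience4j-style rule (compare the probe calls' failure rate against the threshold once the permitted number of probes has completed) rather than re-opening on the first error.

```java
/** Pure transition function; the constants mirror the example values in the text. */
class BreakerTransitions {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final double FAILURE_RATE_THRESHOLD = 50.0;   // percent
    static final double SLOW_CALL_RATE_THRESHOLD = 50.0; // percent
    static final int MINIMUM_NUMBER_OF_CALLS = 10;
    static final int PERMITTED_CALLS_IN_HALF_OPEN = 3;
    static final long WAIT_IN_OPEN_MILLIS = 10_000;

    /**
     * @param completedCalls calls recorded in the current state's window
     * @param millisInState  time spent in the current state
     */
    static State next(State current, double failureRate, double slowCallRate,
                      int completedCalls, long millisInState) {
        boolean unhealthy = failureRate >= FAILURE_RATE_THRESHOLD
                || slowCallRate >= SLOW_CALL_RATE_THRESHOLD;
        switch (current) {
            case CLOSED:
                // Rates only count once enough calls have been observed.
                return (completedCalls >= MINIMUM_NUMBER_OF_CALLS && unhealthy)
                        ? State.OPEN : State.CLOSED;
            case OPEN:
                return millisInState >= WAIT_IN_OPEN_MILLIS ? State.HALF_OPEN : State.OPEN;
            case HALF_OPEN:
                if (completedCalls < PERMITTED_CALLS_IN_HALF_OPEN) {
                    return State.HALF_OPEN;              // still probing
                }
                return unhealthy ? State.OPEN : State.CLOSED;
            default:
                throw new IllegalStateException("unknown state " + current);
        }
    }
}
```

Keeping the decision free of side effects makes each rule easy to check in isolation, for example that a 60% failure rate over only 5 calls leaves the breaker CLOSED.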
Common mistakes
- Wrong sliding window size:
slidingWindowSize = 5 is too small: one error = 20% failure rate. Better: 10-100 calls.
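The arithmetic behind this mistake is simple enough to check in code. The helper names below are made up for the illustration, and the sketch assumes a full window and a breaker that opens when the rate reaches the threshold.

```java
/** How coarse the failure rate is for a given window size. */
class WindowGranularity {
    /** Share of the failure rate contributed by a single failed call, in percent. */
    static double percentPerFailure(int windowSize) {
        return 100.0 / windowSize;
    }

    /** Smallest number of failures in a full window whose rate reaches the threshold. */
    static int failuresToTrip(int windowSize, double thresholdPercent) {
        return (int) Math.ceil(windowSize * thresholdPercent / 100.0);
    }

    public static void main(String[] args) {
        for (int size : new int[] {5, 10, 100}) {
            System.out.printf("window=%d: one failure = %.1f%%, failures to reach 50%% = %d%n",
                    size, percentPerFailure(size), failuresToTrip(size, 50.0));
        }
    }
}
```

With a window of 5, three failed calls are enough to reach a 50% threshold; with a window of 100 it takes fifty, so a short burst of errors is far less likely to open the breaker.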
Senior Level
Internal Implementation
Metrics tracking:
// Ring buffer that stores the most recent call results
class SlidingWindow {
    private final CircularArray measurements; // illustrative fixed-size buffer type

    void record(CallResult result) {
        measurements.add(result);
        // If minCalls is never reached, the breaker will NOT open, even if ALL calls
        // FAILED. This matters under low traffic: the circuit breaker may never trigger.
        if (measurements.size() >= minCalls) {
            double failureRate = calculateFailureRate();
            if (failureRate >= threshold) { // opens when the rate reaches the threshold
                stateTransition(CLOSED, OPEN);
            }
        }
    }
}
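The pseudocode above leaves the buffer and the failure-rate calculation undefined. One way to fill them in is a fixed-size boolean ring buffer with a running failure count, so recording a result and reading the rate are both O(1). The class name and the -1 "not enough data" convention are choices made for this sketch.

```java
/** Count-based sliding window over the last N call outcomes, backed by a ring buffer. */
class CountBasedWindow {
    private final boolean[] failed; // ring buffer: true = the call in this slot failed
    private int next = 0;           // slot the next outcome will overwrite
    private int size = 0;           // outcomes recorded so far, capped at capacity
    private int failures = 0;       // failures currently inside the buffer

    CountBasedWindow(int capacity) {
        failed = new boolean[capacity];
    }

    void record(boolean callFailed) {
        if (size == failed.length) {
            if (failed[next]) {
                failures--;         // the oldest outcome is evicted
            }
        } else {
            size++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    int size() {
        return size;
    }

    /** Failure rate in percent, or -1 while fewer than minCalls outcomes are recorded. */
    double failureRate(int minCalls) {
        if (size < minCalls) {
            return -1.0;
        }
        return 100.0 * failures / size;
    }
}
```

Returning -1 below `minCalls` makes the low-traffic case explicit: the caller can tell "no verdict yet" apart from a genuine 0% failure rate.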
Production Experience
Monitoring:
// Metrics for monitoring
CircuitBreaker.Metrics metrics = circuitBreaker.getMetrics();
metrics.getFailureRate(); // % errors
metrics.getNumberOfBufferedCalls();
metrics.getNumberOfFailedCalls();
circuitBreaker.getState(); // CLOSED/OPEN/HALF_OPEN (the state is read from the breaker, not from Metrics)
Best Practices
These patterns are not alternatives but complementary: Retry + Timeout + CircuitBreaker are used together.
✅ minimumNumberOfCalls = 10+ (statistically significant sample)
✅ failureRateThreshold = 50% (compromise: don't open from one error, but don't wait until everything crashes)
✅ waitDurationInOpenState = 10-60s (enough time for the dependent service to restart, but not too long)
✅ permittedNumberOfCallsInHalfOpenState = 3-5
❌ Sliding window too small
❌ Wait duration too short
❌ No metrics monitoring
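To illustrate how Retry, Timeout and CircuitBreaker fit together, here is a plain-Java sketch of one common nesting: retry on the outside, the breaker in the middle, the timeout around each single attempt. Everything in it is hypothetical (the `ResilientCall` class, the trip rule of 3 consecutive failures, the 10-second open period, the linear backoff), and HALF-OPEN probing is left out to keep it short.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: retry( circuit breaker( timeout( call ) ) ). Not thread-safe. */
class ResilientCall {
    /** Thrown instead of calling the service while the breaker is OPEN. */
    static class CircuitOpenException extends RuntimeException {
        CircuitOpenException() { super("circuit open"); }
    }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private int consecutiveFailures = 0; // stand-in for a real sliding window
    private long openUntil = 0;          // breaker is OPEN while now < openUntil

    /** Timeout: bound the duration of a single attempt. */
    private <T> T withTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);         // interrupt the stuck attempt
            throw e;
        } catch (ExecutionException e) {
            throw new Exception("call failed", e.getCause());
        }
    }

    /** Circuit breaker: fail fast while OPEN, otherwise record the outcome. */
    private <T> T withBreaker(Callable<T> call, long timeoutMillis) throws Exception {
        if (System.currentTimeMillis() < openUntil) {
            throw new CircuitOpenException();
        }
        try {
            T result = withTimeout(call, timeoutMillis);
            consecutiveFailures = 0;
            return result;
        } catch (Exception e) {
            if (++consecutiveFailures >= 3) {          // assumed trip rule
                openUntil = System.currentTimeMillis() + 10_000;
                consecutiveFailures = 0;
            }
            throw e;
        }
    }

    /** Retry: repeat failed attempts, but stop at once when the breaker is OPEN. */
    <T> T execute(Callable<T> call, int maxAttempts, long timeoutMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return withBreaker(call, timeoutMillis);
            } catch (CircuitOpenException open) {
                throw open;                            // retrying an open breaker is pointless
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;                           // attempts exhausted
                }
                Thread.sleep(50L * attempt);           // simple linear backoff
            }
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The nesting order matters: every retry attempt passes through the breaker, so timeouts and errors feed its failure count, and once it opens the retry loop stops instead of hammering a service that is already down.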
Interview Cheat Sheet
Must know:
- Three states: CLOSED, OPEN, HALF-OPEN with clear transition rules
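- CLOSED->OPEN: failure rate >= threshold (50%), OPEN->HALF-OPEN: wait duration elapsed
- HALF-OPEN->CLOSED: N probe calls completed with a failure rate below the threshold, HALF-OPEN->OPEN: probe failure rate >= threshold (any error, for single-probe breakers)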
- Sliding window (count-based or time-based) for metrics tracking
- minimumNumberOfCalls = 10+ for a statistically significant sample
- Under low traffic the circuit breaker may not trigger (minCalls not reached)
- Ring buffer for storing call results
Common follow-up questions:
- What is permittedNumberOfCallsInHalfOpenState? The number of probe calls allowed in HALF-OPEN. Once they complete, the breaker returns to CLOSED if their failure rate is below the threshold, otherwise to OPEN.
- Why is minimumNumberOfCalls important? Without it, the breaker would open after 1-2 errors under low traffic.
- How to monitor CB? Metrics: failure rate, buffered calls, failed calls, current state.
- Count-based vs time-based window? Count-based = last N calls, time-based = calls within T seconds.
Red flags (DO NOT say):
- “Sliding window = 5 is enough”: no, one error = 20% failure rate
- “Wait duration = 1 second”: the service won’t have time to restart
- “CB works without metrics”: thresholds cannot be tuned without monitoring
- “HALF-OPEN passes all traffic”: no, only permittedNumberOfCallsInHalfOpenState probe calls
Related topics:
- [[5. What is Circuit Breaker pattern]]
- [[17. How to ensure fault tolerance of microservices]]
- [[19. What is Retry pattern and how to use it correctly]]
- [[21. How to monitor a distributed microservice system]]
- [[20. What is exponential backoff]]