Question 6 · Section 17

How does a Circuit Breaker work and what states does it have?

Junior Level

A Circuit Breaker has three states:

  1. CLOSED: everything is normal, calls go through
  2. OPEN: the service is considered “broken”, calls are blocked and a fallback is returned immediately
  3. HALF-OPEN: a limited number of probe calls go through to check whether the service has recovered

Start: CLOSED (calls go through)
        ↓ too many errors
       OPEN (block calls for 10 seconds)
        ↓ time elapsed
       HALF-OPEN (one probe call)
        ↓ success -> CLOSED
        ↓ error -> OPEN (wait again)
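
The diagram above can be sketched as a small state machine. This is a minimal, single-threaded illustration, not a library implementation: the class name SimpleCircuitBreaker, the "N consecutive failures" rule for "too many errors", and the injectable clock are all assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal three-state breaker (illustrative sketch, not thread-safe).
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureLimit;     // consecutive failures that open the breaker (assumption)
    private final long openMillis;      // how long calls stay blocked in OPEN
    private final LongSupplier clock;   // injectable time source, in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureLimit, long openMillis, LongSupplier clock) {
        this.failureLimit = failureLimit;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();       // still blocked: fail fast
            }
            state = State.HALF_OPEN;         // wait elapsed: let one probe call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;            // a successful call (probe or normal) closes the breaker
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureLimit) {
                state = State.OPEN;          // probe failed, or too many errors in CLOSED
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}

class SimpleCircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0};                    // fake clock so the demo needs no sleeping
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(3, 10_000, () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("down"); };
        for (int i = 0; i < 3; i++) {
            cb.call(failing, () -> "fallback");
        }
        System.out.println(cb.state());                              // OPEN
        now[0] = 10_000;                                             // 10 seconds later
        System.out.println(cb.call(() -> "ok", () -> "fallback"));   // ok
        System.out.println(cb.state());                              // CLOSED
    }
}
```

Production breakers usually decide on a failure rate over a sliding window rather than on consecutive failures; the counter here only keeps the sketch short.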

Middle Level

Transition details

CLOSED -> OPEN:

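failureRate >= failureRateThreshold (e.g. 50%)
OR slowCallRate >= slowCallRateThreshold (e.g. 50% of calls slower than slowCallDurationThreshold)
(both are evaluated only after minimumNumberOfCalls calls have been recorded)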

OPEN -> HALF-OPEN:

waitDurationInOpenState elapsed (e.g. 10 seconds)

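HALF-OPEN -> CLOSED:

permittedNumberOfCallsInHalfOpenState probe calls completed (e.g. 3)
AND their failure rate and slow call rate stay below the thresholds

HALF-OPEN -> OPEN:

Failure rate or slow call rate of the probe calls >= threshold (Resilience4j semantics)
With a single probe call, any error reopens the breaker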
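
The four transitions can be written down as pure functions, which makes the rules easy to check by hand. This is a sketch under stated assumptions: the class and method names are hypothetical, the parameters borrow the configuration names used above, and the half-open decision assumes the rate-based evaluation documented for Resilience4j (the probe calls' failure rate is compared with the threshold). A stricter breaker that reopens on any probe error behaves like this one with a single permitted call.

```java
import java.time.Duration;

// Transition rules as pure functions (illustrative names, not a library API).
class TransitionRules {
    enum State { CLOSED, OPEN, HALF_OPEN }

    // CLOSED -> OPEN once enough calls are recorded and a rate reaches its threshold.
    static State fromClosed(int recordedCalls, int minimumNumberOfCalls,
                            double failureRate, double failureRateThreshold,
                            double slowCallRate, double slowCallRateThreshold) {
        if (recordedCalls < minimumNumberOfCalls) {
            return State.CLOSED;             // too few calls to judge
        }
        boolean tripped = failureRate >= failureRateThreshold
                || slowCallRate >= slowCallRateThreshold;
        return tripped ? State.OPEN : State.CLOSED;
    }

    // OPEN -> HALF_OPEN only after waitDurationInOpenState has elapsed.
    static State fromOpen(Duration timeInOpenState, Duration waitDurationInOpenState) {
        return timeInOpenState.compareTo(waitDurationInOpenState) >= 0
                ? State.HALF_OPEN : State.OPEN;
    }

    // HALF_OPEN -> CLOSED or OPEN, decided once all permitted probe calls completed.
    // Assumes permittedCalls >= 1.
    static State fromHalfOpen(int completedProbes, int failedProbes,
                              int permittedCalls, double failureRateThreshold) {
        if (completedProbes < permittedCalls) {
            return State.HALF_OPEN;          // keep probing
        }
        double failureRate = 100.0 * failedProbes / completedProbes;
        return failureRate >= failureRateThreshold ? State.OPEN : State.CLOSED;
    }

    public static void main(String[] args) {
        System.out.println(fromClosed(20, 10, 60.0, 50.0, 0.0, 100.0));               // OPEN
        System.out.println(fromOpen(Duration.ofSeconds(4), Duration.ofSeconds(10)));  // OPEN
        System.out.println(fromHalfOpen(3, 1, 3, 50.0));                              // CLOSED
        System.out.println(fromHalfOpen(3, 2, 3, 50.0));                              // OPEN
    }
}
```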

Common mistakes

  1. Wrong sliding window size:
    slidingWindowSize = 5 -> too small
    One error = 20% failure rate, so 3 errors out of 5 calls already reach a 50% threshold
    Better: 10-100 calls
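
The arithmetic behind this advice is simple enough to check in code. The helper below is a hypothetical sketch (WindowGranularity is not part of any library) and assumes the breaker opens when the failure rate is greater than or equal to the threshold.

```java
// How much a single failed call moves the failure rate for different window sizes.
class WindowGranularity {
    // Failure rate in percent for `failures` failed calls in a full window.
    static double failureRate(int failures, int windowSize) {
        return 100.0 * failures / windowSize;
    }

    // Smallest number of failures whose rate reaches the threshold (rate >= threshold).
    static int failuresToOpen(int windowSize, double thresholdPercent) {
        return (int) Math.ceil(windowSize * thresholdPercent / 100.0);
    }

    public static void main(String[] args) {
        for (int size : new int[] {5, 10, 100}) {
            System.out.printf("window=%d: one failure = %.0f%%, failures to reach 50%% = %d%n",
                    size, failureRate(1, size), failuresToOpen(size, 50));
        }
    }
}
```

With a window of 5, three failed calls are enough to open the breaker; with a window of 100, it takes 50.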
    

Senior Level

Internal Implementation

Metrics tracking:

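// Ring buffer to store call results (simplified sketch, not the library source).
// CircularArray, CallResult, calculateFailureRate() and stateTransition() are placeholders.
class SlidingWindow {
    private final CircularArray measurements;   // holds the last N call results
    private final int minCalls;                 // minimumNumberOfCalls
    private final double threshold;             // failure rate threshold, in percent

    void record(CallResult result) {
        measurements.add(result);
        // If minCalls is never reached, the breaker will NOT open, even if ALL calls
        // failed. This matters under low traffic: the circuit breaker may never trigger.
        if (measurements.size() >= minCalls) {
            double failureRate = calculateFailureRate();
            if (failureRate >= threshold) {     // opens when the rate reaches the threshold
                stateTransition(CLOSED, OPEN);
            }
        }
    }
}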
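
The SlidingWindow above leaves its helpers undefined. A self-contained version of the same idea might look like the sketch below. It assumes a count-based window stored in a plain boolean array; CountBasedWindow and its methods are hypothetical names, and returning -1 for "not enough calls yet" is a convention chosen for this example.

```java
// Count-based sliding window over the last `windowSize` call outcomes (illustrative sketch).
class CountBasedWindow {
    private final boolean[] failed;          // ring buffer: true = the call failed
    private final int minimumNumberOfCalls;
    private int next;                        // slot the next outcome will overwrite
    private int size;                        // outcomes recorded so far, capped at the capacity
    private int failures;                    // failed outcomes currently in the buffer

    CountBasedWindow(int windowSize, int minimumNumberOfCalls) {
        this.failed = new boolean[windowSize];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    void record(boolean callFailed) {
        if (size == failed.length) {
            if (failed[next]) {
                failures--;                  // the oldest outcome is evicted
            }
        } else {
            size++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    // Failure rate in percent, or -1 while fewer than minimumNumberOfCalls are recorded.
    double failureRate() {
        return size < minimumNumberOfCalls ? -1 : 100.0 * failures / size;
    }

    // True when the breaker should move from CLOSED to OPEN.
    boolean shouldOpen(double thresholdPercent) {
        double rate = failureRate();
        return rate >= 0 && rate >= thresholdPercent;
    }

    public static void main(String[] args) {
        CountBasedWindow window = new CountBasedWindow(4, 3);
        window.record(true);
        window.record(true);
        System.out.println(window.shouldOpen(50));   // false: only 2 calls, minimum is 3
        window.record(true);
        System.out.println(window.shouldOpen(50));   // true: 3 of 3 calls failed
    }
}
```

The first check shows the low-traffic caveat from the comment: two failures out of two calls is a 100% failure rate, but the breaker stays closed because the minimum number of calls has not been reached.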

Production Experience

Monitoring:

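// Metrics for monitoring
CircuitBreaker.Metrics metrics = circuitBreaker.getMetrics();
metrics.getFailureRate();             // failure rate in %, -1 until minimumNumberOfCalls is reached
metrics.getNumberOfBufferedCalls();   // calls currently in the sliding window
metrics.getNumberOfFailedCalls();     // failed calls in the sliding window
circuitBreaker.getState();            // CLOSED / OPEN / HALF_OPEN (state is read from the breaker, not from Metrics)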

Best Practices

Retry, Timeout and Circuit Breaker are NOT alternatives, they are complementary and are used together.

✅ minimumNumberOfCalls = 10+ (statistically significant sample)
✅ failureRateThreshold = 50% (compromise: don't open from one error, but don't wait until everything crashes)
✅ waitDurationInOpenState = 10-60s (enough time for the dependent service to restart, but not too long)
✅ permittedNumberOfCallsInHalfOpenState = 3-5

❌ Sliding window too small
❌ Wait duration too short
❌ No metrics monitoring

Interview Cheat Sheet

Must know:

  • Three states: CLOSED, OPEN, HALF-OPEN with clear transition rules
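  • CLOSED->OPEN: failure rate >= threshold (e.g. 50%), OPEN->HALF-OPEN: wait duration elapsed
  • HALF-OPEN->CLOSED: the N probe calls stay below the failure rate threshold, HALF-OPEN->OPEN: the probe calls reach the threshold (with a single probe, any error)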
  • Sliding window (count-based or time-based) for metrics tracking
  • minimumNumberOfCalls = 10+ for a statistically significant sample
  • Under low traffic the circuit breaker may not trigger (minCalls not reached)
  • Ring buffer for storing call results

Common follow-up questions:

  • What is permittedNumberOfCallsInHalfOpenState? The number of probe calls allowed in HALF-OPEN. If their failure rate stays below the threshold, the breaker returns to CLOSED.
  • Why is minimumNumberOfCalls important? Without it, the breaker will open from 1-2 errors under low traffic.
  • How to monitor CB? Metrics: failure rate, buffered calls, failed calls, current state.
  • Count-based vs time-based window? Count-based = last N calls, time-based = calls within T seconds.

Red flags (DO NOT say):

  • “Sliding window = 5 is enough”: no, one error = 20% failure rate
  • “Wait duration = 1 second”: the service won’t have time to restart
  • “CB works without metrics”: thresholds cannot be tuned without monitoring
  • “HALF-OPEN passes all traffic”: no, only permittedNumberOfCallsInHalfOpenState calls

Related topics:

  • [[5. What is Circuit Breaker pattern]]
  • [[17. How to ensure fault tolerance of microservices]]
  • [[19. What is Retry pattern and how to use it correctly]]
  • [[21. How to monitor a distributed microservice system]]
  • [[20. What is exponential backoff]]