What is exponential backoff

🟢 Junior Level

Exponential backoff is a retry strategy where the wait time grows by a constant factor (typically doubling) after each failed attempt.

Attempt 1: error → wait 1 sec
Attempt 2: error → wait 2 sec
Attempt 3: error → wait 4 sec
Attempt 4: error → wait 8 sec
Attempt 5: success! ✅

Why: If a service is overloaded, give it more time to recover.
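
As a rough illustration, the schedule above can be computed with a few lines of Java. The class and method names (BackoffSchedule, delaysMillis) are made up for this sketch, and it deliberately ignores jitter and caps, which are covered at the later levels.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {
    /** Returns the wait in milliseconds after each of the first {@code failures} failed attempts. */
    static List<Long> delaysMillis(long baseDelayMillis, double multiplier, int failures) {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < failures; attempt++) {
            // attempt 0 waits base, attempt 1 waits base * multiplier, and so on
            delays.add((long) (baseDelayMillis * Math.pow(multiplier, attempt)));
        }
        return delays;
    }
}
```

Calling BackoffSchedule.delaysMillis(1000, 2.0, 4) yields the four waits from the example: 1000, 2000, 4000 and 8000 ms.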

🟡 Middle Level

Implementation

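// ⚠️ Teaching example only: Thread.sleep blocks the calling thread, and this
// loop has no jitter and no cap on the delay. Use Resilience4j or Spring Retry in production.
long baseDelay = 1000; // 1 second
double multiplier = 2.0;
int maxAttempts = 5;
Exception lastError = null;

for (int attempt = 0; attempt < maxAttempts; attempt++) {
    try {
        return callService();
    } catch (Exception e) {
        lastError = e;
        if (attempt == maxAttempts - 1) {
            break; // out of attempts: do not sleep before giving up
        }
        long delay = (long) (baseDelay * Math.pow(multiplier, attempt));
        Thread.sleep(delay); // throws InterruptedException; the enclosing method must declare it
    }
}
throw lastError; // every attempt failed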

With jitter

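// Jitter adds randomness so that clients do not all retry at the same instant (thundering herd).
// This variant adds up to 10% on top of the delay. AWS recommends "full jitter":
// sleep = random(0, min(cap, base * 2^attempt))
// ("equal jitter" is delay/2 + random(0, delay/2)).
// Requires: import java.util.concurrent.ThreadLocalRandom;
long delay = (long) (baseDelay * Math.pow(multiplier, attempt));
long jitter = ThreadLocalRandom.current().nextLong(delay / 10 + 1); // +1 keeps the bound positive
delay += jitter;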
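
The two named variants, full jitter and equal jitter, might look roughly like this. It is a sketch under assumptions: the Jitter class and its method names are invented here, the multiplier is fixed at 2, and the cap is assumed to be well below Long.MAX_VALUE.

```java
import java.util.concurrent.ThreadLocalRandom;

final class Jitter {
    /** Capped exponential delay without randomness: min(cap, base * 2^attempt). */
    static long capped(long baseMillis, long capMillis, int attempt) {
        double raw = baseMillis * Math.pow(2.0, attempt);
        return (long) Math.min((double) capMillis, raw);
    }

    /** Full jitter: uniform in [0, capped delay]. */
    static long fullJitter(long baseMillis, long capMillis, int attempt) {
        long ceiling = capped(baseMillis, capMillis, attempt);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    /** Equal jitter: half of the capped delay is fixed, the other half is random. */
    static long equalJitter(long baseMillis, long capMillis, int attempt) {
        long half = capped(baseMillis, capMillis, attempt) / 2;
        return half + ThreadLocalRandom.current().nextLong(half + 1);
    }
}
```

Full jitter can return a delay close to zero, while equal jitter always waits at least half of the capped delay; that is the practical trade-off between the two.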

Common mistakes

Without jitter:

All clients retry simultaneously → thundering herd → service crashes again
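
A toy simulation can make the problem visible. The sketch below (HerdDemo is a hypothetical name, and the seeded java.util.Random is only there to make runs repeatable) counts how many distinct retry delays a group of clients picks when all of them failed at the same moment.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class HerdDemo {
    /** Counts distinct retry delays chosen by {@code clients} clients that failed simultaneously. */
    static int distinctDelays(int clients, long delayMillis, boolean fullJitter, long seed) {
        Random random = new Random(seed);
        Set<Long> delays = new HashSet<>();
        for (int i = 0; i < clients; i++) {
            // Without jitter every client picks the same delay;
            // with full jitter each picks a uniform value in [0, delayMillis].
            long delay = fullJitter
                    ? (long) (random.nextDouble() * (delayMillis + 1))
                    : delayMillis;
            delays.add(delay);
        }
        return delays.size();
    }
}
```

Without jitter the result is always 1: every client comes back in the same millisecond. With full jitter the retries are spread across the whole interval.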

When NOT to use exponential backoff

Idempotent read operations: a simple retry is usually enough
Time-critical (real-time) operations: backoff makes latency unpredictable
Requests a user is actively waiting on: the user won’t wait 30 seconds for backoff to finish

🔴 Senior Level

Full jitter formula

// AWS recommends: full jitter (the whole delay is random, up to the capped exponential value)
long delay = random(0, min(cap, base * pow(multiplier, attempt)))
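
Putting the pieces together, a retry helper with full jitter in the form random(0, min(cap, base * 2^attempt)), a cap and an attempt limit could be sketched as follows. Everything named here (FullJitterRetry, Sleeper, the parameter names) is an assumption for illustration, not an API of any library, and the sketch treats every exception except InterruptedException as retryable, which real code should narrow down.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class FullJitterRetry {
    /** Hypothetical hook so callers (and tests) decide how to wait. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    static <T> T call(Callable<T> action, int maxAttempts, long baseMillis,
                      long capMillis, Sleeper sleeper) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw e;
            } catch (Exception e) {
                last = e; // assumption: every other exception is retryable
            }
            if (attempt < maxAttempts - 1) {
                // Full jitter: sleep = random(0, min(cap, base * 2^attempt))
                long ceiling = (long) Math.min((double) capMillis,
                        baseMillis * Math.pow(2.0, attempt));
                sleeper.sleep(ThreadLocalRandom.current().nextLong(ceiling + 1));
            }
        }
        throw last; // all attempts failed
    }
}
```

A blocking caller could pass Thread::sleep as the sleeper; non-blocking code would replace the loop with a scheduler, which is one reason to prefer a library such as Resilience4j over a hand-written helper.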

Production Experience

Resilience4j:

RetryConfig config = RetryConfig.custom()
    .maxAttempts(5)
    .intervalFunction(IntervalFunction.ofExponentialBackoff(1000, 2.0))
    .build();

Best Practices

✅ Exponential backoff + jitter
✅ Cap on maximum delay
✅ Attempt limit
✅ Only for retryable errors

❌ Without cap (can wait minutes)
❌ Without jitter
❌ For all error types
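
For the "only for retryable errors" rule, one possible classification of HTTP status codes is sketched below. The exact set is an assumption and depends on the API being called; RetryPolicy and isRetryableStatus are hypothetical names.

```java
final class RetryPolicy {
    /** Assumed classification; adjust it to the documented semantics of the API being called. */
    static boolean isRetryableStatus(int status) {
        switch (status) {
            case 408: // Request Timeout
            case 429: // Too Many Requests
            case 500: // Internal Server Error
            case 502: // Bad Gateway
            case 503: // Service Unavailable
            case 504: // Gateway Timeout
                return true;
            default:
                // Other 4xx (400, 401, 403, 404, ...) will not succeed on a retry
                return false;
        }
    }
}
```

Even for a retryable status, retrying a non-idempotent request is only safe when the server can deduplicate it.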

🎯 Interview Cheat Sheet

Must know:

Exponential backoff — delay doubles: 1s → 2s → 4s → 8s
Jitter adds randomness: AWS recommends full jitter = random(0, min(cap, base * 2^attempt))
Cap (maximum delay) is mandatory — without cap, you can wait minutes
Only for retryable errors (timeouts, 5xx, 429): not for other 4xx, not for business exceptions
Do NOT use for idempotent reads (simple retry ok), real-time operations, user client requests
Resilience4j: IntervalFunction.ofExponentialBackoff(base, multiplier)
Thread.sleep in production — bad, use Resilience4j or Spring Retry

Frequent follow-up questions:

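Full jitter vs equal jitter? Full jitter = random(0, min(cap, base * 2^attempt)); equal jitter = delay/2 + random(0, delay/2), where delay = min(cap, base * 2^attempt). Full jitter spreads retries over the whole interval, so it prevents thundering herd better.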
What cap to choose? Depends on SLA — usually 30-60 seconds max.
Why is Thread.sleep bad? It blocks the thread for the whole wait, and a hand-written sleep loop usually lacks jitter and a max cap; use Resilience4j.
When NOT to use backoff? Read operations (simple retry ok), real-time, user client requests.
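
For the cap question, it can help to work out the worst-case total wait before picking numbers. The sketch below sums the capped delays between attempts; RetryBudget and its method are hypothetical names, and jitter is ignored, so with full jitter the result is an upper bound.

```java
final class RetryBudget {
    /** Sum of the capped delays between attempts, ignoring jitter. */
    static long worstCaseTotalWaitMillis(long baseMillis, double multiplier,
                                         long capMillis, int maxAttempts) {
        long total = 0;
        // There are maxAttempts - 1 waits: none after the final attempt.
        for (int attempt = 0; attempt < maxAttempts - 1; attempt++) {
            total += (long) Math.min((double) capMillis,
                    baseMillis * Math.pow(multiplier, attempt));
        }
        return total;
    }
}
```

With base 1 s, multiplier 2, cap 30 s and 8 attempts, the waits are 1 + 2 + 4 + 8 + 16 + 30 + 30 = 91 seconds; without the cap the same schedule adds up to 127 seconds. If the caller's SLA is shorter than that total, lower the cap or the attempt limit.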

Red flags (NOT to say):

“Backoff without cap is more reliable” — no, you can wait indefinitely
“Jitter is not needed for low traffic” — still needed, clients can synchronize
“Backoff for 404 errors” — no, the server won’t fix itself
“Thread.sleep is production-ready” — no, blocks the thread, no jitter

Related topics:

[[19. What is Retry pattern and how to use it correctly]]
[[17. How to ensure fault tolerance of microservices]]
[[15. How to organize communication between microservices]]
[[5. What is Circuit Breaker pattern]]
[[21. How to monitor a distributed microservices system]]