What are leader and follower replicas
Junior Level
Definition
Leader — the primary replica that accepts reads and writes.
Follower — a copy that replicates data from the leader.
Partition 0:
Broker 1 → Leader (read/write)
Broker 2 → Follower (read-only, replicates from leader)
Broker 3 → Follower (read-only, replicates from leader)
Key Roles
Leader:
- Accepts writes from producers
- Serves data to consumers
- Manages the ISR list
Follower:
- Copies data from the leader
- Does not accept writes from producers
- Can become leader during failover
Failover
Leader goes down:
Controller selects a new leader from ISR
Follower → new Leader
Consumers and producers switch over automatically
Middle Level
Replication Flow
Producer → Leader → Follower 1
→ Follower 2
→ Follower 3
Consumer → reads only from Leader (by default; in Kafka 2.4+ consumers can read from
followers when brokers set replica.selector.class and the consumer sets client.rack).
ISR (In-Sync Replicas)
ISR includes replicas that:
1. Are active and send heartbeats
2. Are not behind the leader by more than replica.lag.time.max.ms
Example:
ISR: [Broker 1 (Leader), Broker 2, Broker 3]
If Broker 3 falls behind → ISR: [Broker 1, Broker 2]
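The ISR example above can be sketched as a small membership check. This is a simplified model, not broker code: `ReplicaState`, `compute_isr`, and the timestamps are hypothetical names and values chosen for illustration, and the 30 000 ms limit is an assumed setting for replica.lag.time.max.ms.

```python
from dataclasses import dataclass

REPLICA_LAG_TIME_MAX_MS = 30_000  # assumed value, for illustration only


@dataclass
class ReplicaState:
    broker_id: int
    alive: bool
    last_caught_up_ms: int  # last time this replica was fully caught up with the leader


def compute_isr(replicas, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return ids of replicas that are alive and caught up within max_lag_ms."""
    return [
        r.broker_id
        for r in replicas
        if r.alive and now_ms - r.last_caught_up_ms <= max_lag_ms
    ]


now = 100_000
replicas = [
    ReplicaState(1, True, now),           # leader: always caught up with itself
    ReplicaState(2, True, now - 2_000),   # 2 s behind: stays in ISR
    ReplicaState(3, True, now - 45_000),  # 45 s behind: dropped from ISR
]
isr = compute_isr(replicas, now)
print(isr)  # [1, 2]
```

Raising the limit (for example `compute_isr(replicas, now, max_lag_ms=60_000)`) would keep Broker 3 in the list, which shows the trade-off: a larger limit tolerates slow followers but delays detection of a stuck one.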
Unclean Leader Election
unclean.leader.election.enable=false (default):
Leader is selected only from ISR
No loss of committed messages (those replicated to all ISR members)
unclean.leader.election.enable=true:
Leader can be selected from any follower
Risk of data loss
If the leader committed offset 100 and the follower only has offset 95,
on unclean election, messages 96–100 will be lost.
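The arithmetic of that example can be written out as a tiny helper. `lost_offsets` is a hypothetical function used only to illustrate the gap between the old leader's log and the out-of-sync follower's log.

```python
def lost_offsets(old_leader_last_offset, new_leader_last_offset):
    """Offsets that existed on the failed leader but not on the replica replacing it."""
    return list(range(new_leader_last_offset + 1, old_leader_last_offset + 1))


lost = lost_offsets(old_leader_last_offset=100, new_leader_last_offset=95)
print(lost)  # [96, 97, 98, 99, 100]
```

When the new leader comes from the ISR and is fully caught up, the two offsets are equal and the list is empty.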
Common Mistakes
- Unclean leader election:
A follower not in ISR becomes leader → Data that the follower doesn't have is lost
- Uneven leader distribution:
Broker A: leader for 80% of partitions
Broker B: leader for 20% of partitions
→ Uneven load
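A quick way to spot uneven leader distribution is to count leaders per broker. The sketch below assumes a hypothetical `partition_leaders` mapping (partition number to leader broker), which in practice would come from topic metadata.

```python
from collections import Counter


def leader_share(partition_leaders):
    """Map each broker to the fraction of partitions it currently leads."""
    counts = Counter(partition_leaders.values())
    total = len(partition_leaders)
    return {broker: n / total for broker, n in counts.items()}


# Hypothetical cluster: 10 partitions, 8 led by broker A and 2 by broker B
partition_leaders = {p: "A" for p in range(8)}
partition_leaders.update({8: "B", 9: "B"})

shares = leader_share(partition_leaders)
print(shares)  # {'A': 0.8, 'B': 0.2}
```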
Senior Level
ISR Management
Follower is removed from ISR if:
- Does not send a fetch request for > replica.lag.time.max.ms
- Has not caught up to the leader's log end offset for > replica.lag.time.max.ms
- Broker is unavailable
Follower returns to ISR when:
- Catches up to the leader's log end offset
- Keeps fetching current data without falling behind again
- Leader adds it back to ISR, and the change is recorded in cluster metadata
Replica Fetcher
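Followers continuously send fetch requests to the leader:
- replica.fetch.wait.max.ms: how long the leader may hold a fetch request while waiting for data (default 500)
- replica.fetch.min.bytes: minimum bytes the leader should have before responding (default 1)
- replica.fetch.max.bytes: bytes to attempt to fetch per partition (default 1048576)
These settings affect replication latency and network usage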
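To see how the three settings interact, here is a simplified model of the leader's decision for one fetch request. It is a sketch, not the broker's logic: `fetch_response_size` is a hypothetical function, the numbers are illustrative, and it treats replica.fetch.max.bytes as a hard cap, which ignores the case of a single record batch larger than the limit.

```python
def fetch_response_size(available_bytes, waited_ms, min_bytes, max_wait_ms, max_bytes):
    """Return the number of bytes to send now, or None to keep the request waiting."""
    # Hold the request while there is too little data and the wait budget is not used up
    if available_bytes < min_bytes and waited_ms < max_wait_ms:
        return None
    # Otherwise respond with what is available, capped at the per-fetch limit
    return min(available_bytes, max_bytes)


# Illustrative values for min bytes, max wait, and max bytes
cfg = dict(min_bytes=1, max_wait_ms=500, max_bytes=1_048_576)

print(fetch_response_size(0, 100, **cfg))          # None: no data yet, keep waiting
print(fetch_response_size(0, 500, **cfg))          # 0: wait expired, empty response
print(fetch_response_size(2_000_000, 0, **cfg))    # 1048576: capped at max_bytes
```

A higher minimum or longer wait means fewer, larger responses (less network overhead, more replication latency); lower values do the opposite.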
Leader Election — in Detail
Controller Broker:
- One broker is the cluster controller
- Stores metadata for all partitions
- Responsible for leader election
- If the controller goes down → a new one is elected
Election Process:
1. Leader goes down or becomes unreachable
2. Controller detects it via ZooKeeper/KRaft
3. Checks the ISR list
4. Selects a new leader (the first live replica in the replica list that is also in ISR)
5. Updates metadata
6. Notifies brokers and clients
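Steps 3 and 4 can be sketched as a selection function. This is a simplified illustration with hypothetical names (`elect_leader`, `assigned_replicas`, `live_brokers`); the `unclean` flag stands in for unclean.leader.election.enable.

```python
def elect_leader(assigned_replicas, isr, live_brokers, unclean=False):
    """Pick the first assigned replica that is alive and in the ISR.

    With unclean=True, fall back to any live assigned replica (may lose data).
    Returns None when no eligible replica exists, i.e. the partition is offline.
    """
    for broker in assigned_replicas:
        if broker in live_brokers and broker in isr:
            return broker
    if unclean:
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker
    return None


replicas = [1, 2, 3]  # replica list; broker 1 was the leader and has failed

# Clean election: broker 2 is alive and in the ISR
print(elect_leader(replicas, isr={1, 2}, live_brokers={2, 3}))              # 2

# Only the dead leader was in the ISR: partition stays offline
print(elect_leader(replicas, isr={1}, live_brokers={2, 3}))                 # None

# Same situation with unclean election enabled: an out-of-sync replica takes over
print(elect_leader(replicas, isr={1}, live_brokers={2, 3}, unclean=True))   # 2
```

The second and third calls show the availability versus durability trade-off: without unclean election the partition waits for an ISR member to return, with it the partition comes back immediately but may drop messages.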
Preferred Leader
Each partition has a preferred leader (first broker in the replica list)
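For an unbalanced cluster:
kafka-leader-election.sh --bootstrap-server localhost:9092 --election-type PREFERRED --all-topic-partitions
→ Moves leadership back to the preferred brokers
→ Balances the load
With auto.leader.rebalance.enable=true (the default), the controller also does this periodically.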
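Before rebalancing, it helps to know which partitions are not led by their preferred replica. A minimal sketch, assuming hypothetical `assignments` (partition to replica list) and `current_leaders` (partition to leader) mappings taken from topic metadata:

```python
def partitions_off_preferred(assignments, current_leaders):
    """Return partitions whose current leader is not the first broker in the replica list."""
    return sorted(
        partition
        for partition, replica_list in assignments.items()
        if current_leaders[partition] != replica_list[0]
    )


# Hypothetical state after brokers 2 and 3 restarted: broker 1 leads everything
assignments = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
current_leaders = {0: 1, 1: 1, 2: 1}

off = partitions_off_preferred(assignments, current_leaders)
print(off)  # [1, 2]
```

Partitions 1 and 2 are the ones a preferred leader election would move, back to brokers 2 and 3 respectively.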
Monitoring
Key metrics:
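kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
kafka.server:type=ReplicaManager,name=LeaderCount
kafka.server:type=ReplicaManager,name=PartitionCount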
Alerts:
- Under-replicated partitions > 0 → warning
- ISR shrinks per sec > threshold → warning
- Leader count imbalance > 20% → warning
- Replica lag > threshold → critical
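The alert rules above can be expressed as a small evaluation function. This is a sketch under assumptions: the metric dictionary keys, the thresholds passed in, and the imbalance formula (largest deviation from the mean leader count, relative to the mean) are all choices made for illustration, not a standard definition.

```python
def leader_imbalance(leader_counts):
    """Largest deviation from the mean leader count, as a fraction of the mean."""
    mean = sum(leader_counts.values()) / len(leader_counts)
    if mean == 0:
        return 0.0
    return max(abs(count - mean) for count in leader_counts.values()) / mean


def evaluate_alerts(metrics, isr_shrink_threshold, lag_threshold):
    """Return a list of (severity, message) tuples for the rules that fire."""
    alerts = []
    if metrics["under_replicated_partitions"] > 0:
        alerts.append(("warning", "under-replicated partitions"))
    if metrics["isr_shrinks_per_sec"] > isr_shrink_threshold:
        alerts.append(("warning", "ISR shrink rate"))
    if leader_imbalance(metrics["leader_count_by_broker"]) > 0.20:
        alerts.append(("warning", "leader count imbalance"))
    if metrics["max_replica_lag"] > lag_threshold:
        alerts.append(("critical", "replica lag"))
    return alerts


# Hypothetical snapshot of collected metrics
snapshot = {
    "under_replicated_partitions": 2,
    "isr_shrinks_per_sec": 0.0,
    "leader_count_by_broker": {1: 80, 2: 20},
    "max_replica_lag": 500,
}
alerts = evaluate_alerts(snapshot, isr_shrink_threshold=1.0, lag_threshold=1000)
print(alerts)
# [('warning', 'under-replicated partitions'), ('warning', 'leader count imbalance')]
```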
Best Practices
✅ ISR monitoring
✅ unclean.leader.election.enable=false
✅ Even leader distribution
✅ Regular preferred leader election
✅ Monitor replica lag
✅ RF=3 for production
❌ Unclean leader election
❌ Ignoring ISR shrink
❌ Uneven leader distribution
❌ No replica lag monitoring
❌ RF < 3 for production
Architectural Decisions
- ISR — consistency guarantee — leader only from ISR
- Unclean election = data loss — avoid in production
- Leader balancing — even load on brokers
- Monitoring ISR health — critical for production
Summary for Senior
- Leader accepts all writes and, by default, serves all reads
- Followers passively copy data
- ISR management is critical for consistency
- Leader election from ISR protects committed messages from loss
- Monitoring replica lag and ISR shrink/expansion is mandatory
🎯 Interview Cheat Sheet
Must know:
- Leader: accepts all writes from producers and, by default, serves all reads from consumers
- Follower: passively copies data from the leader; can become leader on failover
- ISR includes both the leader and in-sync followers; leader is selected only from ISR
- unclean.leader.election.enable=false (default): leader is elected only from ISR; enabling it risks data loss
- On unclean election: follower at offset 95 replaces leader at offset 100 → loss of 96–100
- Replica Fetcher Thread: follower continuously fetches data from the leader
- Preferred Leader: first broker in the replica list; preferred leader election moves leadership back to it
Common follow-up questions:
- Why do consumers read only from the leader? Single source of truth, ordering guarantee. (Kafka 2.4+ allows reading from followers via replica.selector.class.)
- When does a follower leave ISR? When it has not fetched from, or caught up with, the leader for > replica.lag.time.max.ms (default 30 s since Kafka 2.5, 10 s before).
- What is a Preferred Leader? The broker that should be the leader by plan (first in the replica list); preferred leader election returns leadership to it.
- Can you write to a follower? No, all writes go through the leader.
Red flags (DO NOT say):
- “Followers accept writes” — only the leader
- “Unclean election is normal practice” — data loss
- “ISR only includes followers” — the leader is always in ISR
- “Consumers can read from any replica by default” — only from the leader (by default)
Related topics:
- [[16. What is replication in Kafka]]
- [[18. What is ISR (In-Sync Replicas)]]
- [[19. How does Kafka ensure fault tolerance]]
- [[20. What is producer acknowledgment and what modes exist (acks=0,1,all)]]