Question 17 · Section 15

What are leader and follower replicas

Junior Level

Definition

Leader — the primary replica that accepts reads and writes.

Follower — a copy that replicates data from the leader.

Partition 0:
  Broker 1 → Leader (read/write)
  Broker 2 → Follower (replicates from leader; no client reads by default)
  Broker 3 → Follower (replicates from leader; no client reads by default)

Key Roles

Leader:
  - Accepts writes from producers
  - Serves data to consumers
  - Tracks follower progress and maintains the ISR list

Follower:
  - Copies data from the leader
  - Does not accept writes from producers
  - Can become leader during failover

Failover

Leader goes down:
  Controller selects a new leader from ISR
  Follower → new Leader
  Consumers and producers refresh metadata and switch over automatically

Middle Level

Replication Flow

Producer → Leader → Follower 1
                  → Follower 2
                  → Follower 3

Consumer → reads only from the leader by default. Since Kafka 2.4 (KIP-392),
consumers can fetch from followers when the broker sets replica.selector.class
and the consumer sets client.rack.
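
As a rough illustration of follower fetching, the sketch below models how a broker might choose the replica a consumer reads from. It is a simplified model, not Kafka's implementation: the function name select_read_replica, the rack labels, and the tie-breaking rule are assumptions made for this example.

```python
from typing import Dict, Optional, Set


def select_read_replica(
    leader: int,
    replica_racks: Dict[int, str],
    isr: Set[int],
    client_rack: Optional[str],
) -> int:
    """Pick the broker a consumer should fetch from (simplified model)."""
    # No rack configured on the consumer: default behaviour, read from the leader.
    if client_rack is None:
        return leader
    # In-sync replicas located in the consumer's rack.
    same_rack = sorted(b for b in isr if replica_racks.get(b) == client_rack)
    if not same_rack or leader in same_rack:
        return leader
    # Assumption for this sketch: break ties by lowest broker id.
    return same_rack[0]


racks = {1: "rack-a", 2: "rack-b", 3: "rack-c"}
print(select_read_replica(1, racks, {1, 2, 3}, None))      # 1
print(select_read_replica(1, racks, {1, 2, 3}, "rack-b"))  # 2
print(select_read_replica(1, racks, {1, 3}, "rack-b"))     # 1
```

The last call falls back to the leader because broker 2 is not in the ISR, so no in-sync replica shares the consumer's rack.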

ISR (In-Sync Replicas)

ISR includes replicas that:
1. Run on a live broker that keeps its session with ZooKeeper or the KRaft controller
2. Have caught up with the leader within the last replica.lag.time.max.ms

Example:
  ISR: [Broker 1 (Leader), Broker 2, Broker 3]
  If Broker 3 falls behind → ISR: [Broker 1, Broker 2]

Unclean Leader Election

unclean.leader.election.enable=false (default):
  Leader is selected only from ISR
  Committed messages are not lost as long as one ISR replica survives

unclean.leader.election.enable=true:
  Leader can be selected from any follower
  Risk of data loss
  If the leader committed offset 100 and the follower only has offset 95,
  on unclean election, messages 96–100 will be lost.
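
The arithmetic in this example can be checked with a small sketch. The function name lost_offsets is hypothetical, and the model only compares the last offset on each replica; it does not reproduce Kafka's log truncation logic.

```python
def lost_offsets(old_leader_last: int, new_leader_last: int) -> range:
    """Offsets the old leader had that the newly elected replica does not."""
    return range(new_leader_last + 1, old_leader_last + 1)


lost = lost_offsets(old_leader_last=100, new_leader_last=95)
print(list(lost))  # [96, 97, 98, 99, 100]
print(len(lost))   # 5
```

If the new leader is fully caught up, the range is empty and nothing is lost, which is the case a clean election from the ISR aims for.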

Common Mistakes

  1. Unclean leader election:
    A follower not in ISR becomes leader
    → Data that the follower doesn't have is lost
    
  2. Uneven leader distribution:
    Broker A: leader for 80% of partitions
    Broker B: leader for 20% of partitions
    → Uneven load
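
One way to spot this kind of skew is to compute each broker's share of partition leaders. The sketch below assumes the partition-to-leader map has already been collected (for example, from topic metadata); the function name and the sample topic are hypothetical.

```python
from collections import Counter
from typing import Dict


def leader_share(leaders: Dict[str, int]) -> Dict[int, float]:
    """Map broker id -> fraction of partitions it currently leads."""
    total = len(leaders)
    if total == 0:
        return {}
    counts = Counter(leaders.values())
    return {broker: n / total for broker, n in sorted(counts.items())}


# Hypothetical topic with 10 partitions: broker 1 leads 8, broker 2 leads 2.
leaders = {f"orders-{i}": (1 if i < 8 else 2) for i in range(10)}
print(leader_share(leaders))  # {1: 0.8, 2: 0.2}
```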
    

Senior Level

ISR Management

Follower is removed from ISR if:

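- Does not send a fetch request for > replica.lag.time.max.ms
- Keeps fetching but does not catch up to the leader's log end within replica.lag.time.max.ms (lag is measured in time, not in messages)
- Its broker is unavailable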

Follower returns to ISR when:

- Catches up with the leader
- Keeps fetching without falling behind again
- Leader adds it back to the ISR and the change is recorded in cluster metadata (ZooKeeper or the KRaft controller)
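
The shrink and expand rules above can be sketched as a time-based check. This is a simplified model under stated assumptions: the Follower class, compute_isr, and the timestamps are made up for illustration, and the 30 s limit is an assumed value matching the broker default since Kafka 2.5.

```python
from dataclasses import dataclass
from typing import List

REPLICA_LAG_TIME_MAX_MS = 30_000  # assumed value for this sketch


@dataclass
class Follower:
    broker_id: int
    last_caught_up_ms: int  # last time this follower had fully caught up with the leader


def compute_isr(leader_id: int, followers: List[Follower], now_ms: int,
                max_lag_ms: int = REPLICA_LAG_TIME_MAX_MS) -> List[int]:
    """The leader is always in the ISR; a follower stays in while its time lag is within the limit."""
    isr = [leader_id]
    for f in followers:
        if now_ms - f.last_caught_up_ms <= max_lag_ms:
            isr.append(f.broker_id)
    return isr


followers = [Follower(2, last_caught_up_ms=95_000), Follower(3, last_caught_up_ms=60_000)]
print(compute_isr(1, followers, now_ms=100_000))  # [1, 2]

followers[1].last_caught_up_ms = 100_000  # broker 3 catches up again
print(compute_isr(1, followers, now_ms=100_000))  # [1, 2, 3]
```

Broker 3 drops out in the first call because it has been behind for 40 s, and it returns once its catch-up timestamp is current.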

Replica Fetcher

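Followers continuously send fetch requests to the leader, controlled by:
- replica.fetch.wait.max.ms (how long the leader may hold a fetch while waiting for data)
- replica.fetch.min.bytes (minimum amount of data the leader should have before it responds)
- replica.fetch.max.bytes (maximum amount of data fetched per partition in one request)

These settings affect replication latency and network usage.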
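
To make the interaction of these settings concrete, here is a minimal sketch of the long-poll decision on the leader side. The function names are hypothetical and the defaults are assumed to match the broker defaults (500 ms, 1 byte, 1 MiB); real brokers also handle cases this model ignores, such as a single record batch larger than the byte limit.

```python
def leader_should_respond(available_bytes: int, waited_ms: int,
                          min_bytes: int = 1, max_wait_ms: int = 500) -> bool:
    """Respond once enough data is available or the follower has waited long enough."""
    return available_bytes >= min_bytes or waited_ms >= max_wait_ms


def response_size(available_bytes: int, max_bytes: int = 1_048_576) -> int:
    """Cap the data returned for one partition at max_bytes (simplified)."""
    return min(available_bytes, max_bytes)


print(leader_should_respond(available_bytes=0, waited_ms=100))   # False: keep waiting
print(leader_should_respond(available_bytes=10, waited_ms=0))    # True: data is ready
print(leader_should_respond(available_bytes=0, waited_ms=500))   # True: wait time elapsed
print(response_size(2_000_000))                                  # 1048576
```

A larger min_bytes reduces the number of small fetches at the cost of latency, while a shorter wait lowers latency but increases request traffic.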

Leader Election — in Detail

Controller Broker:

- One broker is the cluster controller
- Stores metadata for all partitions
- Responsible for leader election
- If the controller goes down → a new one is elected

Election Process:

1. Leader goes down or becomes unreachable
2. Controller detects it via ZooKeeper/KRaft
3. Checks the ISR list
4. Selects a new leader (the first live replica in the assignment order that is in the ISR)
5. Updates metadata
6. Notifies brokers and clients
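
The selection step can be sketched as follows. This is a simplified model of the rule, not the controller's code: elect_leader and its parameters are names chosen for this example, and the unclean fallback mirrors unclean.leader.election.enable.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], isr: Set[int], live_brokers: Set[int],
                 unclean: bool = False) -> Optional[int]:
    """Return the new leader's broker id, or None if the partition must stay offline."""
    # Clean election: first replica in assignment order that is alive and in the ISR.
    for broker in replicas:
        if broker in live_brokers and broker in isr:
            return broker
    # Unclean election: any live replica, even one outside the ISR (risk of data loss).
    if unclean:
        for broker in replicas:
            if broker in live_brokers:
                return broker
    return None


# Broker 1 (the old leader) is down.
print(elect_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))            # 2
print(elect_leader([1, 2, 3], isr={1}, live_brokers={2, 3}))                  # None
print(elect_leader([1, 2, 3], isr={1}, live_brokers={2, 3}, unclean=True))    # 2
```

The second call shows the trade-off: with unclean election disabled and no live ISR member, the partition stays unavailable rather than risk losing data.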

Preferred Leader

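Each partition has a preferred leader (first broker in the replica list).
For an unbalanced cluster:
  kafka-leader-election.sh --bootstrap-server localhost:9092 --election-type preferred --all-topic-partitions
  → Moves leadership back to the preferred brokers
  → Balances the load
Brokers can also do this automatically when auto.leader.rebalance.enable=true (the default).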
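
Before running a preferred election it can help to see which partitions are actually led by a non-preferred broker. The sketch below assumes the replica assignments and current leaders are already available as plain dictionaries; the function name and topic names are hypothetical.

```python
from typing import Dict, List


def partitions_off_preferred(assignments: Dict[str, List[int]],
                             leaders: Dict[str, int]) -> List[str]:
    """Partitions whose current leader is not the first broker in the replica list."""
    return sorted(
        partition
        for partition, replicas in assignments.items()
        if leaders.get(partition) != replicas[0]
    )


assignments = {"orders-0": [1, 2, 3], "orders-1": [2, 3, 1], "orders-2": [3, 1, 2]}
leaders = {"orders-0": 1, "orders-1": 3, "orders-2": 3}
print(partitions_off_preferred(assignments, leaders))  # ['orders-1']
```

Here broker 3 leads two partitions while broker 2 leads none, so moving orders-1 back to broker 2 would restore the planned distribution.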

Monitoring

Key metrics:

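kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
kafka.server:type=ReplicaManager,name=LeaderCount
kafka.server:type=ReplicaManager,name=PartitionCount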

Alerts:

- Under-replicated partitions > 0 → warning
- ISR shrinks per sec > threshold → warning
- Leader count imbalance > 20% → warning
- Replica lag > threshold → critical
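
These rules could be evaluated along the following lines. Everything specific here is an assumption for illustration: the metric dictionary keys, the threshold values, and the definition of imbalance (largest deviation from the mean leader count, relative to the mean). Real thresholds depend on the cluster.

```python
from typing import Dict, List, Tuple


def leader_imbalance(leader_counts: Dict[int, int]) -> float:
    """Largest deviation from the mean leader count, as a fraction of the mean."""
    if not leader_counts:
        return 0.0
    mean = sum(leader_counts.values()) / len(leader_counts)
    if mean == 0:
        return 0.0
    return max(abs(c - mean) for c in leader_counts.values()) / mean


def evaluate_alerts(metrics: dict, shrink_threshold: float = 1.0,
                    lag_threshold: int = 10_000,
                    imbalance_threshold: float = 0.20) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for every rule that fires."""
    alerts = []
    if metrics["under_replicated_partitions"] > 0:
        alerts.append(("warning", "under-replicated partitions"))
    if metrics["isr_shrinks_per_sec"] > shrink_threshold:
        alerts.append(("warning", "ISR shrink rate"))
    if leader_imbalance(metrics["leader_count_by_broker"]) > imbalance_threshold:
        alerts.append(("warning", "leader count imbalance"))
    if metrics["max_replica_lag"] > lag_threshold:
        alerts.append(("critical", "replica lag"))
    return alerts


sample = {
    "under_replicated_partitions": 2,
    "isr_shrinks_per_sec": 0.0,
    "leader_count_by_broker": {1: 80, 2: 20},
    "max_replica_lag": 500,
}
print(evaluate_alerts(sample))
# [('warning', 'under-replicated partitions'), ('warning', 'leader count imbalance')]
```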

Best Practices

✅ ISR monitoring
✅ unclean.leader.election.enable=false
✅ Even leader distribution
✅ Regular preferred leader election
✅ Monitor replica lag
✅ RF=3 for production

❌ Unclean leader election
❌ Ignoring ISR shrink
❌ Uneven leader distribution
❌ No replica lag monitoring
❌ RF < 3 for production

Architectural Decisions

  1. ISR — consistency guarantee — leader only from ISR
  2. Unclean election = data loss — avoid in production
  3. Leader balancing — even load on brokers
  4. Monitoring ISR health — critical for production

Summary for Senior

  • Leader accepts all writes and, by default, all reads
  • Followers replicate by fetching from the leader and do not accept writes
  • ISR management is critical for consistency
  • Leader election from ISR protects committed messages from loss
  • Monitoring replica lag and ISR shrink/expansion is mandatory

🎯 Interview Cheat Sheet

Must know:

  • Leader — the only replica that accepts producer writes and, by default, the only one that serves consumer reads
  • Follower — passively copies data from the leader; can become leader on failover
  • ISR includes both the leader and in-sync followers; leader is selected only from ISR
  • unclean.leader.election.enable=false (default) — leader is chosen only from ISR; electing an out-of-sync replica risks data loss
  • On unclean election: follower at offset 95 replaces leader at offset 100 → loss of 96–100
  • Replica Fetcher Thread: follower continuously fetches data from the leader
  • Preferred Leader — first broker in the replica list; preferred leader election restores an even distribution

Common follow-up questions:

  • Why do consumers read only from the leader? — Single source of truth, ordering guarantee. (Kafka 2.4+ allows reading from followers via replica.selector.class.)
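  • When does a follower leave ISR? — When it has not fetched, or has not caught up with the leader, for longer than replica.lag.time.max.ms (default 30 s since Kafka 2.5).
  • What is a Preferred Leader? — The broker that should be the leader by plan (first in the replica list); preferred leader election moves leadership back to it.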
  • Can you write to a follower? — No, all writes go through the leader.

Red flags (DO NOT say):

  • “Followers accept writes” — only the leader
  • “Unclean election is normal practice” — data loss
  • “ISR only includes followers” — the leader is always in ISR
  • “Consumers can read from any replica by default” — only from the leader (by default)

Related topics:

  • [[16. What is replication in Kafka]]
  • [[18. What is ISR (In-Sync Replicas)]]
  • [[19. How does Kafka ensure fault tolerance]]
  • [[20. What is producer acknowledgment and what modes exist (acks=0,1,all)]]