What happens when you add a new consumer to the group
When you add a new consumer, a rebalance occurs — partitions are redistributed among all consumers.
Junior Level
Short answer
When you add a new consumer, a rebalance occurs — partitions are redistributed among all consumers.
Example
Before adding:
C1 → P0, P1
C2 → P2
Added C3:
C1 → P0
C2 → P1
C3 → P2
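The redistribution above can be sketched as a toy round-robin assignment (illustrative only; Kafka's real assignors, such as range and cooperative-sticky, are more involved and would not necessarily produce this exact mapping):

```java
import java.util.*;

// Toy round-robin assignment: partition p goes to consumer (p mod N).
// Illustrative only, not Kafka's built-in assignor logic.
public class AssignSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Before adding C3: two consumers share three partitions
        System.out.println(assign(List.of("C1", "C2"), 3));       // {C1=[0, 2], C2=[1]}
        // After adding C3: one partition each
        System.out.println(assign(List.of("C1", "C2", "C3"), 3)); // {C1=[0], C2=[1], C3=[2]}
    }
}
```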
What happens during rebalance?
Eager rebalancing: 5-30 seconds of full downtime.
Cooperative rebalancing: 1-5 seconds with continued processing on remaining partitions.
Messages accumulate in Kafka
After rebalance, reading resumes
How to add a consumer?
// Just launch a new consumer with the same group.id
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
// ... plus key/value deserializer settings
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));
// Kafka will automatically trigger a rebalance
Middle Level
Rebalancing process (step by step)
1. New consumer connects to broker
2. Sends JoinGroup request
3. Group coordinator starts rebalance
4. All consumers receive notification
5. All consumers stop reading (eager)
or continue with some partitions (cooperative)
6. New assignment is computed
7. Assignment is distributed to consumers
8. Consumers start reading new partitions
Rebalancing Strategies
1. Eager (old approach):
All consumers stop completely
Full redistribution
All resume work
→ Full group downtime
2. Cooperative (Kafka 2.3+):
Gradual partition movement
Consumers continue working with remaining partitions
→ Minimal downtime
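The key difference can be shown by computing what each strategy revokes when an assignment changes (a toy model, not the assignors' actual code):

```java
import java.util.*;

// Toy comparison: eager revokes everything up front; cooperative revokes only
// the partitions that actually move to another consumer.
public class RevokeSketch {
    static Set<Integer> eagerRevoked(Set<Integer> old, Set<Integer> next) {
        return new TreeSet<>(old); // everything is revoked first, then reassigned
    }

    static Set<Integer> cooperativeRevoked(Set<Integer> old, Set<Integer> next) {
        Set<Integer> moved = new TreeSet<>(old);
        moved.removeAll(next); // keep what stays, revoke only what moves
        return moved;
    }

    public static void main(String[] args) {
        // C1 had P0 and P1; after the new consumer joins it keeps only P0
        Set<Integer> old = Set.of(0, 1), next = Set.of(0);
        System.out.println(eagerRevoked(old, next));       // [0, 1] -> both partitions stop
        System.out.println(cooperativeRevoked(old, next)); // [1]    -> P0 keeps flowing
    }
}
```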
ConsumerRebalanceListener
consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Commit offsets before losing partitions
        consumer.commitSync();
        // Clean up local state
        cleanupState(partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Initialize new partitions
        initializeState(partitions);
        // Can start reading from a specific offset
        for (TopicPartition tp : partitions) {
            consumer.seek(tp, getSavedOffset(tp));
        }
    }
});
Frequent rebalances — a problem
Unstable consumers → constant rebalances → system downtime
Causes:
- session.timeout.ms too short
- max.poll.interval.ms exceeded (long processing)
- Network issues
- GC pauses
Symptoms:
- Log messages: "Rebalance triggered"
- Consumer lag growth
- Throughput drop
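The first cause can be caught before deployment with a quick sanity check of the timeout relationship (a sketch; the 1/3 ratio is the usual guideline, not a hard rule):

```java
// Sketch: validate that heartbeat.interval.ms leaves enough headroom inside
// session.timeout.ms. The common guideline is heartbeat <= session / 3.
public class TimeoutCheck {
    static String check(int sessionTimeoutMs, int heartbeatIntervalMs) {
        if (heartbeatIntervalMs >= sessionTimeoutMs) {
            return "invalid: heartbeat must be shorter than session timeout";
        }
        if (heartbeatIntervalMs > sessionTimeoutMs / 3) {
            return "risky: too few heartbeats fit before the session expires";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(check(30000, 10000)); // ok (exactly 1/3)
        System.out.println(check(10000, 9000));  // risky
    }
}
```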
Common mistakes
- Without ConsumerRebalanceListener: data loss on rebalance (offsets are not committed)
- Frequent rebalances: consumers crash → rebalance → crash again → cycle
- Long processing between poll() calls: max.poll.interval.ms exceeded → consumer excluded → rebalance
Senior Level
Cooperative Rebalancing — in detail
Incremental Cooperative Rebalancing (Kafka 2.3+):
Eager:
Revoke all partitions → Assign new → Resume
→ Full downtime
Cooperative:
Step-by-step movement:
1. Revoke some partitions
2. Assign new partitions
3. Continue working
4. Repeat if needed
→ Minimal downtime
Configuration:
props.put("partition.assignment.strategy",
"org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
Group Coordinator Protocol
Rebalancing protocol:
JoinGroup — consumer says "I want to join the group".
SyncGroup — leader distributes the assignment result.
1. Join Phase:
- Consumer → JoinGroup request
- Coordinator → collects all members
- Coordinator → chooses leader
2. Assignment Phase:
- Leader → computes assignment
- Leader → SyncGroup request with assignment
- Coordinator → distributes assignments
3. Steady State:
- Consumers → Heartbeat requests
- Coordinator → confirms alive
- Consumers → process messages
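The three phases can be modeled as a tiny state machine (a purely illustrative sketch; Kafka's internal protocol classes look nothing like this):

```java
// Toy state machine for the group-membership protocol phases.
public class GroupProtocolSketch {
    enum Phase { JOINING, SYNCING, STABLE }

    static Phase next(Phase p) {
        switch (p) {
            case JOINING: return Phase.SYNCING; // JoinGroup answered -> leader computes assignment
            case SYNCING: return Phase.STABLE;  // SyncGroup delivers assignment -> steady state
            default:      return Phase.STABLE;  // heartbeats keep the member in STABLE
        }
    }

    public static void main(String[] args) {
        Phase p = Phase.JOINING;
        p = next(p); // after the Join phase
        p = next(p); // after the Assignment phase
        System.out.println(p); // STABLE
    }
}
```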
Production Configuration
partition.assignment.strategy: org.apache.kafka.clients.consumer.CooperativeStickyAssignor
session.timeout.ms: 30000 # 30 seconds
heartbeat.interval.ms: 10000 # 10 seconds (1/3)
max.poll.interval.ms: 300000 # 5 minutes
max.poll.records: 500 # batch
enable.auto.commit: false
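The same settings expressed as Java client properties (a sketch; tune the values to your workload):

```java
import java.util.Properties;

// Sketch: the production settings above as Java consumer properties.
public class ProductionConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        props.put("session.timeout.ms", "30000");    // 30 seconds
        props.put("heartbeat.interval.ms", "10000"); // ~1/3 of session timeout
        props.put("max.poll.interval.ms", "300000"); // 5 minutes
        props.put("max.poll.records", "500");        // batch size per poll
        props.put("enable.auto.commit", "false");    // commit manually
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("enable.auto.commit")); // false
    }
}
```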
Static Membership
// Permanent consumer ID
props.put("group.instance.id", "consumer-1");
// On restart within session.timeout.ms:
// - No rebalance is triggered
// - The consumer gets the same partitions back
// - Only if it fails to return within session.timeout.ms does a rebalance start
Monitoring Rebalancing
Key metrics (consumer JMX, consumer-coordinator-metrics / consumer-metrics):
rebalance-total, rebalance-rate-per-hour
last-rebalance-seconds-ago
assigned-partitions
commit-latency-avg
Alerts:
- Rebalance more often than every 10 minutes → warning
- Rebalance more often than every 5 minutes → critical
- Consumer hasn't sent heartbeat > 30s → critical
- Consumer lag growing → warning
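The first two alert thresholds can be wired up as a simple classifier over the time since the last rebalance (a sketch using the thresholds listed above; the metric feed itself is assumed):

```java
// Sketch: map seconds since the last rebalance to the alert levels above.
public class RebalanceAlert {
    static String level(long secondsSinceLastRebalance) {
        if (secondsSinceLastRebalance < 5 * 60)  return "critical"; // more often than every 5 min
        if (secondsSinceLastRebalance < 10 * 60) return "warning";  // more often than every 10 min
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(level(120));  // critical
        System.out.println(level(420));  // warning
        System.out.println(level(3600)); // ok
    }
}
```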
Troubleshooting Rebalancing
1. Identify the cause:
// Logs show:
"Heartbeat session expired" → session.timeout.ms too short
"Max poll interval exceeded" → processing too slow
"Member is dead" → consumer crashed
2. Resolving issues:
Heartbeat issues:
Increase session.timeout.ms
Keep heartbeat.interval.ms at roughly 1/3 of session.timeout.ms
Poll interval issues:
Reduce max.poll.records
Optimize message processing
Increase max.poll.interval.ms
Network issues:
Check connectivity to brokers
Check firewall rules
Advanced Patterns
1. Gradual Rollout:
Add consumers gradually:
1. Launch one consumer
2. Wait for stabilization
3. Check lag and throughput
4. Add the next one
2. Partition-aware State Management:
public class StatefulConsumer implements ConsumerRebalanceListener {
    private final Map<TopicPartition, State> states = new HashMap<>();

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Save state for each partition being given up
        for (TopicPartition tp : partitions) {
            State state = states.remove(tp);
            if (state != null) {
                state.save();
            }
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Load state for newly assigned partitions
        for (TopicPartition tp : partitions) {
            states.put(tp, State.load(tp));
        }
    }
}
Best Practices
✅ Cooperative-sticky assignor
✅ Sufficient session timeout (30s+)
✅ Fast message processing
✅ ConsumerRebalanceListener for graceful handling
✅ Static membership for stable instances
✅ Monitor rebalance frequency
❌ Eager assignor in production with rolling deploys (tolerable only for fully static clusters); prefer CooperativeStickyAssignor
❌ Too short timeout
❌ Without onPartitionsRevoked handling
❌ Ignoring consumer lag
Architectural decisions
- Cooperative rebalancing — standard for production
- Static membership — for stable instances
- State management — save state on rebalance
- Monitoring — early problem detection
Summary for Senior
- Rebalancing — critical process for group availability
- Cooperative rebalancing minimizes disruption
- ConsumerRebalanceListener is mandatory for production
- Static membership reduces unnecessary rebalances
- Monitoring rebalance frequency — early problem indicator
- Troubleshooting requires analysis of logs and metrics
🎯 Interview Cheat Sheet
Must know:
- Adding a new consumer triggers a rebalance — partitions are redistributed
- Eager rebalancing: 5-30 seconds full downtime; Cooperative: 1-5 seconds
- ConsumerRebalanceListener is mandatory: commitSync in onPartitionsRevoked
- Frequent rebalances — a problem: session.timeout exceeded, max.poll.interval exceeded
- CooperativeStickyAssignor — standard for production (Kafka 2.3+)
- Static membership (group.instance.id) — eliminates rebalance on rolling deploy
- Gradual rollout: add consumers gradually, check lag
Common follow-up questions:
- What happens if a consumer processes for too long? — max.poll.interval exceeded → exclusion → rebalance.
- How to speed up rebalance? — CooperativeStickyAssignor + static membership.
- What to do with stateful consumer on rebalance? — Save state in onPartitionsRevoked, load in onPartitionsAssigned.
- Can you disable rebalance? — No, but static membership minimizes it for known instances.
Red flags (DO NOT say):
- “Rebalancing happens instantly” — 5-30 seconds eager, 1-5 cooperative
- “You can ignore onPartitionsRevoked” — loss of offsets and state
- “Frequent rebalances are normal” — sign of timeout or processing problems
- “Eager rebalancing is the best choice” — cooperative minimizes disruption
Related topics:
- [[15. What is rebalancing and when does it happen]]
- [[6. How does consumer balancing work in a group]]
- [[5. What is Consumer Group]]
- [[14. What is the difference between auto commit and manual commit]]