Question 6 · Section 15

How does consumer balancing work in a group?


🟢 Junior Level

What is balancing?

Balancing is the automatic process of distributing partitions among consumers in a group. Kafka itself decides which consumer reads which partitions.

Why: without it, you'd have to manually assign partitions to each consumer. When the number of consumers or partitions changes, everything is recalculated automatically.

Analogy

Think of a pizzeria where you need to slice pizza (partitions) among friends (consumers):

  • 3 slices, 3 friends → each gets 1 slice
  • 3 slices, 2 friends → one gets 2 slices, the other gets 1
  • 3 slices, 5 friends → 3 get a slice each, 2 go hungry

Basic rule

Number of active consumers <= Number of partitions

Example

3 partitions, 3 consumers:
  C1 → P0
  C2 → P1
  C3 → P2  ← ideal balance

Added C4:
  C1 → P0
  C2 → P1
  C3 → P2
  C4 → waiting (no free partitions)

How it works in code

// All consumers that join with the same group.id form one group
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
KafkaConsumer<String, String> consumer =
        new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
consumer.subscribe(List.of("orders"));
// Kafka distributes the topic's partitions among the group members automatically

🟡 Middle Level

Balancing strategies

| Strategy | Algorithm | When to use |
|---|---|---|
| Range | Divides partitions into contiguous ranges | Only when the partition count divides evenly by the consumer count; otherwise uneven. For production with dynamic scaling, CooperativeSticky is better. |
| RoundRobin | Alternates partitions across consumers | When you need uniformity |
| Sticky | Minimizes partition movement | Production (Kafka 0.11+) |
| CooperativeSticky | Incremental movement without a full stop | Production (Kafka 2.4+, recommended) |
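In code, a strategy is selected via the consumer's partition.assignment.strategy property, which takes fully qualified assignor class names. A minimal sketch (the helper name is illustrative; the class names are Kafka's built-in assignors):

```java
import java.util.Properties;

public class AssignorConfig {
    // Built-in assignor class names:
    //   org.apache.kafka.clients.consumer.RangeAssignor
    //   org.apache.kafka.clients.consumer.RoundRobinAssignor
    //   org.apache.kafka.clients.consumer.StickyAssignor
    //   org.apache.kafka.clients.consumer.CooperativeStickyAssignor
    public static Properties withAssignor(String groupId, String assignorClass) {
        Properties props = new Properties();
        props.put("group.id", groupId);
        props.put("partition.assignment.strategy", assignorClass);
        return props;
    }
}
```

Usage: `withAssignor("my-group", "org.apache.kafka.clients.consumer.CooperativeStickyAssignor")`.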

Range Assignor in detail

Partitions 0,1,2,3 → 2 consumers:
  C1: 0, 1  (2 partitions)
  C2: 2, 3  (2 partitions)
  → Even

Partitions 0,1,2,3,4 → 2 consumers:
  C1: 0, 1, 2  (3 partitions)
  C2: 3, 4     (2 partitions)
  → Uneven! (base = numPartitions / numConsumers; the remainder goes to the first consumers)
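The range arithmetic above (numPartitions / numConsumers, remainder to the first consumers) can be sketched as a plain function. This is an illustration of the algorithm for a single topic, not the actual RangeAssignor code:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeAssign {
    // Assign numPartitions partitions of one topic to numConsumers consumers
    // the way a range assignor does: contiguous ranges, with the first
    // (numPartitions % numConsumers) consumers receiving one extra partition.
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers;
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int i = 0; i < count; i++) parts.add(next++);
            result.add(parts);
        }
        return result;
    }
}
```

`assign(5, 2)` reproduces the uneven split above: [0, 1, 2] and [3, 4].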

RoundRobin Assignor

Partitions 0,1,2,3,4 → 2 consumers:
  C1: 0, 2, 4  (3 partitions)
  C2: 1, 3     (2 partitions)
  → Slightly better with odd numbers
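The round-robin scheme above is a simple modulo deal-out. A sketch for a single topic (illustrative, not the actual RoundRobinAssignor code):

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinAssign {
    // Deal partitions out one at a time, cycling through the consumers.
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) result.add(new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            result.get(p % numConsumers).add(p);  // partition p goes to consumer p mod N
        }
        return result;
    }
}
```

`assign(5, 2)` matches the example above: [0, 2, 4] and [1, 3].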

Rebalancing process

1. Trigger event (new consumer, crash, timeout)
2. Group coordinator starts rebalance
3. All consumers receive notification
4. All consumers stop reading (eager)
   or continue with some partitions (cooperative)
5. New assignment is computed
6. Assignment is distributed to consumers
7. Consumers start reading new partitions

ConsumerRebalanceListener

consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // 1. Commit offsets before losing partitions
        consumer.commitSync();
        // 2. Clean up local resources
        cleanup(partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // 3. Initialize new partitions
        initialize(partitions);
    }
});

Chaos Rebalancing Problem

A consumer falls behind: for example, batch processing takes longer than
max.poll.interval.ms, or a long GC pause exceeds session.timeout.ms. The consumer
misses its poll()/heartbeat deadline → excluded from the group →
rebalance → load on the remaining consumers increases → another one falls behind → avalanche.

Consumers crash → rebalance → new consumers crash → cycle

Causes:
- session.timeout.ms too short
- Long processing (max.poll.interval.ms exceeded)
- Memory/CPU issues

Solution:
- Increase timeouts
- Optimize processing
- Add monitoring
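The "increase timeouts" fix can be sketched as consumer properties. The values below are illustrative starting points, not universal recommendations:

```java
import java.util.Properties;

public class StableConsumerProps {
    // Defensive settings against chaos rebalancing (illustrative values)
    public static Properties build() {
        Properties props = new Properties();
        props.put("session.timeout.ms", "45000");     // survive short GC pauses and network hiccups
        props.put("heartbeat.interval.ms", "15000");  // 1/3 of session.timeout.ms
        props.put("max.poll.interval.ms", "600000");  // allow long batches (10 minutes)
        props.put("max.poll.records", "100");         // smaller batches finish sooner
        return props;
    }
}
```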

Strategy comparison

| Criteria | Range | RoundRobin | Sticky | CooperativeSticky |
|---|---|---|---|---|
| Uniformity | ⚠️ | ✅ | ✅ | ✅ |
| Minimize movement | ❌ | ❌ | ✅ | ✅ |
| Downtime on rebalance | Full | Full | Full | Partial |
| Kafka version | All | All | 0.11+ | 2.4+ |

Separately, server-side assignment (KIP-848, preview in Kafka 3.7, GA in 4.0) eliminates the SyncGroup round-trip and makes rebalances faster.

Common mistakes

| Mistake | Consequence | Solution |
|---|---|---|
| Frequent rebalances | Constant system downtime | Increase timeouts, static membership |
| No ConsumerRebalanceListener | Offset loss on rebalance | Always use a listener |
| Uneven distribution | One consumer is overloaded | CooperativeSticky assignor |
| Range assignor in production | Unevenness + full downtime | Switch to CooperativeSticky |

🔴 Senior Level

Group Coordinator Protocol: full diagram

Phase 1: JoinGroup
┌──────────────┐          ┌──────────────────┐          ┌──────────────┐
│  Consumer A  │          │ Group Coordinator│          │  Consumer B  │
└──────┬───────┘          └────────┬─────────┘          └──────┬───────┘
       │ JoinGroup Request          │                          │
       │ (group.id, memberId)       │                          │
       │───────────────────────────►│                          │
       │                            │ JoinGroup Request        │
       │                            │◄─────────────────────────│
       │                            │                          │
       │ Leader chosen (first       │                          │
       │ in member list)            │                          │
       │                            │                          │
       │◄───────────────────────────│ JoinGroup Response       │
       │ (leaderId, allMembers)     │ (memberId, leaderId)     │
       │                            │─────────────────────────►│

Phase 2: Assignment (on Leader)
       │                            │                          │
       │ Leader computes            │                          │
       │ assignment via             │                          │
       │ PartitionAssignor          │                          │
       │ assignment = {             │                          │
       │   A: [P0, P1],             │                          │
       │   B: [P2]                  │                          │
       │ }                          │                          │

Phase 3: SyncGroup
       │                            │                          │
       │ SyncGroup Request          │                          │
       │ (assignment for all)       │                          │
       │───────────────────────────►│                          │
       │                            │                          │
       │                            │◄─────────────────────────│ SyncGroup Response
       │                            │                          │ (own assignment)
       │◄───────────────────────────│                          │
       │ SyncGroup Response         │                          │
       │ (assignment: P0, P1)       │                          │

Kafka 3.7+ (KIP-848, GA in Kafka 4.0): Assignment is computed on the Coordinator, not the Leader, which eliminates the SyncGroup round-trip.

Eager vs Cooperative Rebalancing β€” internals

Eager Rebalancing:

// org.apache.kafka.clients.consumer.internals.AbstractCoordinator
// onJoinComplete:
//   1. onPartitionsRevoked(ALL partitions)  // Stop EVERYTHING
//   2. newAssignment = computeAssignment()
//   3. onPartitionsAssigned(newAssignment)  // Start NEW
// Eager: everything stops, then everything restarts

Cooperative Rebalancing:

// CooperativeStickyAssignor:
//   1. onPartitionsRevoked(SUBSET partitions)  // Only those leaving
//   2. onPartitionsAssigned(SUBSET partitions)  // Only new ones
//   3. Continue working with remaining partitions

// Algorithm:
//   currentAssignment = {P0, P1, P2}
//   newAssignment = {P0, P3}  // P1 and P2 left, P3 added
//   revoked = {P1, P2}         // Only these
//   assigned = {P3}            // Only these
//   continue processing P0     // Don't stop!
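The revoked/assigned diff in the comments above is plain set arithmetic. A sketch of how a cooperative assignor derives the incremental changes (illustrative, not the actual CooperativeStickyAssignor code):

```java
import java.util.HashSet;
import java.util.Set;

public class CooperativeDiff {
    // Partitions the consumer must give up: in the current but not the new assignment
    public static Set<Integer> revoked(Set<Integer> current, Set<Integer> next) {
        Set<Integer> r = new HashSet<>(current);
        r.removeAll(next);
        return r;
    }

    // Partitions the consumer newly receives: in the new but not the current assignment
    public static Set<Integer> assigned(Set<Integer> current, Set<Integer> next) {
        Set<Integer> a = new HashSet<>(next);
        a.removeAll(current);
        return a;
    }
}
```

With current = {P0, P1, P2} and next = {P0, P3}: revoked = {P1, P2}, assigned = {P3}, and P0 keeps being processed throughout.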

Incremental Cooperative Rebalancing:

With many changes:
  Round 1: revoke P1, assign P3
  Round 2: revoke P2, assign P4
  ...
  Each round is short (usually < 1 second)
  Processing continues on remaining partitions

Static Membership: how it works internally

props.put("group.instance.id", "consumer-1");

Protocol:

Dynamic consumer (no group.instance.id):
  Join → assigned memberId = UUID → restart = new memberId → rebalance

Static consumer (with group.instance.id):
  Join → memberId is tied to group.instance.id
  → restart = same memberId → NO rebalance (within session.timeout)
  → Coordinator waits session.timeout.ms before exclusion

Conditions for no rebalance:

  • Consumer returns within session.timeout.ms
  • group.instance.id is not used by another consumer
  • Partition count hasn't changed

Duplicate group.instance.id:

If two consumers have the same group.instance.id:
  Second one gets FencedInstanceIdException
  → Second consumer terminates
  → This prevents duplicate processing

Production Configuration

# Optimal production configuration
group.id: order-processors
group.instance.id: consumer-${HOSTNAME}     # Static membership
partition.assignment.strategy: org.apache.kafka.clients.consumer.CooperativeStickyAssignor
session.timeout.ms: 30000                    # 30 seconds
heartbeat.interval.ms: 10000                 # 1/3 of session timeout
max.poll.interval.ms: 300000                 # 5 minutes
max.poll.records: 500                        # batch for processing
enable.auto.commit: false                    # Manual commit

Edge Cases

  1. Cascading Rebalance (Thundering Herd): One consumer crashes → rebalance → remaining consumers get more partitions → load increases → another consumer crashes from overload → another rebalance → avalanche. Solution: Increase session.timeout.ms, use max.poll.records for load control, set up monitoring alerts on consumer lag growth rate.

  2. Rebalance during Deployment: With a rolling deploy of N instances and eager rebalancing, N rebalances occur. Each rebalance = 5-30 seconds of downtime. For 20 instances = 100-600 seconds of downtime. Solution: Static membership + cooperative rebalancing = 0 rebalances during a rolling deploy.

  3. Partition Stall after Rebalance: After a rebalance, the new consumer gets a partition and starts from the committed offset. If processing is stateful (e.g., windowed aggregation), the new consumer has no local state → incorrect results until full recalculation. Solution: External state store (RocksDB, Redis) keyed by partition, or state replay on assignment.

  4. Cross-Rack Rebalance: In a multi-rack cluster, assigning partitions to a consumer in a different rack increases latency (network round-trip). Range/RoundRobin don't account for rack awareness. Solution: Custom PartitionAssignor with rack-aware assignment, or a consumer sidecar in the same rack as the brokers.

  5. Rebalance Storm with K8s HPA: Horizontal Pod Autoscaler adds pods on CPU increase → new consumers trigger a rebalance → processing slows down (rebalance overhead) → CPU rises even more → HPA adds more pods → infinite loop. Solution: Don't use CPU-based HPA for Kafka consumers. Instead, use custom metrics (consumer lag) with thresholds and a cooldown period.

Performance Numbers

| Metric | Eager | Cooperative |
|---|---|---|
| Rebalance latency | 5-30 seconds | 1-5 seconds |
| Processing during rebalance | Full stop | Continues (partial) |
| Partition movement | All | Incremental |
| Impact on consumer lag | High (2-10x spike) | Low (1.2-2x spike) |
| Rebalances per rolling deploy | N (one per instance) | 0 (with static membership) |

Production War Story

Situation: Streaming service with 25 consumers in group event-processors (25 partitions). Range assignor, dynamic membership. Every night at 02:00, an automatic rolling deploy (container update).

Problem: 25 sequential rebalances × 15 seconds = 375 seconds (6+ minutes) of downtime every night. During this time, lag grew to 500K events. Morning users saw "lagging" recommendations.

Additional problem: The Range assignor with 25 partitions and 24 consumers (one mid-rebalance) assigned 2 partitions to one consumer and 1 to the rest. The overloaded consumer crashed → another rebalance → another crash → cascading failure.

Diagnosis:

kafka-consumer-groups.sh --describe --group event-processors
# STATE=Rebalancing → STATE=Stable → STATE=Rebalancing (cycle)
# Lag: 0 → 500K → 0 (within 15 minutes after deploy)

Solution:

  1. CooperativeStickyAssignor: rebalance took 2 seconds instead of 15
  2. group.instance.id=${POD_NAME}: rolling deploy = 0 rebalances
  3. session.timeout.ms=45000, heartbeat.interval.ms=15000: timeout margin
  4. ConsumerRebalanceListener with commitSync + state flush
  5. K8s PodDisruptionBudget: maxUnavailable=1
  6. Lag-based HPA instead of CPU-based (custom metric via Prometheus Adapter)

Result: Rolling deploy with 0 rebalances, 0 downtime, lag < 1K.

Lesson: Eager rebalancing + dynamic membership + rolling deploy = guaranteed cascading failure. Cooperative + static membership = zero-downtime deploys.

Monitoring (JMX + Burrow)

JMX metrics:

kafka.consumer:type=consumer-coordinator-metrics,client-id=*,key=rebalance-rate-avg
kafka.consumer:type=consumer-coordinator-metrics,client-id=*,key=last-rebalance-seconds-ago
kafka.consumer:type=consumer-coordinator-metrics,client-id=*,key=assigned-partitions
kafka.consumer:type=consumer-coordinator-metrics,client-id=*,key=failed-rebalance-total
kafka.consumer:type=consumer-coordinator-metrics,client-id=*,key=last-heartbeat-seconds-ago

Burrow:

  • Group status: OK, WARN, ERR, STOP, STALL
  • Per-partition lag with trend
  • HTTP API → Grafana → Alertmanager

Alert rules:

- alert: KafkaRebalanceTooFrequent
  expr: rate(kafka_consumer_coordinator_rebalance_rate[5m]) > 0.1
  for: 5m
  labels:
    severity: warning

- alert: KafkaConsumerStalled
  expr: kafka_consumer_records_lag_max > 100000
  for: 10m
  labels:
    severity: critical

Highload Best Practices

  1. CooperativeStickyAssignor: the production standard (Kafka 2.4+)
  2. Static membership (group.instance.id): eliminates rebalances on rolling deploys
  3. Heartbeat tuning: heartbeat.interval.ms = session.timeout.ms / 3
  4. ConsumerRebalanceListener: commitSync + state cleanup in onPartitionsRevoked
  5. Monitor rebalance frequency: alert on > 1 rebalance per 10 minutes
  6. Tune max.poll.records to latency; goal: batch processing < max.poll.interval.ms
  7. Don't use CPU-based HPA; use a custom metric (consumer lag)
  8. PodDisruptionBudget: maxUnavailable=1 to prevent cascading rebalances
  9. Cross-rack awareness: custom assignor for multi-rack clusters
  10. Stateful processing: external state store (RocksDB, Redis) keyed by partition
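The max.poll tuning rule above reduces to simple arithmetic: the average per-record processing budget is max.poll.interval.ms divided by max.poll.records. A sketch:

```java
public class PollBudget {
    // Max average processing time per record (ms) before the consumer risks
    // exceeding max.poll.interval.ms and being excluded from the group.
    public static long perRecordBudgetMs(long maxPollIntervalMs, int maxPollRecords) {
        return maxPollIntervalMs / maxPollRecords;
    }
}
```

With the production configuration above (max.poll.interval.ms=300000, max.poll.records=500), the budget is 600 ms per record; if processing is slower, lower max.poll.records or raise max.poll.interval.ms.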

🎯 Interview Cheat Sheet

Must know:

  • Balancing = automatic distribution of partitions among consumers in a group
  • Strategies: Range (uneven), RoundRobin, Sticky, CooperativeSticky (recommended)
  • Cooperative rebalancing: incremental movement, processing continues on remaining partitions
  • ConsumerRebalanceListener is mandatory: commitSync in onPartitionsRevoked
  • Static membership (group.instance.id): 0 rebalances on rolling deploy
  • Chaos rebalance: one crashes → load increases → another crashes → avalanche
  • Heartbeat tuning: heartbeat.interval.ms = session.timeout.ms / 3

Common follow-up questions:

  • What is cascading rebalance? One consumer crashes → load on others → another crashes → avalanche.
  • Why is CPU-based HPA bad for Kafka? Rebalance slows processing → CPU rises → more pods → infinite loop.
  • What does ConsumerRebalanceListener do? Commits offsets and cleans state before losing partitions.
  • How to avoid rebalance on deploy? Static membership + CooperativeStickyAssignor.

Red flags (DO NOT say):

  • "Range assignor is standard for production": uneven, use CooperativeSticky
  • "More consumers = always more throughput": limited by partition count
  • "Rebalancing is instant": 5-30 seconds (eager), 1-5 (cooperative)
  • "You can ignore session.timeout.ms": too short = false rebalances

Related topics:

  • [[5. What is Consumer Group]]
  • [[8. What happens when you add a new consumer to the group]]
  • [[15. What is rebalancing and when does it happen]]
  • [[7. Can you have more consumers than partitions]]