What is idempotent producer
Junior Level
Definition
Idempotent producer — a producer that guarantees that duplicates will not enter Kafka on retry.
props.put("enable.idempotence", "true");
// Automatically sets:
// acks=all
// retries=Integer.MAX_VALUE
// max.in.flight.requests.per.connection=5
Why is it needed?
Without idempotence:
Producer sent → network error → retry → duplicate in Kafka
With idempotence:
Producer sent → network error → retry → broker rejects duplicate
How does it work?
Each producer gets a unique PID (Producer ID)
Each message gets a Sequence Number
The broker tracks the sequence
Duplicate with the same PID + Sequence → rejected
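The four steps above can be sketched as a toy deduplication check. This is an illustrative model, not Kafka's actual broker code; all class and method names here are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side deduplication: the broker remembers the last
// sequence number it accepted for each producer ID (PID).
class DedupBroker {
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    // Returns true if the message is written, false if rejected as a duplicate.
    boolean append(long pid, int seq) {
        int expected = lastSeqByPid.getOrDefault(pid, -1) + 1;
        if (seq < expected) {
            return false;           // same PID + seq already seen -> duplicate
        }
        lastSeqByPid.put(pid, seq); // accept and advance the expected sequence
        return true;
    }
}
```

A retried send carries the same PID and sequence number, so the second `append` call for the same pair returns `false` and nothing is written twice.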
Middle Level
When NOT to use idempotent producer
- Compatibility with old brokers (pre-0.11) — idempotence is not supported
- Ultra-low latency with tolerable duplicates — can be disabled for minimal overhead
Internal Mechanism
PID (Producer ID) — unique producer ID
Sequence Number — increments for each message
Broker checks the sequence:
- If sequence = expected → writes
- If sequence < expected → duplicate, rejects
- If sequence > expected → out of order, error
Automatic Settings
enable.idempotence=true automatically sets:
acks=all
retries=Integer.MAX_VALUE
max.in.flight.requests.per.connection=5 (starting from Kafka 1.1, previously limited to 1).
No need to configure manually!
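For clarity, here is what those implied settings look like spelled out explicitly (with a modern client only the first line is required; the rest merely make the defaults visible):

```java
import java.util.Properties;

// Settings implied by enable.idempotence=true, written out explicitly.
Properties props = new Properties();
props.put("enable.idempotence", "true");
props.put("acks", "all");                                // wait for all in-sync replicas
props.put("retries", String.valueOf(Integer.MAX_VALUE)); // retry until delivery timeout
props.put("max.in.flight.requests.per.connection", "5"); // ordering still preserved
```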
Transactional Producer
// For exactly-once semantics
props.put("transactional.id", "my-tx-id");
props.put("enable.idempotence", "true");
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record);
    producer.commitTransaction();
} catch (KafkaException e) {
    // On a recoverable error, abort so the transaction is not left open
    producer.abortTransaction();
    throw e;
}
Common Mistakes
- Without idempotence when using retries:
Retry → duplicates in the topic → processed twice
- Manual PID modification:
PID is generated automatically; manual changes will break the mechanism
- enable.idempotence=false with retries:
Retries enabled, idempotence disabled → duplicates on retry
Senior Level
Internal Implementation
PID Assignment:
When creating a producer:
1. Producer → InitProducerId request → Broker
2. Broker → generates unique PID
3. Broker → returns PID + epoch
4. Producer → uses PID for all messages
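The handshake above can be modeled with a toy allocator. The class and method names are illustrative; the real broker also bumps the epoch when a transactional producer re-initializes, which is not modeled here:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy InitProducerId handler: each new producer session receives a fresh
// PID and starts at epoch 0.
class PidAllocator {
    private final AtomicLong nextPid = new AtomicLong(0);

    // Returns {pid, epoch} for a newly initialized producer.
    long[] initProducerId() {
        return new long[] { nextPid.getAndIncrement(), 0 };
    }
}
```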
Sequence Number Management:
Each producer-partition pair has its own sequence number:
Producer P1, Partition 0 → seq=0
Producer P1, Partition 0 → seq=1
Producer P1, Partition 1 → seq=0 (different partition)
Sequence number increments per message
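The per-partition bookkeeping above can be sketched like this (the `"pid:partition"` key format is an illustrative simplification, not Kafka's internal representation):

```java
import java.util.HashMap;
import java.util.Map;

// Sequence numbers are tracked per (PID, partition) pair, so each
// partition's stream of messages is numbered independently.
class SequenceTracker {
    private final Map<String, Integer> nextSeq = new HashMap<>();

    // Returns the sequence number for the next message on this
    // producer/partition pair, then advances it.
    int nextSequence(long pid, int partition) {
        String key = pid + ":" + partition;
        int seq = nextSeq.getOrDefault(key, 0);
        nextSeq.put(key, seq + 1);
        return seq;
    }
}
```

Running the text's example: partition 0 yields seq 0, then 1; partition 1 independently starts again at 0.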
Broker Validation:
The broker checks:
1. PID matches the current producer
2. Sequence number = expected (not less)
3. If sequence < expected → duplicate, reject
4. If sequence > expected → out of order, error
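The three-way check above can be written as a small decision function. This is a sketch of the logic, not the broker's actual implementation:

```java
// Outcome of the broker's sequence check for one (PID, partition) pair.
enum SeqCheck { WRITE, DUPLICATE, OUT_OF_ORDER }

class SequenceValidator {
    // lastSeq is the last accepted sequence number (-1 if none yet).
    static SeqCheck validate(int lastSeq, int incomingSeq) {
        int expected = lastSeq + 1;
        if (incomingSeq == expected) return SeqCheck.WRITE;     // in order, accept
        if (incomingSeq < expected)  return SeqCheck.DUPLICATE; // already written, reject
        return SeqCheck.OUT_OF_ORDER;                           // gap -> error to the producer
    }
}
```

In the real client, the out-of-order branch surfaces as an `OutOfOrderSequenceException`.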
Exactly-Once Semantics
Idempotent producer is the foundation of exactly-once:
1. Producer: enable.idempotence=true
2. Producer: transactional.id (for transactions)
3. Consumer: isolation.level=read_committed
Only for Kafka-to-Kafka scenarios!
Failure Scenarios
1. Producer Restart:
Producer restarts → new PID
Old sequence numbers no longer apply (they were tied to the old PID)
New producer starts with sequence=0
2. Broker Failover:
Leader goes down → new leader
New leader knows the last sequence number
Continues validation from the same point
3. Network Partition:
Producer → sent → network error → retry
Broker received → wrote → ack lost
Producer → retry → broker sees duplicate → rejects
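The lost-ack scenario above is the one idempotence exists for: the broker stored the message, the ack never arrived, and the retry carries the same (PID, sequence). A toy log with illustrative names shows that exactly one copy survives:

```java
import java.util.ArrayList;
import java.util.List;

// Toy partition log: tracks the last accepted sequence number so a
// retry of an already-stored message is rejected, not appended again.
class PartitionLog {
    final List<String> records = new ArrayList<>();
    private int lastSeq = -1;

    // Returns true if appended, false if this sequence was already written.
    boolean append(int seq, String value) {
        if (seq <= lastSeq) return false; // retry of an acked-but-lost send
        lastSeq = seq;
        records.add(value);
        return true;
    }
}
```

After the original send and one retry of the same sequence, the log still contains a single record.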
Performance Impact
Idempotent producer overhead:
~5-10% latency increase
~5% throughput decrease
Minimal CPU overhead
Trade-off: reliability vs performance
Monitoring
Key metrics:
kafka.producer:record-retry-rate — rate of retried sends
kafka.producer:record-error-rate — rate of sends that ultimately failed
kafka.producer:produce-throttle-time-avg — average broker throttling time
Best Practices
✅ enable.idempotence=true by default (Kafka 3.0+); for older versions, set it explicitly.
✅ For exactly-once — add transactional.id
✅ Keep the automatically applied settings (acks, retries, max.in.flight) at their defaults
✅ Monitor failed sends
❌ Without idempotence when using retries
❌ Manual PID modification
❌ enable.idempotence=false with retries
❌ Without handling send errors
Architectural Decisions
- Idempotence by default — minimal overhead, maximum reliability
- Transactional ID for exactly-once — Kafka-to-Kafka scenarios
- Sequence numbers per partition — independent validation
- Broker-side validation — duplicate protection on the broker side
Summary for Senior
- Idempotent producer prevents duplicates on retry
- PID + Sequence Number — deduplication mechanism
- Automatically sets acks=all and retries=INT_MAX
- Exactly-once requires transactional.id
- Minimal performance overhead, maximum reliability benefit
🎯 Interview Cheat Sheet
Must know:
- Idempotent producer guarantees: duplicates will not enter Kafka on retry
- PID (Producer ID) + Sequence Number — broker-side deduplication mechanism
- enable.idempotence=true automatically sets: acks=all, retries=INT_MAX, max.in.flight=5
- Broker checks sequence: if < expected → duplicate, reject; if > expected → out of order, error
- Sequence number per partition — independent validation for each partition
- For exactly-once: add transactional.id + Transaction API
- Overhead: ~5-10% latency increase, ~5% throughput decrease — minimal
Common follow-up questions:
- What happens without idempotence on retries? — Retry → duplicates in the topic → double processing.
- Is PID generated manually? — No, automatically by the broker when creating the producer.
- How does the broker handle failover? — The new leader knows the last sequence number and continues validation.
- enable.idempotence=false with retries — what happens? — Retries enabled, duplicates are possible.
Red flags (DO NOT say):
- “Idempotent producer can be disabled for production” — retries without idempotence = duplicates
- “PID can be configured manually” — generated automatically by the broker
- “Idempotence protects against duplicates between different producers” — only within one producer
- “Exactly-once works without transactional.id” — transactional.id is needed for transactions
Related topics:
- [[11. How to configure exactly-once semantics]]
- [[9. What delivery guarantees does Kafka provide]]
- [[20. What is producer acknowledgment and what modes exist (acks=0,1,all)]]
- [[21. What is batch in Kafka producer]]