Question 23 · Section 15

What is an idempotent producer?

Junior Level

Definition

An idempotent producer is a producer whose retries cannot write the same message to a partition twice: the broker recognizes a resent batch and does not append it again.

props.put("enable.idempotence", "true");
// Relies on the following, which the client applies unless you override them:
// acks=all
// retries=Integer.MAX_VALUE (must be > 0)
// max.in.flight.requests.per.connection=5 (must be <= 5)

Why is it needed?

Without idempotence:
  Producer sent → network error → retry → duplicate in Kafka

With idempotence:
  Producer sent → network error → retry → broker rejects duplicate

How does it work?

Each producer gets a unique PID (Producer ID)
Each message gets a Sequence Number
The broker tracks the sequence

Duplicate with the same PID + Sequence → rejected

Middle Level

When NOT to use idempotent producer

  1. Compatibility with old brokers (pre-0.11) — idempotence is not supported
  2. Ultra-low latency with tolerable duplicates — can be disabled for minimal overhead

Internal Mechanism

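PID (Producer ID): unique ID assigned to the producer by the broker
Sequence Number: increments for each message, counted separately per partition
Broker checks the sequence of each incoming batch:
  - If sequence = expected → writes
  - If sequence < expected → duplicate, not written again (the producer still gets a successful ack for a recent duplicate)
  - If sequence > expected → gap in the sequence, rejected with OutOfOrderSequenceException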
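
The three branches above can be sketched as a small in-memory model. This is a simplified illustration for one partition, not Kafka's broker code; the class name SequenceChecker and its Result enum are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the broker-side sequence check for a single partition. */
class SequenceChecker {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // Next expected sequence number, keyed by producer ID.
    private final Map<Long, Integer> nextExpected = new HashMap<>();

    Result check(long pid, int sequence) {
        int expected = nextExpected.getOrDefault(pid, 0);
        if (sequence == expected) {
            nextExpected.put(pid, expected + 1); // accept and advance
            return Result.APPENDED;
        }
        // Lower than expected: already written. Higher: something is missing.
        return sequence < expected ? Result.DUPLICATE : Result.OUT_OF_ORDER;
    }

    public static void main(String[] args) {
        SequenceChecker broker = new SequenceChecker();
        System.out.println(broker.check(42L, 0)); // APPENDED
        System.out.println(broker.check(42L, 0)); // DUPLICATE
        System.out.println(broker.check(42L, 2)); // OUT_OF_ORDER
        System.out.println(broker.check(42L, 1)); // APPENDED
    }
}
```

The real broker tracks this state per producer and per partition and works on batches rather than single messages; the model only keeps the comparison logic.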

Automatic Settings

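enable.idempotence=true relies on these settings, which the client applies unless you override them:
  acks=all
  retries=Integer.MAX_VALUE (any value > 0 is accepted)
  max.in.flight.requests.per.connection=5 (any value up to 5 is accepted; the first idempotent releases, 0.11.x, required 1)

No need to configure them manually. If you set enable.idempotence=true and also override one of them with a conflicting value (for example acks=1), the producer fails at startup with a ConfigException.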
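
A quick way to catch a conflicting override before it reaches the client is a pre-flight check over the Properties object. The sketch below is a hypothetical helper (IdempotenceConfigCheck is not part of the Kafka client); it only mirrors the three constraints listed above and uses the real configuration key names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for overrides that conflict with idempotence. */
class IdempotenceConfigCheck {
    static List<String> conflicts(Properties props) {
        List<String> problems = new ArrayList<>();
        if (!"true".equals(props.getProperty("enable.idempotence"))) {
            return problems; // idempotence not requested explicitly, nothing to check
        }
        String acks = props.getProperty("acks");
        if (acks != null && !acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be all, got " + acks);
        }
        String retries = props.getProperty("retries");
        if (retries != null && Integer.parseInt(retries) == 0) {
            problems.add("retries must be greater than 0");
        }
        String inFlight = props.getProperty("max.in.flight.requests.per.connection");
        if (inFlight != null && Integer.parseInt(inFlight) > 5) {
            problems.add("max.in.flight.requests.per.connection must be at most 5, got " + inFlight);
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("enable.idempotence", "true");
        props.put("acks", "1"); // conflicting override
        System.out.println(conflicts(props));
    }
}
```

Settings that are simply absent are not reported, because the client fills them in with compatible values.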

Transactional Producer

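// For exactly-once semantics (Kafka-to-Kafka)
props.put("transactional.id", "my-tx-id"); // must be stable across restarts of this producer
props.put("enable.idempotence", "true");

producer.initTransactions(); // call once, before the first transaction
try {
    producer.beginTransaction();
    producer.send(record);
    producer.commitTransaction();
} catch (KafkaException e) { // org.apache.kafka.common.KafkaException
    // Abort so consumers with read_committed never see the partial write.
    // Fatal errors such as ProducerFencedException require producer.close() instead.
    producer.abortTransaction();
}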

Common Mistakes

  1. Retries without idempotence (never enabled, e.g. on a pre-3.0 client):
    Retry → duplicates in the topic
    → Consumers process the message twice

  2. Manual PID modification:
    The PID is assigned by the broker automatically
    There is no supported way to set it; working around that breaks deduplication

  3. Explicit enable.idempotence=false while retries stay enabled:
    Retries enabled, idempotence disabled
    → Duplicates on retry

Senior Level

Internal Implementation

PID Assignment:

When creating a producer:
1. Producer → InitProducerId request → Broker
2. Broker → generates unique PID
3. Broker → returns PID + epoch
4. Producer → uses PID for all messages

Sequence Number Management:

Each producer-partition pair has its own sequence number:
  Producer P1, Partition 0 → seq=0
  Producer P1, Partition 0 → seq=1
  Producer P1, Partition 1 → seq=0 (different partition)

Sequence number increments per message
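
The per-partition counters can be pictured with a small model of the producer side. SequenceAllocator is a hypothetical name used only for illustration; the real client keeps equivalent state internally and assigns sequences to batches.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: one independent sequence counter per partition. */
class SequenceAllocator {
    private final Map<Integer, Integer> nextByPartition = new HashMap<>();

    /** Returns the sequence number for the next message sent to the partition. */
    int next(int partition) {
        int seq = nextByPartition.getOrDefault(partition, 0);
        nextByPartition.put(partition, seq + 1);
        return seq;
    }

    public static void main(String[] args) {
        SequenceAllocator p1 = new SequenceAllocator();
        System.out.println("partition 0 -> seq=" + p1.next(0)); // seq=0
        System.out.println("partition 0 -> seq=" + p1.next(0)); // seq=1
        System.out.println("partition 1 -> seq=" + p1.next(1)); // seq=0
    }
}
```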

Broker Validation:

The broker checks, per partition:
1. The PID and epoch belong to a known, current producer (an older epoch is fenced)
2. If sequence = expected → append and advance the expected value
3. If sequence < expected → duplicate, not written again
4. If sequence > expected → out of order, error

Exactly-Once Semantics

Idempotent producer is the foundation of exactly-once:
1. Producer: enable.idempotence=true
2. Producer: transactional.id (for transactions)
3. Consumer: isolation.level=read_committed

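This covers Kafka-to-Kafka scenarios only (consume, process, produce). Writes to external systems such as a database are outside the transaction and need their own deduplication.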
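
As a configuration sketch, the three settings split across the two clients roughly as follows. The class ExactlyOnceConfigs and the example IDs are hypothetical; the keys are the standard client configuration names. Disabling auto-commit is an assumption that fits a consume-process-produce loop, where offsets are committed through the producer's transaction instead.

```java
import java.util.Properties;

/** Sketch of the producer and consumer settings used for exactly-once pipelines. */
class ExactlyOnceConfigs {
    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.put("enable.idempotence", "true");
        p.put("transactional.id", transactionalId); // stable per producer instance
        return p;
    }

    static Properties consumerProps(String groupId) {
        Properties c = new Properties();
        c.put("group.id", groupId);
        c.put("isolation.level", "read_committed"); // skip aborted and open transactions
        c.put("enable.auto.commit", "false");       // assumption: offsets go through the transaction
        return c;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("orders-tx-1"));
        System.out.println(consumerProps("orders-processor"));
    }
}
```

Connection and serializer settings are left out; they would be added before passing these objects to the client constructors.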

Failure Scenarios

1. Producer Restart:

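Producer restarts → new PID (unless transactional.id is set)
Old sequence numbers no longer apply
New producer starts with sequence=0
→ A message resent after the restart is not recognized as a duplicate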

2. Broker Failover:

Leader goes down → new leader is elected
New leader knows the last sequence number, because PID and sequence are stored with the replicated batches
Validation continues from the same point

3. Network Partition:

Producer → sent → network error → retry
Broker received → wrote → ack lost
Producer → retry → broker sees duplicate → rejects
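
The lost-ack scenario can be replayed in a toy simulation that compares a broker with and without the sequence check. Everything here (LostAckSimulation, the nested Broker, the PID 7 and the value "order-1") is made up for illustration and models a single partition only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy replay of "broker wrote, ack lost, producer retried". */
class LostAckSimulation {
    static class Broker {
        final List<String> log = new ArrayList<>();
        private final Map<Long, Integer> nextExpected = new HashMap<>();
        private final boolean idempotent;

        Broker(boolean idempotent) { this.idempotent = idempotent; }

        void handle(long pid, int sequence, String value) {
            if (idempotent) {
                int expected = nextExpected.getOrDefault(pid, 0);
                if (sequence < expected) return; // duplicate: already in the log
                if (sequence > expected) throw new IllegalStateException("out of order");
                nextExpected.put(pid, expected + 1);
            }
            log.add(value);
        }
    }

    /** Sends one record, "loses" the first ack, then retries with the same sequence. */
    static List<String> sendWithLostAck(boolean idempotent) {
        Broker broker = new Broker(idempotent);
        broker.handle(7L, 0, "order-1"); // written, but the ack never reaches the producer
        broker.handle(7L, 0, "order-1"); // the producer retries the same message
        return broker.log;
    }

    public static void main(String[] args) {
        System.out.println("without idempotence: " + sendWithLostAck(false)); // [order-1, order-1]
        System.out.println("with idempotence:    " + sendWithLostAck(true));  // [order-1]
    }
}
```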

Performance Impact

Idempotent producer overhead (rough figures; measure on your own workload):
  ~5-10% latency increase
  ~5% throughput decrease
  Minimal CPU overhead

Trade-off: reliability vs performance

Monitoring

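Key metrics (MBean group kafka.producer:type=producer-metrics):

record-retry-rate (retries are where idempotence does its work)
record-error-rate (sends that failed permanently)
produce-throttle-time-avg
failed-authentication-rate

The standard Java client does not appear to expose a dedicated idempotent-rate metric, so watch the retry and error rates instead.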

Best Practices

✅ enable.idempotence=true is the default since Kafka 3.0; on older clients, set it explicitly
✅ For exactly-once, add transactional.id
✅ Leave acks, retries, and max.in.flight.requests.per.connection at the values idempotence expects
✅ Monitor failed sends

❌ Retries without idempotence
❌ Manual PID modification
❌ enable.idempotence=false with retries
❌ Ignoring send errors (check the callback or the returned Future)

Architectural Decisions

  1. Idempotence by default — minimal overhead, maximum reliability
  2. Transactional ID for exactly-once — Kafka-to-Kafka scenarios
  3. Sequence numbers per partition — independent validation
  4. Broker-side validation — duplicate protection on the broker side

Summary for Senior

  • Idempotent producer prevents duplicates on retry
  • PID + Sequence Number — deduplication mechanism
  • Automatically sets acks=all and retries=Integer.MAX_VALUE
  • Exactly-once requires transactional.id
  • Minimal performance overhead, maximum reliability benefit

🎯 Interview Cheat Sheet

Must know:

  • Idempotent producer guarantees: duplicates will not enter Kafka on retry
  • PID (Producer ID) + Sequence Number — broker-side deduplication mechanism
  • enable.idempotence=true automatically sets: acks=all, retries=Integer.MAX_VALUE, max.in.flight.requests.per.connection=5
  • Broker checks sequence: if < expected → duplicate reject; if > → out of order error
  • Sequence number per partition — independent validation for each partition
  • For exactly-once: add transactional.id + Transaction API
  • Overhead: ~5-10% latency increase, ~5% throughput decrease — minimal

Common follow-up questions:

  • What happens without idempotence on retries? — Retry → duplicates in the topic → double processing.
  • Is PID generated manually? — No, automatically by the broker when creating the producer.
  • How does the broker handle failover? — The new leader knows the last sequence number and continues validation.
  • enable.idempotence=false with retries — what happens? — Retries enabled, duplicates are possible.

Red flags (DO NOT say):

  • “Idempotent producer can be disabled for production” — retries without idempotence = duplicates
  • “PID can be configured manually” — generated automatically by the broker
  • “Idempotence protects against duplicates between different producers” — only within one producer
  • “Exactly-once works without transactional.id” — transactional.id is needed for transactions

Related topics:

  • [[11. How to configure exactly-once semantics]]
  • [[9. What delivery guarantees does Kafka provide]]
  • [[20. What is producer acknowledgment and what modes exist (acks=0,1,all)]]
  • [[21. What is batch in Kafka producer]]