What is offset in Kafka
Junior Level
Definition
Offset is the sequential number of a message within a partition.
Offset is NOT a unique message ID. It’s a position in the log. The same message in different partitions has different offsets. Offset grows monotonically (0, 1, 2…) and never decreases, even if messages are deleted.
Each message gets a unique offset within its partition.
Partition 0:
offset 0: {"user": "user-1", "action": "login"}
offset 1: {"user": "user-2", "action": "purchase"}
offset 2: {"user": "user-1", "action": "logout"}
offset 3: {"user": "user-3", "action": "login"}
Key properties
- Offset is a number, starting from 0
- Offset is unique only within one partition
- Offset cannot be changed after writing
- Messages are ordered by offset
How does a consumer use offset?
Consumer remembers its offset:
Read offset 5 → remembered 5
On restart → continues from offset 6
Code example
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
System.out.println("Partition: " + record.partition());
System.out.println("Offset: " + record.offset());
System.out.println("Value: " + record.value());
}
Middle Level
Committed Offset
Committed offset — the last processed offset saved in Kafka
Consumer reads offset 5 → processes → commits offset 5
On restart → continues from offset 6
Consumer Position
// Current position (next offset to read)
long position = consumer.position(new TopicPartition("orders", 0));
// Last committed offset
OffsetAndMetadata committed = consumer.committed(new TopicPartition("orders", 0));
Offset Management
// Manual commit
props.put("enable.auto.commit", "false");
// Synchronous commit (reliable)
consumer.commitSync();
// Asynchronous commit (fast)
consumer.commitAsync((offsets, exception) -> {
if (exception != null) {
log.error("Commit failed", exception);
}
});
Seek — reading from a specific offset
TopicPartition partition = new TopicPartition("orders", 0);
// Read from specific offset
consumer.seek(partition, 1000);
// Read from beginning
consumer.seekToBeginning(List.of(partition));
// Read from end
consumer.seekToEnd(List.of(partition));
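Seek also works by time: offsetsForTimes maps a timestamp to the earliest offset written at or after it, which can then be passed to seek. A minimal sketch (topic name and timestamp are placeholders; requires a running broker):

```java
import java.time.Instant;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    // Rewinds the consumer to the first offset written at or after `since`.
    static void seekToTime(KafkaConsumer<String, String> consumer,
                           TopicPartition partition, Instant since) {
        Map<TopicPartition, OffsetAndTimestamp> result =
                consumer.offsetsForTimes(Map.of(partition, since.toEpochMilli()));
        OffsetAndTimestamp entry = result.get(partition);
        if (entry != null) { // null if no messages exist at or after `since`
            consumer.seek(partition, entry.offset());
        }
    }
}
```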
Common mistakes
- Commit before processing:
consumer.commitSync(); // ❌ commit first
process(records); // then process
// If it crashes in between → data lost
- Manual offset change without reason:
consumer.seek(partition, 0); // read from start
// Can cause duplicates
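The safe pattern is the mirror image of the first mistake: process first, commit afterwards. A sketch, assuming the consumer was configured with enable.auto.commit=false (requires a running broker):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ProcessThenCommit {
    static void pollLoop(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                process(record);   // ✅ process first
            }
            consumer.commitSync(); // ✅ commit only after processing succeeds
            // A crash before commitSync() → the batch is redelivered
            // (at-least-once), not lost.
        }
    }

    static void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```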
Senior Level
Internal Implementation
__consumer_offsets topic:
Internal Kafka topic for storing offsets
- Compacted (log compaction)
- Stores offsets for all consumer groups
- Key: group.id + topic + partition
- Value: offset + metadata + timestamp
Offset Storage Format:
Key: [group.id, topic, partition]
Value: {
offset: long,
metadata: String,
leaderEpoch: int,
timestamp: long
}
Offset Commit Strategies
1. Synchronous Commit:
// Blocks until broker confirms
consumer.commitSync();
// Reliable, but slower
2. Asynchronous Commit:
// Doesn't wait for confirmation
consumer.commitAsync((offsets, exception) -> {
if (exception != null) {
// Retry or log
log.error("Commit failed for offsets: {}", offsets, exception);
}
});
// Faster, but possible loss on failure
3. Batch Commit:
// Commit every N messages
int count = 0;
for (var record : records) {
process(record);
if (++count % 100 == 0) {
consumer.commitAsync();
}
}
Offset Reset Policy
// auto.offset.reset — behavior when no committed offset exists
props.put("auto.offset.reset", "earliest"); // read from start
props.put("auto.offset.reset", "latest"); // read from end (default)
props.put("auto.offset.reset", "none"); // exception
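Only one of the three values can be set per consumer; a typical reliable-consumer configuration combines earliest with manual commits. A sketch (broker address and group id are placeholders):

```java
import java.util.Properties;

public class ManualCommitConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "my-group");                // placeholder group
        props.put("enable.auto.commit", "false");         // commit manually after processing
        // With no committed offset for this group, start from the earliest message
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```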
Leader Epoch and Offset Validation
Starting from Kafka 0.11:
- Leader Epoch stored with each offset
- On failover, consumer can determine
which messages may have been lost
- Prevents silent data loss
Leader Epoch — the partition leader generation number. Each time the leader changes, the epoch
increases. This allows the consumer to determine which messages may have been lost
during failover.
Offset Monitoring
Key metrics:
Current Offset → last read
Committed Offset → last committed
Log End Offset → last in partition
Consumer Lag = Log End Offset - Committed Offset
Monitoring lag:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group my-group
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
my-group orders 0 1000 1200 200
LAG=200 means: 200 messages in the partition are not yet read. Criticality depends on throughput: at 100 msg/s — 2 seconds behind (normal), at 1 msg/s — 200 seconds (critical).
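That rule of thumb can be made concrete: what matters is not raw lag but lag divided by throughput, i.e. how many seconds behind the consumer is. A sketch using the numbers above (message rates are illustrative):

```java
public class LagSeverity {
    // Lag for one partition: how far the committed offset trails the log end.
    static long lag(long logEndOffset, long committedOffset) {
        return logEndOffset - committedOffset;
    }

    // Converts raw lag into "seconds behind" at a given message rate.
    static double secondsBehind(long lag, double messagesPerSecond) {
        return lag / messagesPerSecond;
    }

    public static void main(String[] args) {
        long lag = lag(1200, 1000);                  // 200, as in the CLI output above
        System.out.println(secondsBehind(lag, 100)); // 2.0   → normal
        System.out.println(secondsBehind(lag, 1));   // 200.0 → critical
    }
}
```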
Offset Recovery After Failure
Scenario:
1. Consumer read offset 100
2. Started processing → crashed
3. Didn't commit offset
On restart:
- Committed offset = 99
- Continues from offset 100
- Message 100 will be processed again (at-least-once)
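Because message 100 is redelivered under at-least-once, processing should be idempotent. A minimal sketch: skip records whose (partition, offset) pair was already processed. In production the "seen" state would live in the same external store as the processing results and be updated atomically with them; the in-memory set here is purely illustrative.

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentProcessor {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the record was processed, false if it was a duplicate.
    public boolean processOnce(int partition, long offset, Runnable work) {
        String key = partition + "-" + offset;
        if (!seen.add(key)) {
            return false; // duplicate redelivery after a crash: skip
        }
        work.run();
        return true;
    }
}
```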
Best Practices
✅ Commit after processing
✅ Synchronous commit for reliability
✅ Asynchronous commit for high-throughput
✅ Monitor lag
✅ Configure auto.offset.reset
❌ Auto commit for critical data
❌ Commit before processing
❌ Manual offset change without reason
❌ Ignoring consumer lag
❌ Running without offset monitoring
Architectural decisions
- Committed offset — foundation of at-least-once — guarantees delivery
- __consumer_offsets — compacted internal topic — efficient storage
- Leader epoch validation — prevents silent data loss
- Monitoring lag — early problem indicator
Summary for Senior
- Offset — fundamental progress tracking mechanism
- __consumer_offsets stores offsets for all groups
- Commit strategy affects reliability and performance
- Leader epoch validation is critical for data safety
- Monitoring lag is mandatory for production
🎯 Interview Cheat Sheet
Must know:
- Offset — sequential message number within a partition, starts from 0
- Offset is unique only within one partition, not globally
- Offset cannot be changed after writing; messages ordered by offset
- Committed offset — last processed offset saved in __consumer_offsets
- Consumer Lag = LogEndOffset - CommittedOffset — key health metric
- auto.offset.reset: earliest (from start), latest (from end, default), none (exception)
- Leader Epoch (since Kafka 0.11) — prevents silent data loss on failover
- Seek allows reading from any offset: start, end, specific number, timestamp
Common follow-up questions:
- What is Leader Epoch? — Leader generation number; helps determine lost messages on failover.
- Where are committed offsets stored? — In the internal compacted topic __consumer_offsets.
- What does LAG=200 mean? — 200 messages not yet read; criticality depends on throughput.
- What happens if you commit before processing? — On crash, messages are lost (offset committed, not processed).
Red flags (DO NOT say):
- “Offset is a unique message ID” — it’s a position in the log, not an ID
- “Offset can be changed” — cannot, immutable
- “Offset is unique across the whole topic” — only within a partition
- “Commit before processing is fine” — that’s data loss on crash
Related topics:
- [[13. How does offset commit work]]
- [[14. What is the difference between auto commit and manual commit]]
- [[26. How to monitor consumer lag]]
- [[15. What is rebalancing and when does it happen]]