Question 1 · Section 15

What is a topic in Kafka?

🟢 Junior Level

What is a topic?

A topic is a named channel (logical category) into which producers write messages and from which consumers read them. It is Kafka's primary abstraction for organizing data.

How a topic differs from a queue: in a queue, a message is deleted once it is read. In a topic, a message is retained for a configured period (retention) and can be read by any number of independent consumer groups, each at its own pace and from its own position. This is the key difference from traditional message brokers such as RabbitMQ.
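The read-without-delete semantics can be shown with a toy model in plain Java (no Kafka dependency): an append-only list stands in for the topic, and per-group offset counters stand in for consumer groups. This is an illustration of the concept, not Kafka's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a "topic" is an append-only list; each consumer group
// tracks only its own read position (offset) and never deletes messages.
class ToyTopic {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> groupOffsets = new HashMap<>();

    void send(String message) {
        log.add(message); // append-only, like a Kafka partition
    }

    // Each group reads from its own offset; the log itself is untouched.
    List<String> poll(String groupId) {
        int offset = groupOffsets.getOrDefault(groupId, 0);
        List<String> records = new ArrayList<>(log.subList(offset, log.size()));
        groupOffsets.put(groupId, log.size()); // "commit" the new offset
        return records;
    }

    int size() {
        return log.size();
    }
}
```

Two groups polling the same topic both see every message, and the log still holds all of them afterwards — in a queue, the first reader would have consumed them.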

Analogy

Think of TV channels:

  • Topic “orders” — a channel about orders
  • Topic “users” — a channel about users
  • Topic “notifications” — a channel about notifications

Each producer “broadcasts” into its own channel, each consumer “tunes the receiver” to the desired channel.

Important difference from TV channels: in Kafka, the same message can be read by multiple independent groups, each at its own speed and from its own position.

Simple example

Producer → Topic "orders" → Consumer
Producer → Topic "users"  → Consumer

// Producer sends a record to the "orders" topic
producer.send(new ProducerRecord<>("orders", "order-data"));

// Consumer subscribes to the topic and will poll records from it
consumer.subscribe(List.of("orders"));

Key properties

  • A topic is created with a specified number of partitions (sub-channels for parallelism) and replication factor (how many copies to store on different servers for fault tolerance).
  • Messages are stored for a defined time (retention period).
  • A topic can be created, modified, and deleted.
kafka-topics.sh --create \
  --topic orders \
  --partitions 3 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092

When NOT to use Kafka topics

  1. Request-response patterns — HTTP/gRPC is better
  2. Guaranteed delivery to a single recipient — RabbitMQ is better
  3. Data needs frequent updates/modifications — a database is better

🟡 Middle Level

Logical and physical structure

Logically: a topic is a category for messages.

Physically: a topic consists of partitions distributed across brokers. Each partition is a set of files on disk.

Topic "orders"
├── Partition 0 → Broker A
├── Partition 1 → Broker B
└── Partition 2 → Broker C

Cleanup Policies

Two main cleanup policies:

Policy            Description                                   Use case
delete (default)  Messages are deleted after retention expires  Events, logs, orders
compact           Keeps only the latest value for each key      User profiles, current prices
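The effect of compact can be sketched without Kafka: keep only the last value written for each key, and treat a null value (a tombstone) as a deletion marker. A minimal sketch, not the broker's actual cleaner:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompactionSketch {
    // Each record is a {key, value} pair; value == null is a tombstone.
    // After compaction only the latest value per key survives, and
    // tombstoned keys are gone entirely.
    static Map<String, String> compact(List<String[]> records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] r : records) {
            String key = r[0], value = r[1];
            if (value == null) {
                latest.remove(key); // tombstone: delete the key
            } else {
                latest.put(key, value); // latest value wins
            }
        }
        return latest;
    }
}
```

This is exactly why compact suits "current state" data (profiles, prices): the topic converges to one record per key.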

Partition file structure

/partition-0/
├── 00000000000000000000.log       # data
├── 00000000000000000000.index     # offset index
└── 00000000000000000000.timeindex # time index

Ordering guarantees

  • Within a partition: strict ordering (FIFO)
  • Within a topic: no ordering guarantees (if partitions > 1)
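The reason ordering holds only per partition: the producer routes every keyed record to a partition derived from the key's hash, so all records for one key land in one partition and keep their relative order, while different keys interleave across partitions. A sketch of that routing — Kafka's real partitioner hashes the serialized key with murmur2; `String.hashCode()` here is an illustrative stand-in:

```java
class KeyRouter {
    // Same key -> same partition -> per-key ordering is preserved.
    // floorMod avoids a negative index for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

This is also why increasing the partition count breaks key placement: the same key modulo a new partition count can map to a different partition (the exact failure in the war story below).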

Topic configurations

Parameter            Description                           Example
cleanup.policy       delete or compact                     delete
retention.ms         retention duration                    604800000 (7 days)
min.insync.replicas  minimum in-sync replicas for a write  2
compression.type     compression codec                     lz4, snappy, zstd

Common mistakes

Mistake                  Consequence                  Solution
Too few partitions       Limits consumer parallelism  Plan capacity ahead
No retention configured  Disk fills up                Set retention.ms
One topic for all data   Hard to scale                Split by domain

🔴 Senior Level

Topic as a distributed append-only log

A topic is an immutable distributed append-only log, split into partitions for scalability and fault tolerance. At the broker level, each topic is managed via LogManager, which stores a collection of Log objects (one per partition).

Log Compaction — deep dive

How Log Cleaner works:

  1. Background thread LogCleaner scans segments once the dirty ratio crosses the threshold (min.cleanable.dirty.ratio=0.5)
  2. Builds a hash map key → latest offset for all records in the dirty section
  3. Rewrites segments, keeping only the latest values for each key
  4. Tombstone records (value = null) mark keys for complete deletion

Performance:

  • log.cleaner.threads=1 (default) — sufficient for most cases
  • log.cleaner.io.buffer.size=512KB — read/write buffer
  • log.cleaner.backoff.ms=15000 — pause between cycles
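The dirty-ratio trigger from step 1 reduces to a single comparison. A simplified sketch — the real cleaner accounts bytes per log and per cleaner thread, which is omitted here:

```java
class CleanerTrigger {
    // A partition log becomes eligible for compaction once the fraction
    // of not-yet-compacted ("dirty") bytes reaches min.cleanable.dirty.ratio.
    static boolean shouldClean(long dirtyBytes, long totalBytes, double minDirtyRatio) {
        if (totalBytes == 0) return false; // empty log: nothing to clean
        return (double) dirtyBytes / totalBytes >= minDirtyRatio;
    }
}
```

Lowering the ratio cleans more aggressively (less duplicate data on disk, more cleaner I/O); raising it does the opposite.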

Use Cases:

  • Storing user profile (Changelog pattern)
  • Current aggregate state (Event Sourcing)
  • State recovery in Kafka Streams (changelog topics)

Segment architecture

Active segment → writes (sealed when segment.bytes or segment.ms reached)
Closed segments → read-only, candidates for deletion/compaction

segment.bytes=1GB
segment.ms=7 days
segment.index.bytes=10MB

Benefits of segmentation:

  • O(1) deletion — entire segments are deleted
  • Efficient OS Page Cache utilization
  • Parallel cleanup (different threads clean different segments)
  • Index-based navigation — binary search on .index files
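The O(1) deletion claim can be sketched: records go into the active segment until a size threshold rolls it, and retention then drops whole closed segments rather than deleting record by record. A toy model with segment.bytes shrunk to a record count:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SegmentedLog {
    private final int segmentCapacity; // stand-in for segment.bytes
    private final Deque<List<String>> closed = new ArrayDeque<>();
    private List<String> active = new ArrayList<>();

    SegmentedLog(int segmentCapacity) {
        this.segmentCapacity = segmentCapacity;
    }

    void append(String record) {
        if (active.size() == segmentCapacity) {
            closed.addLast(active); // roll: seal the active segment
            active = new ArrayList<>();
        }
        active.add(record);
    }

    // Retention: drop the oldest closed segment wholesale -- O(1) per
    // segment, never touching individual records. Returns records dropped.
    int expireOldestSegment() {
        List<String> oldest = closed.pollFirst();
        return oldest == null ? 0 : oldest.size();
    }

    int closedSegments() {
        return closed.size();
    }
}
```

The active segment is never expired, which is why retention in real Kafka only applies to sealed segments.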

Senior-level configurations

Parameter                       Impact                                          Recommendation
min.insync.replicas             Critical for durability (paired with acks=all)  2 with RF=3
unclean.leader.election.enable  Availability (true) vs durability (false)       false for critical data
message.format.version          Version mismatch → conversion overhead          One version across the cluster
max.message.bytes               Per-message size limit                          1048588 (≈1 MB, the default)

Edge Cases

  1. Topic with 0 partitions — impossible, Kafka requires at least 1 partition at creation
  2. Reducing partitions — impossible without deleting the topic (alter --partitions only increases)
  3. Topic with RF > number of brokers — creation error (Replication factor: 3 larger than available brokers: 2)
  4. Topic names with a dot (.) vs an underscore (_) — metric names replace . with _, so orders.v1 and orders_v1 collide as metrics; Kafka warns about this at topic creation
  5. Concurrent writes during ISR rebalance — possible message loss if acks=1 and leader fails before replication

Performance Numbers

Metric                      Value        Conditions
Throughput per topic        ~500 MB/s    12 partitions, lz4, batch.size=256KB
Latency (p99)               5-15 ms      acks=all, RF=3, same data center
Max partitions per cluster  ~200,000     ZooKeeper mode; also limited by OS file descriptors
Optimal partitions/broker   2,000-4,000  For stable controller operation

Production War Story

Situation: A fintech company used a single topic events for all event types (orders, payments, logins). The topic had 6 partitions. When load grew to 50K msg/s, consumers couldn’t keep up — lag grew exponentially.

Problem: When trying to increase partitions to 60, order keys no longer landed in the same partitions — payment processing order was broken. Clients saw duplicate charges.

Solution: Split into 4 domain-specific topics (orders, payments, auth, audit), each with 20 partitions. Migrated via dual-write with a 48-hour “shadow” period. Added min.insync.replicas=2 and acks=all for the payment topic.

Lesson: Split topics by business domain from the very beginning. Plan partitions with a 10x margin.

Monitoring (JMX metrics)

kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics
kafka.cluster:type=Partition,name=UnderReplicated
kafka.log:type=Log,name=LogEndOffset
kafka.log:type=Log,name=LogStartOffset

Burrow (LinkedIn):

  • Evaluates consumer lag and status (OK, WARN, ERR)
  • HTTP API for Alertmanager/PagerDuty integration
  • Requires no consumer modifications

Highload Best Practices

  1. Plan partitions: required partitions ≈ target throughput ÷ min(per-partition producer throughput, per-partition consumer throughput), with ~1.5x headroom
  2. Use min.insync.replicas=2 with RF=3 for critical data
  3. Split topics by business domain — simplifies scaling and monitoring
  4. Compact for state, Delete for events — right policy for the task
  5. Compression: lz4 for throughput, zstd for storage savings
  6. Batch tuning: batch.size=256KB, linger.ms=10 for high-throughput producers
  7. Monitor Under-Replicated Partitions — first indicator of cluster problems
  8. Use KRaft (Kafka Raft, production-ready since Kafka 3.3+) instead of ZooKeeper for new clusters. In Kafka 2.x KRaft is experimental.
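Practice 1 above reduces to simple arithmetic. A sketch — the per-partition throughput figures are assumptions you must benchmark for your own workload, not Kafka constants:

```java
class PartitionPlanner {
    // Enough partitions that neither producers nor consumers bottleneck,
    // times a headroom multiplier for growth (all throughputs in MB/s).
    static int plan(double targetMBps,
                    double producerMBpsPerPartition,
                    double consumerMBpsPerPartition,
                    double headroom) {
        double perPartition = Math.min(producerMBpsPerPartition, consumerMBpsPerPartition);
        return (int) Math.ceil(targetMBps / perPartition * headroom);
    }
}
```

For example, a 100 MB/s target where consumers manage only 5 MB/s per partition needs 100 / 5 × 1.5 = 30 partitions; the slower side of the pipeline always dictates the count.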

🎯 Interview Cheat Sheet

Must know:

  • Topic — logical category for messages, physically consists of partitions
  • Unlike a queue, messages are not deleted after reading — stored for a defined time (retention)
  • Cleanup policies: delete (by time) and compact (latest value per key)
  • Ordering is guaranteed only within a partition, not within a topic
  • Topic is an immutable append-only log, split into partitions
  • min.insync.replicas=2 with RF=3 is critical for durability
  • Reducing partitions is impossible — only topic recreation
  • Split topics by business domain from the very beginning

Common follow-up questions:

  • Can you reduce the number of partitions? — No, only increase. Reduction requires recreating the topic.
  • What is log compaction? — Keeps only the latest value for each key, deletes old ones.
  • What happens if a topic has 0 partitions? — Impossible, Kafka requires at least 1 partition.
  • How to ensure global ordering? — Only 1 partition, but that kills parallelism.

Red flags (DO NOT say):

  • “Topic is a queue, messages are deleted after reading” — it’s not a queue
  • “Ordering is guaranteed across the whole topic” — only within a partition
  • “You can reduce partitions via alter” — you can’t
  • “One topic for all events is fine” — anti-pattern

Related topics:

  • [[2. What is partition and why is it needed]]
  • [[3. How is data distributed across partitions]]
  • [[27. What is retention policy]]
  • [[28. How are old messages deleted from a topic]]
  • [[16. What is replication in Kafka]]