
Kafka Broker Architecture

This section covers the internal architecture of a Kafka broker—the server process that stores messages and serves client requests. Understanding broker internals is essential for capacity planning, performance tuning, and troubleshooting.

A Kafka broker handles a portion of the cluster's data, enabling horizontal scaling and fault tolerance.


What a Broker Does


| Responsibility | Description |
| --- | --- |
| Store messages | Persist records to disk as log segments |
| Serve producers | Accept writes and acknowledge them according to the acks setting |
| Serve consumers | Return records from requested offsets |
| Replicate data (as leader) | Serve fetch requests from follower replicas |
| Replicate data (as follower) | Fetch records from partition leaders |
| Report metadata | Register with the controller and report partition state |

Broker Identity

Each broker has a unique identity within the cluster:

# Broker ID (must be unique across the cluster)
node.id=1

# Listeners for client and inter-broker communication
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://broker1.example.com:9092

# Data directory
log.dirs=/var/kafka-logs

Clients discover brokers through the bootstrap servers, then connect directly to the broker hosting each partition's leader.
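
This bootstrap-then-connect flow is easy to observe with the Java AdminClient: point it at one broker and it returns the advertised listeners of every node. A minimal sketch, assuming a broker is reachable at broker1.example.com:9092 (the hostname from the config above):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class ClusterDiscovery {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Any one reachable broker is enough; the client fetches full cluster metadata from it.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            // Each node corresponds to a broker's advertised listener (host, port, node.id).
            cluster.nodes().get().forEach(node ->
                System.out.printf("node.id=%d at %s:%d%n", node.id(), node.host(), node.port()));
        }
    }
}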


Partitions and Leadership

Brokers don't own topics—they own partition replicas. Each partition has one leader and zero or more followers:


| Role | Responsibilities |
| --- | --- |
| Leader | Handles all produce and fetch requests for the partition |
| Follower | Fetches records from the leader; ready to become leader if needed |

A single broker typically hosts hundreds or thousands of partition replicas, some as leader, others as follower.
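
Which broker leads which partition is visible through the AdminClient as well. A short sketch along the same lines; the topic name "orders" is illustrative:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class PartitionLeaders {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" is a stand-in; any existing topic works.
            admin.describeTopics(List.of("orders")).allTopicNames().get()
                .forEach((name, desc) -> desc.partitions().forEach(p ->
                    // leader() serves all reads/writes; replicas() lists leader plus followers.
                    System.out.printf("%s-%d leader=%d replicas=%s isr=%s%n",
                        name, p.partition(), p.leader().id(), p.replicas(), p.isr())));
        }
    }
}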

For replication protocol details, ISR management, and leader election, see Replication.


KRaft Mode

Modern Kafka clusters (3.3+) use KRaft (Kafka Raft) for metadata management, eliminating the ZooKeeper dependency.


Process Roles

| Role | Configuration | Description |
| --- | --- | --- |
| broker | process.roles=broker | Handles client requests only |
| controller | process.roles=controller | Manages metadata only |
| combined | process.roles=broker,controller | Both roles in one process |

| Deployment | Best For | Trade-off |
| --- | --- | --- |
| Combined | Small clusters (≤10 brokers) | Simpler, but resource contention |
| Dedicated | Large clusters | More servers, but better isolation |
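
A combined-mode server.properties might look like the following sketch; the node IDs, hostnames, and ports in the quorum voter list are illustrative:

# Combined mode: this process acts as both broker and controller
process.roles=broker,controller
node.id=1

# Controller quorum, one node.id@host:port entry per voter (hostnames illustrative)
controller.quorum.voters=1@broker1.example.com:9093,2@broker2.example.com:9093,3@broker3.example.com:9093

# Combined nodes must declare the controller listener alongside the client listener
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER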

For complete KRaft documentation, see KRaft: Kafka Raft Consensus.


Network Layer

The network layer handles all client and inter-broker communication using a reactor pattern with distinct thread pools.

Threading Model


| Thread Pool | Default | Configuration | Role |
| --- | --- | --- | --- |
| Acceptor | 1 per listener | Fixed | Accept new TCP connections |
| Network processors | 3 | num.network.threads | Read requests, write responses (NIO) |
| Request handlers | 8 | num.io.threads | Execute request logic |

Request Flow

  1. Accept - Acceptor thread accepts TCP connection
  2. Assign - Connection assigned to network processor (round-robin)
  3. Read - Network processor reads request from socket
  4. Queue - Request placed in shared request queue
  5. Handle - Request handler dequeues and executes
  6. Response - Response queued for network processor
  7. Send - Network processor writes response to socket
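
The essential design decision here is the bounded queue between network processors and request handlers: slow handlers exert backpressure on the network path instead of buffering without limit. The following is a deliberately simplified Java sketch of that hand-off, not the broker's actual implementation; class and method names are illustrative:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RequestPipeline {
    // Mirrors queued.max.requests: enqueueing blocks once the queue is full.
    private final BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(500);

    // Mirrors num.io.threads: handlers execute request logic off the network path.
    void startHandlers(int numIoThreads) {
        for (int i = 0; i < numIoThreads; i++) {
            Thread handler = new Thread(() -> {
                while (true) {
                    try {
                        String request = requestQueue.take();   // step 5: Handle
                        sendResponse("processed: " + request);  // steps 6-7: Response/Send
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            handler.setDaemon(true);
            handler.start();
        }
    }

    // A network processor would call this after reading a request off a socket (steps 3-4).
    void enqueue(String request) throws InterruptedException {
        requestQueue.put(request);   // blocks when full, applying backpressure
    }

    private void sendResponse(String response) {
        // In the real broker, the response is queued back to the owning network processor.
        System.out.println(response);
    }
}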

Thread Pool Sizing

| Cluster Size | num.network.threads | num.io.threads |
| --- | --- | --- |
| Small (<10 brokers) | 3 | 8 |
| Medium (10-50 brokers) | 4-6 | 8-16 |
| Large (50+ brokers) | 8+ | 16-32 |

For security configuration, see Authentication and Authorization.


Request Purgatory

The purgatory parks delayed requests until their completion conditions are satisfied, so request handler threads are never tied up waiting.

Delayed Operations

| Operation | Completion Condition | Timeout |
| --- | --- | --- |
| DelayedProduce | All ISR replicas have acknowledged the write | request.timeout.ms |
| DelayedFetch | At least fetch.min.bytes of data available | fetch.max.wait.ms |
| DelayedJoin | All group members have joined | rebalance.timeout.ms |
| DelayedHeartbeat | Session timeout check | session.timeout.ms |
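
Each delayed operation follows a try-complete/expire contract: the purgatory re-attempts completion whenever a watched key changes (for example, a follower fetch advancing an offset), and the timer fires the expiration path if the timeout wins the race. A simplified Java reduction of that contract follows; Kafka's real implementation is more elaborate, and the names here are illustrative:

import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative reduction of the delayed-operation contract.
abstract class DelayedOperation {
    private final AtomicBoolean completed = new AtomicBoolean(false);
    final long timeoutMs;

    DelayedOperation(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Called when a watched key changes (e.g. a follower acknowledged an offset).
    // Returns true if the operation completed on this attempt.
    boolean maybeTryComplete() {
        if (conditionSatisfied() && completed.compareAndSet(false, true)) {
            onComplete();   // e.g. send the produce response
            return true;
        }
        return false;
    }

    // Called by the timer if the timeout fires before the condition is met.
    void maybeExpire() {
        if (completed.compareAndSet(false, true)) {
            onExpiration(); // e.g. respond with a timeout error
        }
    }

    abstract boolean conditionSatisfied();
    abstract void onComplete();
    abstract void onExpiration();
}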

Purgatory Architecture

Each purgatory instance pairs a watch list of pending operations (keyed by partition, group, or other request key) with a timer. External events, such as a follower fetch advancing the high watermark, trigger completion checks on the watched operations; the timer expires any operation whose condition is never met.

Produce with acks=all

When acks=all, the leader appends the batch to its local log and parks a DelayedProduce operation in the purgatory. The response is sent as soon as every in-sync replica has fetched past the required offset; if that does not happen within request.timeout.ms, the producer receives a timeout error.

Timer Wheel

Kafka uses a hierarchical timing wheel so that inserting and expiring timeouts are O(1) operations, regardless of how many delayed requests are outstanding.

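A single level of such a wheel fits in a few lines: tasks land in the bucket covering their expiry tick, so insertion and per-tick expiry are O(1). The sketch below is a one-level simplification; Kafka stacks coarser wheels on top so long timeouts overflow upward rather than firing early:

import java.util.ArrayDeque;
import java.util.Deque;

// Single-level timing wheel; Kafka layers several of these hierarchically.
class TimingWheel {
    private final long tickMs;          // time span of one bucket
    private final int wheelSize;        // number of buckets
    private final Deque<Runnable>[] buckets;
    private long currentTick;           // advances once per tickMs

    @SuppressWarnings("unchecked")
    TimingWheel(long tickMs, int wheelSize) {
        this.tickMs = tickMs;
        this.wheelSize = wheelSize;
        this.buckets = new Deque[wheelSize];
        for (int i = 0; i < wheelSize; i++) buckets[i] = new ArrayDeque<>();
    }

    // O(1): drop the task into the bucket that covers its expiry tick.
    // Delays longer than tickMs * wheelSize would need a higher-level wheel (omitted here).
    void add(Runnable task, long delayMs) {
        long expiryTick = currentTick + Math.max(1, delayMs / tickMs);
        buckets[(int) (expiryTick % wheelSize)].addLast(task);
    }

    // Called every tickMs: expire everything in the current bucket, then advance.
    void advance() {
        Deque<Runnable> bucket = buckets[(int) (currentTick % wheelSize)];
        Runnable task;
        while ((task = bucket.pollFirst()) != null) task.run();
        currentTick++;
    }
}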


Coordinators

Every broker can host two coordinator components; which broker coordinates a given group or transaction is determined by who leads the corresponding internal topic partition.

Group Coordinator

Manages consumer group membership, partition assignment, and offset storage.

coordinator_partition = hash(group.id) % 50          # 50 = default partition count of __consumer_offsets
coordinator_broker   = leader of that __consumer_offsets partition

| Function | Description |
| --- | --- |
| Membership management | Track group members via heartbeats |
| Rebalance coordination | Orchestrate the JoinGroup/SyncGroup protocol |
| Offset storage | Persist committed offsets to __consumer_offsets |
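
The computation itself is a one-liner. The sketch below mirrors the broker's scheme of hashing the group ID to a non-negative value modulo the partition count of __consumer_offsets; the group ID is illustrative:

public class CoordinatorPartition {
    public static void main(String[] args) {
        String groupId = "payments-consumers";   // illustrative group ID
        int numOffsetsPartitions = 50;           // offsets.topic.num.partitions default

        // Mask to non-negative, then mod: mirrors the broker's partition selection.
        int partition = (groupId.hashCode() & 0x7fffffff) % numOffsetsPartitions;
        System.out.println("__consumer_offsets partition: " + partition);
        // The group coordinator is whichever broker currently leads that partition.
        // The transaction coordinator uses the same scheme over transactional.id
        // and __transaction_state.
    }
}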

For consumer group protocol and operations, see Consumer Groups.

Transaction Coordinator

Manages exactly-once semantics for transactional producers.

coordinator_partition = hash(transactional.id) % 50   # 50 = default partition count of __transaction_state
coordinator_broker   = leader of that __transaction_state partition

| Function | Description |
| --- | --- |
| PID assignment | Assign producer IDs and epochs |
| State persistence | Store transaction state in __transaction_state |
| Commit coordination | Write transaction markers to partition leaders |

For transaction semantics and protocol, see Transactions.

Internal Topics

| Topic | Partitions | Purpose |
| --- | --- | --- |
| __consumer_offsets | 50 | Consumer group offsets |
| __transaction_state | 50 | Transaction coordinator state |
| __cluster_metadata | 1 | KRaft metadata log |

Startup and Recovery

When a broker starts—especially after a crash—it must recover state before serving requests.

Startup Sequence

On startup, the broker loads each log directory (running full recovery first if the previous shutdown was unclean), registers with the controller, catches up on cluster metadata, and only then begins accepting client requests.

Clean vs Unclean Shutdown

| Shutdown Type | Detection | Recovery Behavior |
| --- | --- | --- |
| Clean | .kafka_cleanshutdown marker present | Skip log scanning; fast startup |
| Unclean | No marker (crash, kill -9) | Full log recovery; validate segments |

Log Recovery

On unclean shutdown, each partition's log is validated:

  1. Scan log directory for segment files
  2. Validate segment CRC checksums
  3. Truncate at corruption point if found
  4. Rebuild indexes if invalid
  5. Truncate incomplete records at end of active segment
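
Step 2 amounts to a sequential scan that re-computes each record's checksum and stops at the first mismatch. The following toy illustration uses Java's CRC32C over a simplified length-prefixed format; Kafka's actual record-batch layout is more involved, so treat this purely as a sketch of the idea:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32C;

// Toy recovery scan over a simplified record format:
// [4-byte length][4-byte stored CRC][payload]. Kafka's real batch layout differs.
public class SegmentRecovery {
    public static long findValidEnd(String segmentPath) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(segmentPath, "r")) {
            long validEnd = 0;
            while (file.getFilePointer() + 8 <= file.length()) {
                int length = file.readInt();
                int storedCrc = file.readInt();
                if (length < 0 || file.getFilePointer() + length > file.length()) {
                    break;   // truncated record at the end of the segment
                }
                byte[] payload = new byte[length];
                file.readFully(payload);

                CRC32C crc = new CRC32C();
                crc.update(payload, 0, payload.length);
                if ((int) crc.getValue() != storedCrc) {
                    break;   // corruption: everything from here on is discarded
                }
                validEnd = file.getFilePointer();   // record verified; advance the valid end
            }
            // The log would be truncated to validEnd and indexes rebuilt from there.
            return validEnd;
        }
    }
}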

Index Rebuild Cost

| Partition Size | Rebuild Time |
| --- | --- |
| 1 GB | 5-15 seconds |
| 10 GB | 30-90 seconds |
| 100 GB | 5-15 minutes |

Many Partitions = Slow Startup

A broker with 1000 partitions requiring index rebuild can take 30+ minutes to start. Increasing num.recovery.threads.per.data.dir lets recovery run across partitions in parallel.

Log Truncation

After a leader failure, a follower's log may have diverged from the new leader's. The follower truncates the divergent entries back to the last offset it has in common with the leader, then resumes fetching from that point.

Controlled Shutdown

# Graceful shutdown (recommended)
kafka-server-stop.sh

# Or send SIGTERM
kill <broker-pid>

The broker will:

  1. Notify the controller
  2. Transfer leadership to other ISR members
  3. Complete in-flight requests
  4. Write clean shutdown marker

Avoid kill -9

kill -9 causes unclean shutdown, requiring full log recovery.


High Availability Settings

# Three copies of each partition; data survives the loss of two brokers
default.replication.factor=3

# Require acknowledgment from at least 2 in-sync replicas per write
min.insync.replicas=2

# Never elect out-of-sync replica as leader
unclean.leader.election.enable=false
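
These cluster-wide defaults can also be pinned per topic at creation time. For example, using the standard kafka-topics.sh tool (the topic name and partition count are illustrative):

# Create a topic that sets min.insync.replicas explicitly
kafka-topics.sh --create \
  --bootstrap-server broker1.example.com:9092 \
  --topic orders \
  --partitions 6 \
  --replication-factor 3 \
  --config min.insync.replicas=2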

For failure scenarios and recovery procedures, see Fault Tolerance.


Configuration Quick Reference

Identity and Networking

node.id=1
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://broker1.example.com:9092

Storage

log.dirs=/var/kafka-logs
log.retention.hours=168
log.segment.bytes=1073741824

For log segment internals, indexes, and compaction, see Storage Engine.

Threading

num.network.threads=3
num.io.threads=8
num.replica.fetchers=1

# Request queue
queued.max.requests=500

# Socket settings
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

Replication

default.replication.factor=3
min.insync.replicas=2
replica.lag.time.max.ms=30000

Recovery

# Recovery threads (increase for faster recovery)
num.recovery.threads.per.data.dir=1

# Unclean leader election (data loss risk)
unclean.leader.election.enable=false

For complete configuration reference, see Broker Configuration.


Key Metrics

| Metric | Alert Threshold |
| --- | --- |
| kafka.network:type=RequestChannel,name=RequestQueueSize | > 100 sustained |
| kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | < 30% |
| kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent | < 30% |
| kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce | > 1000 |
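
These are ordinary JMX MBeans, so any JMX client can poll them. A minimal Java sketch, assuming the broker JVM exposes remote JMX (for example via JMX_PORT=9999; the host and port are illustrative):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerMetricsProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the broker JVM was started with remote JMX enabled, e.g. JMX_PORT=9999.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://broker1.example.com:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName queueSize = new ObjectName(
                "kafka.network:type=RequestChannel,name=RequestQueueSize");
            // Kafka gauges expose their current reading via the "Value" attribute.
            Object value = mbs.getAttribute(queueSize, "Value");
            System.out.println("RequestQueueSize = " + value);
        }
    }
}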

Related Topics

| Topic | Description |
| --- | --- |
| Storage Engine | Log segments, indexes, compaction, retention |
| Replication | ISR, leader election, high watermark |
| Memory Management | JVM heap, page cache, zero-copy |
| KRaft | Raft consensus, controller quorum, migration |
| Fault Tolerance | Failure detection and recovery |
| Cluster Management | Metadata management and coordination |