Kafka Cluster Topology¶
Cluster topology design for fault tolerance, performance, and operational efficiency.
Topology Overview¶
Rack Awareness¶
Rack awareness ensures partition replicas are distributed across failure domains to survive rack-level failures.
Configuration¶
# server.properties - set on each broker
broker.rack=rack-1
Replica Distribution¶
With rack awareness enabled, Kafka distributes replicas across racks:
| Topic | Partition | Replica 1 | Replica 2 | Replica 3 |
|---|---|---|---|---|
| orders | 0 | Broker 1 (rack-1) | Broker 2 (rack-2) | Broker 3 (rack-3) |
| orders | 1 | Broker 2 (rack-2) | Broker 3 (rack-3) | Broker 4 (rack-1) |
| orders | 2 | Broker 3 (rack-3) | Broker 1 (rack-1) | Broker 5 (rack-2) |
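To confirm placement on a live cluster, create a topic and inspect its assignment. A minimal sketch, assuming a three-rack cluster reachable at kafka:9092; the topic name and partition count are illustrative:
# Create a topic with one replica per rack
kafka-topics.sh --bootstrap-server kafka:9092 --create \
  --topic orders --partitions 3 --replication-factor 3
# Inspect the replica assignment per partition
kafka-topics.sh --bootstrap-server kafka:9092 --describe --topic orders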
Rack-Aware Client Configuration¶
Consumers can prefer reading from replicas in the same rack to reduce cross-rack traffic.
# consumer.properties
client.rack=rack-1
# broker configuration to enable follower fetching
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
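The client.rack value can also be passed on the command line when testing with the console consumer. A minimal sketch, assuming the broker-side replica selector above is already enabled and the topic name is illustrative:
# Consume from the nearest replica in rack-1
kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic orders \
  --consumer-property client.rack=rack-1 --from-beginning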
Network Architecture¶
Bandwidth Requirements¶
| Traffic Type | Sizing |
|---|---|
| Produce | Aggregate peak produce throughput from all producers (writes to partition leaders) |
| Replication | Peak produce throughput × (replication factor - 1) |
| Consume | Peak produce throughput × consumer fan-out |
| Inter-broker | Metadata + coordination overhead |
Network Sizing Formula¶
Required bandwidth = (P × RF) + C
Where:
P = Peak produce throughput (MB/s)
RF = Replication factor
C = Peak consume throughput (MB/s)
Example:

- Peak produce: 500 MB/s
- Replication factor: 3
- Peak consume: 1000 MB/s (2x fan-out)
- Required: (500 × 3) + 1000 = 2500 MB/s = 20 Gbps
Network Topology Patterns¶
Broker Placement¶
Placement Strategies¶
| Strategy | Description | Use Case |
|---|---|---|
| Rack-balanced | Equal brokers per rack | Standard HA deployment |
| Zone-balanced | Equal brokers per availability zone | Cloud deployments |
| Performance-tiered | Faster hardware for leaders | Latency-sensitive workloads |
Cloud Availability Zone Mapping¶
# AWS example - map AZ to rack
broker.rack=us-east-1a
# Azure example
broker.rack=eastus-zone1
# GCP example
broker.rack=us-central1-a
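On cloud instances, the rack value can be resolved at startup rather than hard-coded. A minimal sketch for AWS using the IMDSv2 instance metadata endpoints; the server.properties path is an assumption:
# Resolve the availability zone and append it as broker.rack (IMDSv2)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
echo "broker.rack=${AZ}" >> /etc/kafka/server.properties  # path is an assumption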
Minimum Broker Requirements¶
| Replication Factor | Minimum Brokers | Recommended Brokers |
|---|---|---|
| 1 | 1 | 3 (for controller quorum) |
| 2 | 2 | 4 (2 per rack) |
| 3 | 3 | 6 (2 per rack, 3 racks) |
Controller Topology¶
For complete KRaft internals including Raft consensus, failover behavior, and metadata management, see KRaft Deep Dive.
KRaft Controller Quorum¶
The controller quorum should be deployed across failure domains, with one voter per rack or availability zone, so that losing a single domain cannot cost the quorum its majority.
Controller Configuration¶
# Dedicated controller
process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=CONTROLLER://controller1:9093
controller.listener.names=CONTROLLER
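Brokers in the same cluster point at the same quorum. A minimal broker-side sketch; host names, ports, and node IDs are illustrative:
# Broker that registers with the controller quorum above
process.roles=broker
node.id=4
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
controller.listener.names=CONTROLLER
listeners=PLAINTEXT://broker4:9092
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT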
Combined vs Dedicated Controllers¶
| Deployment | Use Case | Pros | Cons |
|---|---|---|---|
| Combined | Small clusters (< 10 brokers) | Fewer machines | Resource contention |
| Dedicated | Large clusters, high partition count | Isolation, stability | More machines |
# Combined controller + broker
process.roles=broker,controller
# Dedicated controller only
process.roles=controller
# Dedicated broker only
process.roles=broker
Multi-Datacenter Topology¶
Active-Passive¶
Active-Active¶
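An active-active pairing replicates topics in both directions, typically with MirrorMaker 2. A minimal sketch of a connect-mirror-maker.properties file; cluster aliases and bootstrap addresses are assumptions, and for active-passive the secondary->primary flow would simply stay disabled:
# Two clusters replicating to each other (active-active)
clusters = primary, secondary
primary.bootstrap.servers = primary-kafka:9092
secondary.bootstrap.servers = secondary-kafka:9092
primary->secondary.enabled = true
secondary->primary.enabled = true
primary->secondary.topics = .*
secondary->primary.topics = .*
replication.factor = 3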
Partition Distribution¶
For leader election mechanics and preferred replica election, see Replication.
Leader Distribution¶
Leaders should be balanced across brokers for even load distribution.
# Check leader distribution (the leader broker id is field 6 of the describe output)
kafka-topics.sh --bootstrap-server kafka:9092 --describe | \
grep "Leader:" | awk '{print $6}' | sort | uniq -c
# Trigger preferred leader election
kafka-leader-election.sh --bootstrap-server kafka:9092 \
--election-type preferred \
--all-topic-partitions
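Preferred leader election can also run automatically. The relevant broker settings, shown with their default values:
# Automatic preferred leader election
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10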
Partition Reassignment¶
When adding or removing brokers, partitions must be reassigned explicitly; Kafka does not move existing partitions onto new brokers on its own.
# Generate reassignment plan
kafka-reassign-partitions.sh --bootstrap-server kafka:9092 \
--topics-to-move-json-file topics.json \
--broker-list "1,2,3,4,5,6" \
--generate
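The topics.json file referenced above lists the topics to include in the plan; a minimal example, with the topic name illustrative:
{
  "version": 1,
  "topics": [
    { "topic": "orders" }
  ]
}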
# Execute with throttle
kafka-reassign-partitions.sh --bootstrap-server kafka:9092 \
--reassignment-json-file reassignment.json \
--throttle 100000000 \
--execute
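Progress can be checked with --verify, which also clears the replication throttle once the reassignment completes:
# Verify completion and remove the throttle
kafka-reassign-partitions.sh --bootstrap-server kafka:9092 \
  --reassignment-json-file reassignment.json \
  --verify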
Sizing Guidelines¶
Broker Count¶
| Factor | Impact |
|---|---|
| Throughput | More brokers = more aggregate throughput |
| Storage | More brokers = more total storage |
| Partitions | ~4000 partitions per broker recommended maximum |
| Replication | RF × partitions = total replicas distributed |
Partition Count¶
| Consideration | Guideline |
|---|---|
| Parallelism | Partitions ≥ max consumer instances |
| Throughput | ~10 MB/s per partition typical |
| Overhead | Each partition has memory/file handle cost |
| Rebalance | More partitions = longer rebalance |
Formula for Partition Count¶
Partitions = max(
target_throughput / per_partition_throughput,
max_consumer_instances
)
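Example (illustrative numbers):

- Target throughput: 600 MB/s
- Per-partition throughput: ~10 MB/s
- Max consumer instances: 48
- Partitions = max(600 / 10, 48) = 60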
Related Documentation¶
- Architecture Overview - System architecture
- Brokers - Broker configuration
- Replication - Replication protocol
- Fault Tolerance - Failure handling
- Multi-Datacenter - DR strategies