Kafka Cluster Topology¶
Cluster topology design for fault tolerance, performance, and operational efficiency.
Topology Overview¶
Rack Awareness¶
Rack awareness ensures partition replicas are distributed across failure domains to survive rack-level failures.
Configuration¶
```properties
# server.properties - set on each broker
broker.rack=rack-1
```
Replica Distribution¶
With rack awareness enabled, Kafka distributes replicas across racks:
| Topic | Partition | Replica 1 | Replica 2 | Replica 3 |
|---|---|---|---|---|
| orders | 0 | Broker 1 (rack-1) | Broker 2 (rack-2) | Broker 3 (rack-3) |
| orders | 1 | Broker 2 (rack-2) | Broker 3 (rack-3) | Broker 4 (rack-1) |
| orders | 2 | Broker 3 (rack-3) | Broker 1 (rack-1) | Broker 5 (rack-2) |
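Rack labels can be verified from the client side before relying on them: the Java AdminClient exposes each broker's advertised rack via `describeCluster()`. A minimal sketch, assuming a reachable bootstrap address of `kafka:9092` (substitute your own):

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class RackCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            // Node.rack() reflects the broker.rack setting; null means it was not configured
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker %d -> rack %s%n",
                        node.id(), node.rack() == null ? "(not set)" : node.rack());
            }
        }
    }
}
```

A null rack here usually means `broker.rack` was not set on that broker.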
Rack-Aware Client Configuration¶
Consumers can prefer reading from replicas in the same rack to reduce cross-rack traffic.
```properties
# consumer.properties
client.rack=rack-1
```

```properties
# broker configuration (rack-aware replica selection)
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
```
**Follower fetching:** `client.rack` enables clients to prefer local replicas when supported (Kafka 2.4+). If no matching replica is available, consumers read from the leader.
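The same preference can be set programmatically. Below is a minimal sketch of a rack-aware Java consumer; the topic `orders`, group ID `orders-processor`, and bootstrap address `kafka:9092` are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RackAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");     // placeholder
        props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "rack-1");            // must match the local broker.rack value
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
            }
        }
    }
}
```

Setting `client.rack` only changes which replica is fetched from; group assignment and offset management are unchanged.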
Network Architecture¶
Bandwidth Requirements¶
| Traffic Type | Sizing |
|---|---|
| Produce | Peak produce throughput (client ingress to partition leaders) |
| Replication | Peak produce throughput × (replication factor - 1) |
| Consume | Peak consume throughput × consumer fan-out |
| Inter-broker | Metadata + coordination overhead |
Network Sizing Formula¶
```
Required bandwidth = (P × (RF - 1)) + C + P
```

Where:

- P = Peak produce throughput (MB/s)
- RF = Replication factor
- C = Peak consume throughput (MB/s)

Example:

- Peak produce: 500 MB/s
- Replication factor: 3
- Peak consume: 1000 MB/s (2x fan-out)
- Required: (500 × 2) + 1000 + 500 = 2500 MB/s = 20 Gbps
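The same arithmetic expressed as a small Java helper, useful for plugging in other workloads; the conversion assumes 1 MB/s = 8 Mb/s and ignores protocol overhead and headroom:

```java
public class NetworkSizing {

    // Required bandwidth = (P × (RF - 1)) + C + P, converted from MB/s to Gbps
    static double requiredGbps(double produceMBps, int replicationFactor, double consumeMBps) {
        double totalMBps = (produceMBps * (replicationFactor - 1)) + consumeMBps + produceMBps;
        return totalMBps * 8 / 1000.0; // 8 Mb per MB, 1000 Mb/s per Gbps
    }

    public static void main(String[] args) {
        // Worked example from above: 500 MB/s produce, RF 3, 1000 MB/s consume
        System.out.printf("Required: %.0f Gbps%n", requiredGbps(500, 3, 1000)); // prints 20 Gbps
    }
}
```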
Network Topology Patterns¶
Broker Placement¶
Placement Strategies¶
| Strategy | Description | Use Case |
|---|---|---|
| Rack-balanced | Equal brokers per rack | Standard HA deployment |
| Zone-balanced | Equal brokers per availability zone | Cloud deployments |
| Performance-tiered | Faster hardware for leaders | Latency-sensitive workloads |
Cloud Availability Zone Mapping¶
```properties
# AWS example - map AZ to rack
broker.rack=us-east-1a

# Azure example
broker.rack=eastus-zone1

# GCP example
broker.rack=us-central1-a
```
Minimum Broker Requirements¶
| Replication Factor | Minimum Brokers | Recommended Brokers |
|---|---|---|
| 1 | 1 | 1 (plus KRaft controllers if dedicated) |
| 2 | 2 | 4 (2 per rack) |
| 3 | 3 | 6 (2 per rack, 3 racks) |
Controller Topology¶
For complete KRaft internals including Raft consensus, failover behavior, and metadata management, see KRaft Deep Dive.
KRaft Controller Quorum¶
The controller quorum should be deployed across failure domains.
Controller Configuration¶
*Static quorum example:*
```properties
# Dedicated controller
process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
```

**Dynamic quorums:** Kafka 3.9+ supports dynamic controller quorums (KIP-853); use `controller.quorum.bootstrap.servers` instead of `controller.quorum.voters`.
Combined vs Dedicated Controllers¶
| Deployment | Use Case | Pros | Cons |
|---|---|---|---|
| Combined | Small clusters (< 10 brokers) | Fewer machines | Resource contention |
| Dedicated | Large clusters, high partition count | Isolation, stability | More machines |
```properties
# Combined controller + broker
process.roles=broker,controller

# Dedicated controller only
process.roles=controller

# Dedicated broker only
process.roles=broker
```
Multi-Datacenter Topology¶
Active-Passive¶
Active-Active¶
Partition Distribution¶
For leader election mechanics and preferred replica election, see Replication.
Leader Distribution¶
Leaders should be balanced across brokers for even load distribution.
```bash
# Check leader distribution (partition leadership count per broker ID)
kafka-topics.sh --bootstrap-server kafka:9092 --describe | \
  grep "Leader:" | \
  awk '{for (i = 1; i < NF; i++) if ($i == "Leader:") print $(i + 1)}' | \
  sort | uniq -c
```

```bash
# Trigger preferred leader election
kafka-leader-election.sh --bootstrap-server kafka:9092 \
  --election-type preferred \
  --all-topic-partitions
```
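The same check can be done without parsing CLI output by using the Java AdminClient (the `allTopicNames()` accessor shown here needs a 3.1+ client). A sketch, again assuming a bootstrap address of `kafka:9092`:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class LeaderSkew {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            Map<Integer, Integer> leadersPerBroker = new TreeMap<>();
            // Count how many partitions each broker currently leads
            for (TopicDescription desc : admin.describeTopics(topics).allTopicNames().get().values()) {
                for (TopicPartitionInfo partition : desc.partitions()) {
                    if (partition.leader() != null) {
                        leadersPerBroker.merge(partition.leader().id(), 1, Integer::sum);
                    }
                }
            }
            leadersPerBroker.forEach((broker, count) ->
                    System.out.printf("broker %d leads %d partitions%n", broker, count));
        }
    }
}
```

A broker leading far more partitions than its peers is a candidate for the preferred leader election shown above.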
Partition Reassignment¶
When adding or removing brokers, partitions must be reassigned.
```bash
# Generate reassignment plan
kafka-reassign-partitions.sh --bootstrap-server kafka:9092 \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3,4,5,6" \
  --generate
```

```bash
# Execute with throttle (bytes/sec; 100000000 = ~100 MB/s)
kafka-reassign-partitions.sh --bootstrap-server kafka:9092 \
  --reassignment-json-file reassignment.json \
  --throttle 100000000 \
  --execute
```
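Reassignments can also be submitted through the Admin API (KIP-455, Kafka 2.4+). The sketch below moves a single hypothetical partition, `orders-0`, onto brokers 4, 5, and 6 and waits for completion; unlike the CLI's `--throttle` flag, it applies no replication throttle:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class MovePartition {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            // Target replica list for orders-0; the first broker becomes the preferred leader
            Map<TopicPartition, Optional<NewPartitionReassignment>> move = Map.of(
                    new TopicPartition("orders", 0),
                    Optional.of(new NewPartitionReassignment(List.of(4, 5, 6))));
            admin.alterPartitionReassignments(move).all().get();

            // Poll until no reassignment is reported as in progress
            while (!admin.listPartitionReassignments().reassignments().get().isEmpty()) {
                Thread.sleep(5_000);
            }
            System.out.println("Reassignment complete");
        }
    }
}
```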
Sizing Guidelines¶
Broker Count¶
| Factor | Impact |
|---|---|
| Throughput | More brokers = more aggregate throughput |
| Storage | More brokers = more total storage |
| Partitions | ~4000 partitions per broker (repository guidance) |
| Replication | RF × partitions = total replicas distributed |
Partition Count¶
| Consideration | Guideline |
|---|---|
| Parallelism | Partitions ≥ max consumer instances |
| Throughput | ~10 MB/s per partition (repository guidance) |
| Overhead | Each partition has memory/file handle cost |
| Rebalance | More partitions = longer rebalance |
Formula for Partition Count¶
```
Partitions = max(
    target_throughput / per_partition_throughput,
    max_consumer_instances
)
```
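As a worked example, a small Java helper applying the formula with the ~10 MB/s per-partition guidance above; the 500 MB/s target and 24 consumer instances are illustrative values:

```java
public class PartitionSizing {

    // Partitions = max(target_throughput / per_partition_throughput, max_consumer_instances)
    static int partitions(double targetMBps, double perPartitionMBps, int maxConsumerInstances) {
        int forThroughput = (int) Math.ceil(targetMBps / perPartitionMBps);
        return Math.max(forThroughput, maxConsumerInstances);
    }

    public static void main(String[] args) {
        System.out.println(partitions(500, 10, 24)); // 50 -- throughput dominates here
    }
}
```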
Related Documentation¶
- Architecture Overview - System architecture
- Brokers - Broker configuration
- Replication - Replication protocol
- Fault Tolerance - Failure handling
- Multi-Datacenter - DR strategies