
Kafka Performance

Performance tuning guide for Apache Kafka clusters.


Performance Dimensions

| Dimension    | Description                | Trade-offs               |
|--------------|----------------------------|--------------------------|
| Throughput   | Messages/bytes per second  | May increase latency     |
| Latency      | End-to-end message delay   | May reduce throughput    |
| Durability   | Data persistence guarantee | Impacts throughput       |
| Availability | System uptime              | Requires more resources  |

Throughput Optimization

Producer Throughput

# High throughput producer configuration
batch.size=131072                    # 128KB batches
linger.ms=20                         # Wait up to 20ms for batching
buffer.memory=67108864               # 64MB buffer
compression.type=lz4                 # Fast compression
acks=1                               # Leader ack only (trades durability)
max.in.flight.requests.per.connection=5  # Pipeline requests per broker

Throughput Checklist:

- [ ] Enable compression (lz4 or zstd)
- [ ] Increase batch.size
- [ ] Set linger.ms > 0
- [ ] Use async sends
- [ ] Partition across multiple brokers

Consumer Throughput

# High throughput consumer configuration
fetch.min.bytes=65536                # Fetch at least 64KB
fetch.max.wait.ms=500                # Wait up to 500ms
fetch.max.bytes=52428800             # 50MB per fetch
max.partition.fetch.bytes=10485760   # 10MB per partition
max.poll.records=1000                # Records per poll
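
Whether the consumer actually keeps up with these settings shows as lag; a quick check with kafka-consumer-groups.sh, using a placeholder group name:

# Inspect per-partition lag for a consumer group (my-group is illustrative)
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --group my-group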

Broker Throughput

# Broker configuration
num.network.threads=8                # Network I/O threads
num.io.threads=16                    # Disk I/O threads
socket.send.buffer.bytes=1048576     # 1MB send buffer
socket.receive.buffer.bytes=1048576  # 1MB receive buffer
num.replica.fetchers=4               # Replication threads
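
To judge whether these thread pools need raising, sample the broker's idle-percent metrics; a minimal sketch using Kafka's bundled JmxTool, assuming JMX is exposed on port 9999 via JMX_PORT (the class moved to org.apache.kafka.tools.JmxTool in newer releases):

# Values near 1.0 mean the handler pool is mostly idle; near 0 means saturated
kafka-run-class.sh kafka.tools.JmxTool \
  --object-name kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi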

Latency Optimization

Low Latency Producer

# Low latency producer configuration
batch.size=16384                     # Small batches
linger.ms=0                          # No batching delay
compression.type=none                # No compression overhead
acks=1                               # Leader ack only
max.block.ms=1000                    # Fail fast if buffer is full or metadata unavailable

Low Latency Consumer

# Low latency consumer configuration
fetch.min.bytes=1                    # Return immediately
fetch.max.wait.ms=0                  # No wait
max.poll.records=100                 # Small batches

Broker Latency

# Broker configuration
socket.request.max.bytes=10485760    # 10MB max request (default 100MB)
num.network.threads=16               # More network threads

Durability Configuration

Maximum Durability

# Producer
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5

# Broker/Topic
min.insync.replicas=2
unclean.leader.election.enable=false
default.replication.factor=3

Balanced Durability

# Producer
acks=all
enable.idempotence=true

# Topic
min.insync.replicas=2
replication.factor=3
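
These guarantees only hold for topics that carry the matching settings; a sketch of creating such a topic, with the topic name and partition count as placeholders:

# Create a topic with durability settings applied (orders/12 are illustrative)
kafka-topics.sh --bootstrap-server kafka:9092 --create \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config min.insync.replicas=2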

Compression Tuning

Compression Comparison

| Algorithm | CPU Usage | Compression Ratio | Speed   |
|-----------|-----------|-------------------|---------|
| none      | None      | 1.0x              | Fastest |
| snappy    | Low       | 1.5-2x            | Fast    |
| lz4       | Low       | 2-3x              | Fast    |
| gzip      | High      | 3-5x              | Slow    |
| zstd      | Medium    | 3-4x              | Fast    |

Compression Selection

# Best for most use cases
compression.type=lz4

# Maximum compression
compression.type=zstd

# CPU constrained
compression.type=snappy
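
Compression can also be pinned at the topic level on the broker, overriding whatever producers send; a sketch with a placeholder topic name:

# Force zstd for one topic regardless of producer settings
kafka-configs.sh --bootstrap-server kafka:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config compression.type=zstd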

Partition Tuning

Partition Count

Formula:

Partitions = max(
  throughput / per_partition_throughput,
  consumer_instances
)

Guidelines:

- ~10 MB/s per partition is typical (see the worked example below)
- More partitions = more parallelism
- Too many partitions = overhead
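
As a worked example, assuming a 300 MB/s aggregate target, ~10 MB/s per partition, and 24 consumer instances:

Partitions = max(300 / 10, 24) = max(30, 24) = 30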

Partition Distribution

# Check leader distribution (count of partition leaders per broker ID;
# "Leader:" is field 5 in the describe output, its value is field 6)
kafka-topics.sh --bootstrap-server kafka:9092 --describe | \
  grep "Leader:" | awk '{print $6}' | sort | uniq -c

# Rebalance leaders
kafka-leader-election.sh --bootstrap-server kafka:9092 \
  --election-type preferred \
  --all-topic-partitions

OS Tuning

Linux Kernel Parameters

# /etc/sysctl.conf

# Network buffers
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 87380 16777216

# File descriptors
fs.file-max=1000000

# Virtual memory
vm.swappiness=1
vm.dirty_ratio=80
vm.dirty_background_ratio=5

# Page cache
vm.vfs_cache_pressure=50
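
Apply the settings without a reboot, then spot-check a value:

sysctl -p
sysctl net.core.rmem_max net.ipv4.tcp_rmem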

File Descriptor Limits

# /etc/security/limits.conf
kafka soft nofile 128000
kafka hard nofile 128000
kafka soft nproc 128000
kafka hard nproc 128000
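
To confirm the limits apply to the running broker (assuming a single broker process on the host, restarted after the change):

cat /proc/$(pgrep -f kafka.Kafka)/limits | grep -E 'open files|processes'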

Disk Configuration

Filesystem selection significantly impacts Kafka throughput. XFS is recommended for all log directories.

| Filesystem | Recommendation  | Notes                                                            |
|------------|-----------------|------------------------------------------------------------------|
| XFS        | Recommended     | Best sequential write performance, allocation group parallelism  |
| ext4       | Acceptable      | Suitable for smaller deployments                                 |
| ZFS        | Not recommended | CoW overhead, ARC competes with page cache                       |

# Mount options for Kafka data (XFS)
/dev/nvme0n1 /kafka/data xfs noatime,nodiratime 0 2

# I/O scheduler (for NVMe/SSDs)
echo none > /sys/block/nvme0n1/queue/scheduler
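
# Confirm the active scheduler (the selected one appears in brackets)
cat /sys/block/nvme0n1/queue/scheduler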

See the Filesystem Selection Guide for the complete filesystem architecture comparison, page cache tuning, and configuration details.


JVM Tuning

Heap Configuration

# Recommended heap size: 6-8GB; Kafka leans on the OS page cache,
# so leave the remaining RAM to the kernel and set Xms = Xmx
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

GC Configuration

export KAFKA_JVM_PERFORMANCE_OPTS="-server \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 \
  -XX:G1HeapRegionSize=16M \
  -XX:MinMetaspaceFreeRatio=50 \
  -XX:MaxMetaspaceFreeRatio=80"

Benchmarking

Producer Benchmark

# Throughput test
kafka-producer-perf-test.sh \
  --topic test-topic \
  --num-records 10000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props \
    bootstrap.servers=kafka:9092 \
    batch.size=65536 \
    linger.ms=10 \
    compression.type=lz4

Consumer Benchmark

# Throughput test
kafka-consumer-perf-test.sh \
  --bootstrap-server kafka:9092 \
  --topic test-topic \
  --messages 10000000 \
  --threads 4

End-to-End Latency

# Args: broker_list, topic, num_messages, producer_acks, message_size_bytes
kafka-run-class.sh kafka.tools.EndToEndLatency \
  kafka:9092 \
  test-topic \
  10000 \
  all \
  1024

Performance Metrics

Key Metrics

| Metric                    | Description        | Target             |
|---------------------------|--------------------|--------------------|
| MessagesInPerSec          | Produce rate       | Workload dependent |
| BytesInPerSec             | Bytes produced     | Workload dependent |
| BytesOutPerSec            | Bytes consumed     | Workload dependent |
| TotalTimeMs (P99)         | Request latency    | < 100ms            |
| RequestQueueTimeMs        | Queue time         | < 10ms             |
| UnderReplicatedPartitions | Replication health | 0                  |
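
UnderReplicatedPartitions can also be cross-checked from the CLI:

# List only partitions that are currently under-replicated
kafka-topics.sh --bootstrap-server kafka:9092 --describe \
  --under-replicated-partitions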

Identifying Bottlenecks

| Symptom                  | Likely Cause              | Solution                       |
|--------------------------|---------------------------|--------------------------------|
| High RequestQueueTimeMs  | Network threads saturated | Increase num.network.threads   |
| High LocalTimeMs         | Disk I/O slow             | Faster disks, more I/O threads |
| High RemoteTimeMs        | Replication lag           | More replica fetchers          |
| High ResponseQueueTimeMs | Network threads saturated | Increase num.network.threads   |