Maintained by AxonOps — production-grade documentation from engineers who operate distributed databases at scale

Kafka Memory Management

Memory architecture and optimization for Apache Kafka brokers and clients.


Memory Architecture Overview

[Diagram: Kafka memory architecture overview]


JVM Heap

Heap Components

| Component | Description | Memory Impact |
| --- | --- | --- |
| Request buffers | Incoming/outgoing request data | Proportional to connections |
| Metadata cache | Topic/partition metadata | Proportional to partitions |
| Index structures | In-memory index pointers | Proportional to partitions |
| Producer state | Idempotent producer tracking | Proportional to producers × partitions |
| Group coordinator | Consumer group state | Proportional to groups/members |

Heap Footprint Estimates (Repository Guidance)

| Area | Rule of Thumb |
| --- | --- |
| Broker heap per partition replica | ~1-2 MB |
| Controller metadata heap | ~5 GB for typical clusters |

Heap Sizing (Repository Guidance)

| Cluster Size | Partitions | Heap Size |
| --- | --- | --- |
| Small | < 1,000 | 4-6 GB |
| Medium | 1,000-10,000 | 6-8 GB |
| Large | 10,000-50,000 | 8-12 GB |
| Very Large | > 50,000 | 12-16 GB |
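
The sizing table can be expressed as a small lookup, which is handy in capacity-planning scripts. This is a minimal sketch: the function name `suggest_heap_gb` is hypothetical, and because the table lists 10,000 in both the Medium and Large rows, the sketch assumes that boundary belongs to the larger tier.

```python
def suggest_heap_gb(partitions):
    """Return (tier name, (min_gb, max_gb)) from the heap sizing table."""
    if partitions < 0:
        raise ValueError("partitions must be non-negative")
    if partitions < 1_000:
        return "Small", (4, 6)
    # Assumption: exactly 10,000 partitions is treated as "Large",
    # since the table lists that value in two rows.
    if partitions < 10_000:
        return "Medium", (6, 8)
    if partitions <= 50_000:
        return "Large", (8, 12)
    return "Very Large", (12, 16)


tier, (low, high) = suggest_heap_gb(4_000)
print(f"{tier}: {low}-{high} GB heap")  # Medium: 6-8 GB heap
```

Treat the result as a starting range and confirm it against GC behavior under real load.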

JVM Configuration

# Example JVM settings (tune per workload)
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

# GC settings (example values)
export KAFKA_JVM_PERFORMANCE_OPTS="-server \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 \
  -XX:+ExplicitGCInvokesConcurrent \
  -XX:G1HeapRegionSize=16M \
  -XX:MetaspaceSize=96m \
  -XX:MinMetaspaceFreeRatio=50 \
  -XX:MaxMetaspaceFreeRatio=80"

OS Page Cache

Page Cache Role

Kafka relies heavily on the OS page cache for performance. The page cache stores recently accessed disk data in RAM.


Page Cache Sizing

Rule of thumb: leave at least as much RAM for the page cache as the volume of data you want to keep "hot" (typically the last few hours of writes on that broker).

Page Cache = Total RAM - JVM Heap - OS Overhead

Example:
  Total RAM: 64 GB
  JVM Heap: 6 GB
  OS/Other: 2 GB
  Page Cache: ~56 GB available
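
The same arithmetic can be turned into a rough estimate of how many hours of recent data the cache can hold. This is a sketch under stated assumptions: the function names are hypothetical, and the 5 MiB/s per-broker write rate is an illustrative figure, not a value from this guide.

```python
def page_cache_gb(total_ram_gb, heap_gb, os_overhead_gb):
    """Page Cache = Total RAM - JVM Heap - OS Overhead."""
    cache = total_ram_gb - heap_gb - os_overhead_gb
    if cache < 0:
        raise ValueError("heap and OS overhead exceed total RAM")
    return cache


def hot_window_hours(cache_gb, write_mib_per_sec):
    """Hours of recent writes that fit in the page cache.

    write_mib_per_sec is the log write rate on this broker,
    including replicated partitions (assumed steady).
    """
    gib_per_hour = write_mib_per_sec * 3600 / 1024
    return cache_gb / gib_per_hour


cache = page_cache_gb(total_ram_gb=64, heap_gb=6, os_overhead_gb=2)
print(cache)  # 56

# Hypothetical write rate of 5 MiB/s on this broker: roughly 3.2 hours
print(round(hot_window_hours(cache, write_mib_per_sec=5), 1))
```

If the estimated window is shorter than the lag your consumers typically run at, reads will start falling through to disk.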

Monitoring Page Cache

# Check overall memory usage (the buff/cache column includes the page cache)
free -g

# Check page cache usage
grep -E "^(MemTotal|MemFree|Buffers|Cached):" /proc/meminfo

# Monitor disk I/O (sustained read I/O on the log disks suggests cache misses)
iostat -x 1
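
For trend monitoring, the `/proc/meminfo` fields can be parsed and reduced to a single ratio. The sketch below works on meminfo-style text so it can be run anywhere; the function names and the sample values are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def page_cache_fraction(meminfo):
    """Share of MemTotal currently held as Cached + Buffers."""
    return (meminfo["Cached"] + meminfo["Buffers"]) / meminfo["MemTotal"]


# Hypothetical sample; on a broker, read the text from /proc/meminfo instead.
SAMPLE = """\
MemTotal:       65536000 kB
MemFree:         1200000 kB
Buffers:          960000 kB
Cached:         40000000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"{page_cache_fraction(info):.1%} of RAM is page cache")  # 62.5% of RAM is page cache
```

A falling ratio alongside rising read I/O in `iostat` is a stronger signal than either number alone.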

Buffer Pools

Network Buffer Pool

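Kafka uses buffer pools to reduce garbage collection overhead. On the broker, the settings below size the kernel socket buffers (SO_SNDBUF and SO_RCVBUF) and cap the size of a single request; they do not size a pool on the JVM heap.

# Broker socket buffers and maximum request size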
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

Default Memory/Buffer Limits (Kafka Defaults)

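| Component | Setting | Default |
| --- | --- | --- |
| Broker | socket.send.buffer.bytes | 102400 (100 KiB) |
| Broker | socket.receive.buffer.bytes | 102400 (100 KiB) |
| Broker | socket.request.max.bytes | 104857600 (100 MiB) |
| Producer | buffer.memory | 33554432 (32 MiB) |
| Producer | batch.size | 16384 (16 KiB) |
| Producer | linger.ms | 0 (5 from Kafka 4.0) |
| Producer | max.block.ms | 60000 (60 s) |
| Consumer | fetch.min.bytes | 1 |
| Consumer | fetch.max.bytes | 52428800 (50 MiB) |
| Consumer | fetch.max.wait.ms | 500 |
| Consumer | max.partition.fetch.bytes | 1048576 (1 MiB) |
| Topic | index.interval.bytes | 4096 (4 KiB) |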

Producer Buffer Pool


# Producer buffer configuration (example tuning)
# Total buffer pool: 32 MiB
buffer.memory=33554432
# Upper bound per batch: 16 KiB
batch.size=16384
# Wait up to 5 ms for a batch to fill
linger.ms=5
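
The producer accumulates batches per partition and draws them from the `buffer.memory` pool, so the ratio of the two settings bounds how many full batches can be pending at once. A minimal sketch of that check follows; the function names are hypothetical and the model ignores per-batch bookkeeping overhead.

```python
BUFFER_MEMORY = 33_554_432  # buffer.memory (32 MiB)
BATCH_SIZE = 16_384         # batch.size (16 KiB)


def max_full_batches(buffer_memory, batch_size):
    """Number of full-size batches that fit in the producer pool."""
    return buffer_memory // batch_size


def pool_covers_partitions(buffer_memory, batch_size, active_partitions):
    """True if every active partition could hold one full batch at once."""
    return active_partitions * batch_size <= buffer_memory


print(max_full_batches(BUFFER_MEMORY, BATCH_SIZE))              # 2048
print(pool_covers_partitions(BUFFER_MEMORY, BATCH_SIZE, 5000))  # False
```

When the check fails, either raise `buffer.memory` or expect batches to be sent before they fill.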

Consumer Buffer Pool

# Consumer fetch sizing
# Minimum bytes the broker returns per fetch
fetch.min.bytes=1
# Maximum bytes per fetch response: 50 MiB
fetch.max.bytes=52428800
# Maximum bytes per partition: 1 MiB
max.partition.fetch.bytes=1048576

Memory Pressure Scenarios

Heap Exhaustion

Symptoms:

  • OutOfMemoryError
  • GC taking > 10% of time
  • Request latency spikes

Causes:

| Cause | Solution |
| --- | --- |
| Too many partitions | Reduce partitions or increase heap |
| Large metadata cache | Reduce topic count |
| Producer state buildup | Reduce idempotent producers |
| Memory leak | Update Kafka version |

Page Cache Exhaustion

Symptoms:

  • High disk read I/O
  • Consumer latency increases
  • High await times (r_await) in iostat

Causes:

| Cause | Solution |
| --- | --- |
| Heap too large | Reduce heap, leave more for cache |
| Too much data | Add more brokers |
| Random access patterns | Improve consumer patterns |

Garbage Collection

G1GC Tuning

# Recommended G1GC settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16M

| Parameter | Purpose |
| --- | --- |
| MaxGCPauseMillis | Soft pause-time goal, not a guarantee (20 ms in this example) |
| InitiatingHeapOccupancyPercent | Heap occupancy (%) at which a concurrent marking cycle starts |
| G1HeapRegionSize | G1 region size (16 MB for larger heaps) |
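
The region size matters because G1 treats any object at least half a region in size as humongous and allocates it outside the normal young-generation path. The sketch below works out the region count and humongous threshold for the example settings; the function name is hypothetical.

```python
MIB = 1024 * 1024


def g1_region_stats(heap_bytes, region_bytes):
    """Return (region count, humongous threshold in bytes) for a G1 heap."""
    regions = heap_bytes // region_bytes
    # Objects of at least half a region are allocated as humongous objects.
    humongous_threshold = region_bytes // 2
    return regions, humongous_threshold


# -Xmx6g with -XX:G1HeapRegionSize=16M
regions, threshold = g1_region_stats(6 * 1024 * MIB, 16 * MIB)
print(regions)           # 384
print(threshold // MIB)  # 8
```

With 16 MB regions, buffers under 8 MiB avoid humongous allocation, which is one plausible reason to pick a larger region size when requests or fetch responses are big.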

GC Monitoring

# Enable GC logging (unified logging syntax, JDK 9 and later)
-Xlog:gc*:file=/var/log/kafka/gc.log:time,tags:filecount=10,filesize=100M

# Monitor GC
jstat -gc <pid> 1000

# Analyze GC log
# Look for: pause times, frequency, throughput
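
To turn `jstat` samples into the "share of time spent in GC" figure, the cumulative `GCT` column can be differenced across samples. This is a sketch, not a full parser: the function name is hypothetical, the sample output is trimmed to a few columns with made-up values, and `GCT` is total GC time as reported by `jstat`, which is only an approximation of pause time.

```python
def gc_overhead_percent(jstat_output, interval_ms):
    """Percent of elapsed time spent in GC, from `jstat -gc <pid> <interval>` text.

    Uses the cumulative GCT column (seconds) of the first and last sample.
    """
    lines = [ln.split() for ln in jstat_output.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    gct_col = header.index("GCT")  # raises ValueError if the column is missing
    first = float(rows[0][gct_col])
    last = float(rows[-1][gct_col])
    elapsed_s = (len(rows) - 1) * interval_ms / 1000
    return 100 * (last - first) / elapsed_s


# Hypothetical, column-trimmed sample taken at 1000 ms intervals
SAMPLE = """
 YGC   YGCT   FGC  FGCT    GCT
 410  12.400    0  0.000  12.500
 411  12.430    0  0.000  12.530
 412  12.460    0  0.000  12.560
 415  12.550    0  0.000  12.650
"""

print(f"{gc_overhead_percent(SAMPLE, 1000):.1f}% of time in GC")  # 5.0% of time in GC
```

Looking columns up by header name keeps the sketch tolerant of the extra columns that different JDK versions print.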

GC Monitoring Targets (Repository Guidance)

Use GC pause time, frequency, and throughput as trend indicators rather than fixed SLAs.


Direct Memory

Off-Heap Buffers

Kafka uses direct memory for network I/O operations.

# Configure direct memory limit (example)
-XX:MaxDirectMemorySize=2g

Memory-Mapped Files

Index files use memory-mapped I/O, so their contents live outside the JVM heap and are backed by the page cache:

# These files are memory-mapped
# .index - offset index
# .timeindex - timestamp index

Client Memory Management

Producer Memory

# Total memory for buffering
buffer.memory=33554432

# Memory allocation behavior: send() blocks up to this long when the buffer is full
max.block.ms=60000

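Memory calculation (approximate; buffer.memory is not a hard bound):

Required memory ≈ buffer.memory +
                  compression buffers +
                  in-flight request overhead

Batches are allocated from buffer.memory itself, so size it to at least (active partitions × batch.size).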
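
A useful companion calculation is how long the buffer lasts when sends stall, for example during a broker outage. The sketch below uses a simplified model in which the application keeps producing at a steady rate; the function names and the 4 MiB/s rate are hypothetical.

```python
def seconds_until_buffer_full(buffer_memory, produce_bytes_per_sec):
    """Time for a stalled producer to fill buffer.memory at a steady rate."""
    return buffer_memory / produce_bytes_per_sec


def stall_exhausts_blocking(buffer_memory, produce_bytes_per_sec,
                            stall_seconds, max_block_ms):
    """True if a stall outlasts both the buffer and the max.block.ms wait."""
    time_to_fill = seconds_until_buffer_full(buffer_memory, produce_bytes_per_sec)
    return stall_seconds > time_to_fill + max_block_ms / 1000


RATE = 4 * 1024 * 1024  # hypothetical application rate: 4 MiB/s

print(seconds_until_buffer_full(33_554_432, RATE))            # 8.0
print(stall_exhausts_blocking(33_554_432, RATE, 30, 60_000))  # False
print(stall_exhausts_blocking(33_554_432, RATE, 90, 60_000))  # True
```

If the stalls you need to ride out are longer than this window, increase `buffer.memory` or handle the resulting send failures in the application.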

Consumer Memory

# Fetch sizing
fetch.max.bytes=52428800
max.poll.records=500

Memory calculation (approximate):

Required memory ≈ (fetch.max.bytes × brokers fetched from concurrently) +
                  deserialization buffers +
                  record processing buffers
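
The fetch portion of that estimate can be bounded from the two fetch settings and the way assigned partitions are spread across brokers. This is a rough sketch: the function name is hypothetical, it assumes one outstanding fetch response per broker, and it ignores oversized batches and the in-memory overhead of deserialized records.

```python
def fetch_buffer_estimate(partitions_per_broker,
                          fetch_max_bytes=52_428_800,
                          max_partition_fetch_bytes=1_048_576):
    """Approximate upper bound on raw fetched bytes buffered by one consumer.

    partitions_per_broker: number of assigned partitions led by each broker.
    Each response is capped by fetch.max.bytes and by the per-partition limit.
    """
    return sum(
        min(fetch_max_bytes, count * max_partition_fetch_bytes)
        for count in partitions_per_broker
    )


# Hypothetical assignment: 10, 10, and 80 partitions on three brokers
estimate = fetch_buffer_estimate([10, 10, 80])
print(estimate)                  # 73400320
print(estimate / (1024 * 1024))  # 70.0
```

Deserialized objects are often several times larger than their wire size, so leave headroom above this figure when sizing the consumer's heap.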

Memory Tuning Checklist

Broker

  • [ ] Set heap size appropriately (6-12GB typical)
  • [ ] Leave sufficient RAM for page cache
  • [ ] Configure G1GC with appropriate pause target
  • [ ] Monitor GC pause times and frequency
  • [ ] Watch for page cache evictions

Producer

  • [ ] Size buffer.memory for throughput needs
  • [ ] Set appropriate batch.size
  • [ ] Monitor buffer-available-bytes metric

Consumer

  • [ ] Configure fetch sizes appropriately
  • [ ] Set max.poll.records for processing capacity
  • [ ] Monitor memory usage in application