
GC Pause Issues

Long garbage collection pauses cause application stalls and timeouts, and in extreme cases can trigger node failures.


Symptoms

  • Application timeouts correlating with GC events
  • "GC pause" warnings in Cassandra logs (> 200ms)
  • Nodes marked DOWN intermittently
  • Spiky latency patterns
  • nodetool tpstats showing dropped messages
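To confirm that the stalls line up with garbage collection, check for GCInspector entries in the Cassandra log; it records every pause above the configured threshold along with its duration. A quick check, assuming the default log location:

# Each entry reports the collector and the pause duration in ms
grep "GCInspector" /var/log/cassandra/system.log | tail -20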

Diagnosis

Step 1: Check GC Statistics

nodetool gcstats

Healthy output:

Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
      1053721                   45                    892                      12              15234          123           104857600

Problem indicators:

  • Max GC Elapsed > 500ms
  • Many collections with high elapsed time
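If you prefer a quick pass/fail over eyeballing the output, here is a minimal sketch, assuming Max GC Elapsed is the second column as in the output above:

# Warn when the worst pause since the last check exceeded 500ms
nodetool gcstats | awk 'NR==2 && $2 > 500 {print "WARN: max GC pause", $2, "ms"}'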

Step 2: Analyze GC Logs

# Find long pauses (JDK 8 logs report pause times in seconds)
grep "GC pause" /var/log/cassandra/gc.log | awk '$(NF-1) > 0.5 {print}' | tail -20

# Or for G1GC
grep "Pause" /var/log/cassandra/gc.log | tail -50

Step 3: Check Heap Usage

nodetool info | grep -i heap

Problem indicators:

  • Used heap consistently > 75% of max
  • Heap usage approaching max
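To turn the raw numbers into a percentage, a small sketch that assumes the "Heap Memory (MB) : used / max" line format printed by nodetool info:

# Print used heap as a percentage of the configured maximum
nodetool info | awk -F: '/^Heap Memory/ {split($2, a, "/"); printf "Heap used: %.0f%%\n", a[1] / a[2] * 100}'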

Step 4: Correlate with Cassandra Logs

# Find GC-related warnings
grep -i "gc\|pause\|heap" /var/log/cassandra/system.log | tail -50

Step 5: Check for Large Partitions

# Large partitions cause heap pressure during reads
nodetool tablestats my_keyspace | grep -E "Table:|partition size"

Resolution

Immediate: Reduce Heap Pressure

# Clear key cache if very large
nodetool invalidatekeycache

# Flush memtables to reduce memory pressure
nodetool flush

Concurrent Reads/Writes

Concurrent reads/writes are configured via concurrent_reads and concurrent_writes in cassandra.yaml and require a restart to change. There is no runtime nodetool setconcurrency command.
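Before planning a restart, confirm what the node is currently running with. A sketch, assuming the stock package location for cassandra.yaml:

# Defaults are 32 for both settings
grep -E "^concurrent_(reads|writes):" /etc/cassandra/cassandra.yaml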

Short-term: Tune GC Settings

For G1GC (recommended for heaps > 8GB):

# In jvm.options or jvm11-server.options
-XX:+UseG1GC
-XX:G1HeapRegionSize=16m
-XX:MaxGCPauseMillis=300
-XX:InitiatingHeapOccupancyPercent=45
-XX:ParallelGCThreads=8
-XX:ConcGCThreads=4

For CMS (legacy, heaps < 8GB):

-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

Medium-term: Adjust Heap Size

Rule of thumb:

  • Maximum heap: 8GB for most workloads
  • Heap > 16GB often increases GC pause times
  • Leave room for off-heap structures (bloom filters, compression metadata) and the OS page cache

# In jvm.options
-Xms8G
-Xmx8G

Heap Sizing

Larger heaps don't always improve performance. Cassandra uses off-heap memory for many structures. 8GB is often optimal.
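After restarting with new heap settings, it is worth verifying that the running JVM actually picked them up. A sketch, assuming jcmd from the same JDK is on the PATH and is run as the Cassandra process owner:

# MaxHeapSize is reported in bytes (8589934592 = 8G)
jcmd $(pgrep -f CassandraDaemon) VM.flags | tr ' ' '\n' | grep -E "InitialHeapSize|MaxHeapSize"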

Long-term: Address Root Causes

Cause 1: Large partitions

# Find large partitions
nodetool tablestats my_keyspace | grep -E "Compacted partition maximum bytes"

Fix: Redesign data model to limit partition size to < 100MB.
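Compaction also logs a warning naming the keyspace, table, and partition key whenever it writes a partition above the configured large-partition warning threshold (100MB by default), which tells you exactly which keys to chase. Assuming the default log location:

# Each warning names the offending partition and its size
grep "Writing large partition" /var/log/cassandra/system.log | tail -20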

Cause 2: Wide rows with many columns

Fix: Limit columns per row, consider separate tables.

Cause 3: Heavy read/write load

Fix: Add nodes, optimize queries, implement caching.

Cause 4: Tombstone scans

nodetool tablestats my_keyspace | grep tombstone

Fix: See Tombstone Accumulation playbook.
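Tombstone-heavy reads also surface directly in the log as warnings that include the offending query. A quick check, assuming the default log location:

# e.g. "Read 0 live rows and 87051 tombstone cells for query ..."
grep -i "tombstone cells" /var/log/cassandra/system.log | tail -20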


Recovery

Verify GC Improvement

# Monitor GC stats
watch -n 10 'nodetool gcstats'

# Check for reduced pause times
grep "Pause" /var/log/cassandra/gc.log | tail -20

Monitor Application Impact

  • Check client-side latencies
  • Verify no dropped messages: nodetool tpstats | grep -i dropped
  • Confirm node stability: nodetool status
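For a single sweep instead of reading the full outputs, a rough sketch (any status code other than UN means the node is down, joining, leaving, or moving):

# Report dropped messages and any node not in the Up/Normal state
nodetool tpstats | grep -i dropped
nodetool status | awk '/^[UD][NLJM] / && $1 != "UN" {print "Not UN:", $1, $2}'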

GC Tuning Reference

G1GC Options

Option                           Default   Recommended   Purpose
G1HeapRegionSize                 Auto      16m           Region size for G1
MaxGCPauseMillis                 200       300           Target max pause
InitiatingHeapOccupancyPercent   45        45-65         When to start concurrent GC
ParallelGCThreads                cores     cores         Parallel GC threads
ConcGCThreads                    cores/4   cores/4       Concurrent GC threads

Heap Size Guidelines

Workload                  Recommended Heap   Notes
Light (< 100 GB data)     4-8 GB             Smaller heap = faster GC
Medium (100-500 GB)       8 GB               Sweet spot for most
Heavy (> 500 GB)          8-16 GB            Consider more nodes instead
Very large partitions     16-31 GB           Fix data model if possible

Prevention

  1. Monitor GC metrics - Alert on pauses > 500ms
  2. Limit partition sizes - Design for < 100MB per partition
  3. Run repairs - Enables tombstone cleanup
  4. Avoid heap > 16GB - Diminishing returns
  5. Use G1GC - Better pause time control
  6. Profile workloads - Identify memory-intensive operations

Command               Purpose
nodetool gcstats      GC statistics
nodetool info         Heap usage
nodetool tablestats   Table metrics including partition sizes
nodetool tpstats      Thread pool and dropped messages