
nodetool repair

Runs anti-entropy repair to synchronize data across replicas, keeping them consistent and preventing deleted data from resurrecting once tombstones expire.


Synopsis

nodetool [connection_options] repair [options] [--] [keyspace [table ...]]
See the connection options documentation for host, port, and authentication flags.
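
For example, to repair a remote node over JMX (the host, port, and credentials below are placeholders):

# -h/-p select the JMX host/port; -u/-pw supply JMX credentials if authentication is enabled
nodetool -h 10.0.1.15 -p 7199 -u jmxuser -pw secret repair -pr my_keyspace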

Description

nodetool repair compares data between replica nodes using Merkle trees and streams any differences to ensure all replicas hold identical data. Repair is essential for:

  • Maintaining data consistency
  • Preventing tombstone resurrection (zombie data)
  • Recovering from node failures or network partitions

Comprehensive Repair Documentation

Detailed repair concepts, strategies, and scheduling guidance are covered in the dedicated repair documentation.


Arguments

Argument   Description
keyspace   Keyspace to repair. Required for targeted repairs
table      Specific table(s) to repair. If omitted, repairs all tables in the keyspace

Key Options

Option                     Description
-pr, --partitioner-range   Repair only the node's primary token range (recommended)
--full                     Run a full repair instead of an incremental repair
-seq, --sequential         Repair replicas one at a time
-dcpar, --dc-parallel      Repair in parallel within each datacenter, sequentially across datacenters
-dc, --in-dc               Repair only within the specified datacenter(s)
-local, --in-local-dc      Repair only within the local datacenter
-st, --start-token         Start token of the range to repair
-et, --end-token           End token of the range to repair
-j, --job-threads          Number of repair job threads (default 1, max 4)

Common Usage Patterns

nodetool repair -pr my_keyspace

Always Use -pr

Without -pr, each node repairs all ranges it holds (primary + replica), causing redundant work. With -pr, run repair on every node to cover all ranges exactly once.
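
A minimal sketch of running -pr repair node by node (the host names and SSH access are assumptions):

# Hypothetical host list; repairs each node's primary range in turn
for host in node1 node2 node3; do
  ssh "$host" nodetool repair -pr my_keyspace
done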

Full vs Incremental Repair

# Full repair
nodetool repair --full -pr my_keyspace

# Incremental repair (the default when --full is not specified)
nodetool repair -pr my_keyspace
Type          Behavior                       Use Case
Full          Repairs all data               Recovery, initial sync
Incremental   Repairs only unrepaired data   Regular maintenance
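
To see which SSTables incremental repair has already marked repaired, check sstablemetadata output (the data directory path below is an assumption):

# "Repaired at: 0" means the SSTable has not been through incremental repair
sstablemetadata /var/lib/cassandra/data/my_keyspace/users-*/*-Data.db | grep "Repaired at"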

Local Datacenter Only

nodetool repair -pr -local my_keyspace

Synchronizes only against replicas in the local datacenter, avoiding cross-DC streaming.

Specific Token Range

nodetool repair -pr -st 0 -et 1000000000 my_keyspace

Repairs only the specified token range (subrange repair).
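
One way to drive subrange repair is to split the ring and repair each slice in turn; the token boundaries below are hypothetical (take real ones from nodetool describering my_keyspace):

# Hypothetical Murmur3 token boundaries splitting the full ring into three slices
nodetool repair -st -9223372036854775808 -et -3074457345618258603 my_keyspace
nodetool repair -st -3074457345618258603 -et 3074457345618258602 my_keyspace
nodetool repair -st 3074457345618258602 -et 9223372036854775807 my_keyspace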


When to Use

Routine Maintenance

gc_grace_seconds Constraint

Repair must complete on all nodes within gc_grace_seconds (default 10 days) to prevent tombstone resurrection.

# Run on each node
nodetool repair -pr my_keyspace
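
As a sketch, this could be automated with a staggered cron entry per node (the keyspace name, schedule, and log path are all assumptions):

# Hypothetical crontab entry: primary-range repair every Sunday at 02:00
0 2 * * 0 /usr/bin/nodetool repair -pr my_keyspace >> /var/log/cassandra/repair.log 2>&1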

After Node Recovery

After a node has been down longer than the hint window (max_hint_window_in_ms, 3 hours by default), hints alone cannot restore consistency:

nodetool repair -pr my_keyspace

After Network Partition

If nodes were isolated:

nodetool repair -pr my_keyspace

Before Major Version Upgrade

Ensure consistency before upgrading:

nodetool repair --full my_keyspace

When NOT to Use

Repair Considerations

Avoid repair:

  • During high traffic - Significant resource impact
  • While streaming - Interferes with bootstrap/decommission
  • With down nodes - Repair will fail or skip ranges
  • Immediately after bulk load - Wait for compaction
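
A minimal pre-flight check covering the points above (a sketch; assumes nodetool is on the PATH):

# Refuse to start repair if any node is not Up/Normal or streaming is active
nodetool status | awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { bad=1 } END { exit bad }' \
  || { echo "Aborting: node(s) not UN"; exit 1; }
nodetool netstats | grep -q "Not sending any streams" \
  || { echo "Aborting: streaming in progress"; exit 1; }
nodetool repair -pr my_keyspace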

Impact Analysis

Resource Usage

Resource   Impact
Network    High - streams data between nodes
Disk I/O   High - reads SSTables, writes streamed data
CPU        Moderate - Merkle tree calculation
Memory     Merkle trees require heap space

Performance Impact

During repair, the following operations impact cluster performance:

Operation           Description
Merkle Tree Build   Computes hash trees for data comparison
Data Comparison     Compares trees between replicas
Data Streaming      Streams differing data between nodes

Expected impact during repair:

Metric                Impact
Read latency          +10-30%
Write latency         +5-15%
Network utilization   +20-50%
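
If the network impact is a concern, repair streaming can be capped cluster-wide (the value 100 here is an example; units are megabits per second in pre-4.1 versions):

# Throttle streaming during repair; 0 disables the throttle
nodetool setstreamthroughput 100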

Monitoring Repair

Check Active Repairs

nodetool repair_admin list

Shows running repair sessions.

Monitor Progress

nodetool netstats

Shows streaming activity from repair.
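
Merkle tree construction appears in compactionstats as validation compactions, so it is worth polling alongside netstats; a simple loop (the 30-second interval is arbitrary):

# Poll repair activity: validation compactions plus any active streams
watch -n 30 'nodetool compactionstats; nodetool netstats | grep -v "Not sending"'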

Check Repair History

nodetool repair_admin list --all

Shows completed and failed repairs.

Abort Repair

nodetool repair_admin cancel <repair_id>

Canceling Repair

Canceled repairs leave data partially synchronized. Restart repair to complete synchronization.


Examples

Standard Maintenance Repair

# Run on each node sequentially
nodetool repair -pr my_keyspace

Repair Specific Table

nodetool repair -pr my_keyspace users

Parallel Repair (Faster)

# Parallel repair across replicas is the default mode; use -j to also
# repair up to four tables concurrently
nodetool repair -pr -j 4 my_keyspace

Multi-DC Repair

# Repair with all DCs
nodetool repair -pr my_keyspace

# Repair specific DCs only
nodetool repair -pr -dc dc1 -dc dc2 my_keyspace

Verbose Output

nodetool repair -pr --trace my_keyspace
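
Trace events are written to the system_traces keyspace; a quick way to inspect them from the shell (host and credentials omitted for brevity):

# Show recent repair trace events recorded by --trace
cqlsh -e "SELECT activity, source, thread FROM system_traces.events LIMIT 20;"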

Common Issues

Repair Fails with Timeout

ERROR: Repair failed with error: Repair job timed out

Solutions:

  • Reduce repair scope (single table)
  • Use subrange repair
  • Increase streaming_socket_timeout_in_ms

Repair Session Already Running

ERROR: Repair session already in progress

Check and wait for existing repair:

nodetool repair_admin list

Out of Memory

ERROR: java.lang.OutOfMemoryError: Java heap space

Merkle trees consume heap. Solutions:

  • Reduce repair parallelism
  • Increase heap size
  • Use subrange repair

Inconsistent Data After Repair

If data still appears inconsistent:

  1. Verify repair completed successfully
  2. Check all nodes were repaired
  3. Run nodetool repair --full for complete sync


Best Practices

Repair Guidelines

  1. Use -pr flag - Prevents redundant work
  2. Complete within gc_grace_seconds - Prevent zombies
  3. One node at a time - For sequential strategy
  4. Off-peak hours - Minimize production impact
  5. Monitor progress - Watch for failures
  6. Automate - Use AxonOps for scheduling

Repair Schedule Example

Cluster Size   Strategy               Frequency
3-6 nodes      Sequential             Weekly
6-20 nodes     Parallel               Every 3-5 days
20-50 nodes    DC-parallel            Every 2-3 days
50+ nodes      Continuous (AxonOps)   Always running

Related Commands

Command        Purpose
repair_admin   Manage repair sessions
netstats       Monitor streaming progress
status         Check node states before repair
scrub          Fix local SSTable corruption