
nodetool repair

Runs anti-entropy repair to synchronize data across replicas, keeping them consistent and preventing deleted data from resurrecting once tombstones expire.


Synopsis

nodetool [connection_options] repair [options] [--] [keyspace [table ...]]
See the connection options documentation for host, port, and authentication flags.
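
For example, to repair a remote node over JMX (the host, port, and credentials below are placeholders):

# -h/-p select the JMX host/port; -u/-pw supply JMX credentials if authentication is enabled
nodetool -h 10.0.1.15 -p 7199 -u jmxuser -pw secret repair -pr my_keyspace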

Description

nodetool repair compares data between replica nodes using Merkle trees and streams any differences to ensure all replicas hold identical data. Repair is essential for:

  • Maintaining data consistency
  • Preventing tombstone resurrection (zombie data)
  • Recovering from node failures or network partitions

Comprehensive Repair Documentation

Detailed repair concepts, strategies, and scheduling guidance are covered in the dedicated repair documentation.


Arguments

Argument   Description
keyspace   Keyspace to repair. Required for targeted repairs
table      Specific table(s) to repair. If omitted, repairs all tables in the keyspace

Key Options

Option                     Description
-pr, --partitioner-range   Repair only the node's primary token range (recommended)
--full                     Run a full repair instead of an incremental repair
-seq, --sequential         Repair replicas one at a time
-dcpar, --dc-parallel      Repair in parallel within each datacenter, sequentially across datacenters
-dc, --in-dc               Repair only within the specified datacenter(s)
-local, --in-local-dc      Repair only within the local datacenter
-st, --start-token         Start token of the range to repair
-et, --end-token           End token of the range to repair
-j, --job-threads          Number of repair job threads (default 1, max 4)

Common Usage Patterns

nodetool repair -pr my_keyspace

Always Use -pr

Without -pr, each node repairs all ranges it holds (primary + replica), causing redundant work. With -pr, run repair on every node to cover all ranges exactly once.
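
A minimal sketch of running -pr repair node by node (the host names and SSH access are assumptions):

# Hypothetical host list; repairs each node's primary range in turn
for host in node1 node2 node3; do
  ssh "$host" nodetool repair -pr my_keyspace
done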

Full vs Incremental Repair

# Full repair
nodetool repair --full -pr my_keyspace

# Incremental repair (the default when --full is not specified)
nodetool repair -pr my_keyspace
Type          Behavior                       Use Case
Full          Repairs all data               Recovery, initial sync
Incremental   Repairs only unrepaired data   Regular maintenance
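
To see which SSTables incremental repair has already marked repaired, check sstablemetadata output (the data directory path below is an assumption):

# "Repaired at: 0" means the SSTable has not been through incremental repair
sstablemetadata /var/lib/cassandra/data/my_keyspace/users-*/*-Data.db | grep "Repaired at"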

Local Datacenter Only

nodetool repair -pr -local my_keyspace

Synchronizes only against replicas in the local datacenter, avoiding cross-DC streaming.

Specific Token Range

nodetool repair -pr -st 0 -et 1000000000 my_keyspace

Repairs only the specified token range (subrange repair).
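
One way to drive subrange repair is to split the ring and repair each slice in turn; the token boundaries below are hypothetical (take real ones from nodetool describering my_keyspace):

# Hypothetical Murmur3 token boundaries splitting the full ring into three slices
nodetool repair -st -9223372036854775808 -et -3074457345618258603 my_keyspace
nodetool repair -st -3074457345618258603 -et 3074457345618258602 my_keyspace
nodetool repair -st 3074457345618258602 -et 9223372036854775807 my_keyspace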


When to Use

Routine Maintenance

gc_grace_seconds Constraint

Repair must complete on all nodes within gc_grace_seconds (default 10 days) to prevent tombstone resurrection.

# Run on each node
nodetool repair -pr my_keyspace
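
As a sketch, this could be automated with a staggered cron entry per node (the keyspace name, schedule, and log path are all assumptions):

# Hypothetical crontab entry: primary-range repair every Sunday at 02:00
0 2 * * 0 /usr/bin/nodetool repair -pr my_keyspace >> /var/log/cassandra/repair.log 2>&1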

After Node Recovery

After a node has been down longer than the hint window (max_hint_window_in_ms, 3 hours by default), hints alone cannot restore consistency:

nodetool repair -pr my_keyspace

After Network Partition

If nodes were isolated:

nodetool repair -pr my_keyspace

Before Major Version Upgrade

Ensure consistency before upgrading:

nodetool repair --full my_keyspace

When NOT to Use

Repair Considerations

Avoid repair:

  • During high traffic - Significant resource impact
  • While streaming - Interferes with bootstrap/decommission
  • With down nodes - Repair will fail or skip ranges
  • Immediately after bulk load - Wait for compaction
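
A minimal pre-flight check covering the points above (a sketch; assumes nodetool is on the PATH):

# Refuse to start repair if any node is not Up/Normal or streaming is active
nodetool status | awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { bad=1 } END { exit bad }' \
  || { echo "Aborting: node(s) not UN"; exit 1; }
nodetool netstats | grep -q "Not sending any streams" \
  || { echo "Aborting: streaming in progress"; exit 1; }
nodetool repair -pr my_keyspace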

Impact Analysis

Resource Usage

Resource   Impact
Network    High - streams data between nodes
Disk I/O   High - reads SSTables, writes streamed data
CPU        Moderate - Merkle tree calculation
Memory     Merkle trees require heap space

Performance Impact

During repair, the following operations impact cluster performance:

Operation           Description
Merkle Tree Build   Computes hash trees for data comparison
Data Comparison     Compares trees between replicas
Data Streaming      Streams differing data between nodes

Expected impact during repair:

Metric                Impact
Read latency          +10-30%
Write latency         +5-15%
Network utilization   +20-50%
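
If the network impact is a concern, repair streaming can be capped cluster-wide (the value 100 here is an example; units are megabits per second in pre-4.1 versions):

# Throttle streaming during repair; 0 disables the throttle
nodetool setstreamthroughput 100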

Monitoring Repair

Check Active Repairs

nodetool repair_admin list

Shows running repair sessions.

Monitor Progress

nodetool netstats

Shows streaming activity from repair.
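
Merkle tree construction appears in compactionstats as validation compactions, so it is worth polling alongside netstats; a simple loop (the 30-second interval is arbitrary):

# Poll repair activity: validation compactions plus any active streams
watch -n 30 'nodetool compactionstats; nodetool netstats | grep -v "Not sending"'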

Check Repair History

nodetool repair_admin list --all

Shows completed and failed repairs.

Abort Repair

nodetool repair_admin cancel <repair_id>

Canceling Repair

Canceled repairs leave data partially synchronized. Restart repair to complete synchronization.


Examples

Standard Maintenance Repair

# Run on each node sequentially
nodetool repair -pr my_keyspace

Repair Specific Table

nodetool repair -pr my_keyspace users

Parallel Repair (Faster)

# Parallel repair across replicas is the default mode; use -j to also
# repair up to four tables concurrently
nodetool repair -pr -j 4 my_keyspace

Multi-DC Repair

# Repair with all DCs
nodetool repair -pr my_keyspace

# Repair specific DCs only
nodetool repair -pr -dc dc1 -dc dc2 my_keyspace

Verbose Output

nodetool repair -pr --trace my_keyspace
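
Trace events are written to the system_traces keyspace; a quick way to inspect them from the shell (host and credentials omitted for brevity):

# Show recent repair trace events recorded by --trace
cqlsh -e "SELECT activity, source, thread FROM system_traces.events LIMIT 20;"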

Common Issues

Repair Fails with Timeout

ERROR: Repair failed with error: Repair job timed out

Solutions:

  • Reduce repair scope (single table)
  • Use subrange repair
  • Increase streaming_socket_timeout_in_ms

Repair Session Already Running

ERROR: Repair session already in progress

Check and wait for existing repair:

nodetool repair_admin list

Out of Memory

ERROR: java.lang.OutOfMemoryError: Java heap space

Merkle trees consume heap. Solutions:

  • Reduce repair parallelism
  • Increase heap size
  • Use subrange repair

Inconsistent Data After Repair

If data still appears inconsistent:

  1. Verify repair completed successfully
  2. Check all nodes were repaired
  3. Run nodetool repair --full for complete sync


Best Practices

Repair Guidelines

  1. Use -pr flag - Prevents redundant work
  2. Complete within gc_grace_seconds - Prevent zombies
  3. One node at a time - For sequential strategy
  4. Off-peak hours - Minimize production impact
  5. Monitor progress - Watch for failures
  6. Automate - Use AxonOps for scheduling

Repair Schedule Example

Cluster Size   Strategy               Frequency
3-6 nodes      Sequential             Weekly
6-20 nodes     Parallel               Every 3-5 days
20-50 nodes    DC-parallel            Every 2-3 days
50+ nodes      Continuous (AxonOps)   Always running

Related Commands

Command        Purpose
repair_admin   Manage repair sessions
netstats       Monitor streaming progress
status         Check node states before repair
scrub          Fix local SSTable corruption