Rolling Restart¶
A rolling restart cycles through cluster nodes, restarting each one while maintaining cluster availability. This procedure is used for configuration changes, JVM updates, and routine maintenance.
Prerequisites¶
The following conditions must be met before starting a rolling restart:
| Requirement | Verification |
|---|---|
| All nodes must show UN status | nodetool status |
| No topology changes in progress | nodetool netstats |
| No active repairs | nodetool netstats |
| Schema agreement | nodetool describecluster |
# Pre-flight verification
nodetool status # All nodes UN
nodetool describecluster # Single schema version
nodetool netstats # No active streaming
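For manual runs, the three checks can be combined into a single gate script. A minimal sketch, assuming nodetool is on the PATH of the node it runs on; the grep and awk patterns match the default output formats and are illustrative rather than exhaustive:
#!/usr/bin/env bash
# pre_flight.sh - refuse to start a rolling restart unless the cluster looks healthy
set -u

# 1. Every node line in `nodetool status` must begin with UN (Up/Normal)
not_un=$(nodetool status | awk '/^[DU][NLJM]/ && $1 != "UN" { print $2 }')
if [ -n "$not_un" ]; then
    echo "ERROR: nodes not in UN state: $not_un" >&2
    exit 1
fi

# 2. describecluster must list exactly one schema version (one "version: [hosts]" line)
versions=$(nodetool describecluster | grep -c ': \[')
if [ "$versions" -ne 1 ]; then
    echo "ERROR: schema disagreement ($versions schema versions reported)" >&2
    exit 1
fi

# 3. netstats must show no active streaming
if ! nodetool netstats | grep -q 'Not sending any streams'; then
    echo "ERROR: streaming appears to be in progress" >&2
    exit 1
fi

echo "Pre-flight checks passed"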
AxonOps performs these verification checks automatically before initiating any rolling restart operation.
Behavioral Contract¶
Guarantees¶
- Cluster remains available throughout the restart process
- No data loss occurs if procedure is followed correctly
- Client requests continue to be served (with potential latency increase)
Consistency Requirements¶
For the cluster to maintain consistency during restarts:
| Consistency Level | Minimum Replicas Required | Restart Constraint |
|---|---|---|
| ONE | 1 replica up | Up to RF-1 replicas of a range may be down |
| QUORUM (RF=3) | 2 replicas up | 1 node may be down at a time |
| LOCAL_QUORUM (RF=3) | 2 replicas up per DC | 1 node per DC may be down at a time |
| ALL | All replicas up | No restarts possible |
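The quorum values above follow from quorum = floor(RF/2) + 1, so the number of replicas of any token range that may be down at once is RF minus quorum. A quick shell illustration (not part of any Cassandra tooling):
# Quorum size and restart headroom for a given replication factor
rf=3
quorum=$(( rf / 2 + 1 ))        # floor(RF/2) + 1 = 2
may_be_down=$(( rf - quorum ))  # 1 replica of each range may be down
echo "RF=$rf quorum=$quorum, at most $may_be_down replica(s) down per range"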
Node-by-Node Restart¶
The standard approach for rolling restarts.
Procedure¶
Step 1: Select restart order
Restart order considerations:
| Approach | Use Case |
|---|---|
| Rack-by-rack | Maintains rack fault tolerance |
| Any order | Acceptable when only one node is down at a time |
Seed Nodes
Seed nodes do not require special treatment during rolling restarts. Seeds are only used during bootstrap for initial cluster discovery. Once nodes have joined the cluster, they maintain topology information through gossip and do not depend on seeds.
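For rack-by-rack ordering, the order can be derived from nodetool status itself. A small sketch that groups node addresses by rack, assuming the default output format in which the state is the first column, the address the second, and the rack the last:
# List nodes grouped by rack to plan a rack-by-rack restart order
nodetool status | awk '/^[DU][NLJM]/ { print $NF, $2 }' | sort
# Example output:
# rack1 10.0.1.1
# rack1 10.0.1.2
# rack2 10.0.1.3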
Step 2: Verify all other nodes are UN
Before taking any node down, confirm all other nodes are up and running:
nodetool status
All nodes except the one being restarted must show UN (Up, Normal). If any node shows DN or another state, resolve that issue before proceeding. Taking down a node while another is already down may violate consistency requirements.
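This check can be scripted so the procedure stops automatically if another node is already down. A sketch; RESTART_NODE is a placeholder for the address of the node about to be restarted:
# Abort unless every node other than the one being restarted is UN
RESTART_NODE=10.0.1.1   # placeholder: node about to be restarted
down=$(nodetool status | awk -v n="$RESTART_NODE" \
    '/^[DU][NLJM]/ && $2 != n && $1 != "UN" { print $2 }')
if [ -n "$down" ]; then
    echo "ERROR: other nodes not UN: $down" >&2
    exit 1
fi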
Step 3: Drain the node
Before stopping, drain pending writes to disk:
nodetool drain
This flushes all memtables to disk and stops the node from accepting writes; the node stops serving client requests until it is restarted.
Step 4: Stop Cassandra
sudo systemctl stop cassandra
Step 5: Make configuration changes (if applicable)
# Edit cassandra.yaml, jvm.options, etc.
sudo vim /etc/cassandra/cassandra.yaml
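Before restarting on the new configuration, the edited YAML can be sanity-checked for syntax errors. A sketch that relies on Python 3 with PyYAML being installed on the node; it only validates YAML syntax, not Cassandra's own option values:
# Parse cassandra.yaml to catch indentation or syntax mistakes before startup
python3 -c "import yaml; yaml.safe_load(open('/etc/cassandra/cassandra.yaml')); print('cassandra.yaml parses cleanly')"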
Step 6: Start Cassandra
sudo systemctl start cassandra
Step 7: Wait for node to rejoin
# Wait for UN status
watch -n 5 'nodetool status'
# Verify node is serving requests
nodetool info
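The wait can also be automated rather than watched. A sketch that polls until the restarted node reports UN, giving up after roughly ten minutes; RESTART_NODE is again a placeholder address:
# Poll every 5 seconds until the restarted node shows UN (max ~10 minutes)
RESTART_NODE=10.0.1.1   # placeholder address
for attempt in $(seq 1 120); do
    if nodetool status | awk -v n="$RESTART_NODE" \
        '$2 == n && $1 == "UN" { ok=1 } END { exit !ok }'; then
        echo "$RESTART_NODE is UN"
        break
    fi
    sleep 5
done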
Step 8: Wait for hint delivery
While the node was down, other nodes accumulated hints for it. These must be delivered before proceeding:
# Check hint delivery progress on OTHER nodes
nodetool tpstats | grep HintedHandoff
# Example output (a Pending value of 0 means no hints are waiting):
#                   Active  Pending  Completed  Blocked  All time blocked
# HintedHandoff          0        0          0        0                 0
Wait until Pending column shows 0 on all nodes.
# Alternative: check hints directory size
du -sh /var/lib/cassandra/hints/
Hint Delivery Time
Hint delivery typically completes within seconds for short outages. For longer outages or high write volumes, delivery may take several minutes. Proceeding before hints are delivered risks hint accumulation across multiple nodes. AxonOps monitors hint queue depth across all nodes and waits for delivery completion before proceeding.
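Because delivery time varies, the wait is easier to script than to eyeball. A sketch to run on each of the other nodes; it assumes the hint pool appears as HintedHandoff in tpstats (the pool name can differ between Cassandra versions):
# Block until this node reports zero pending hint deliveries
while true; do
    pending=$(nodetool tpstats | awk '/HintedHandoff/ { print $3 }')
    [ "${pending:-0}" -eq 0 ] && break
    echo "Pending hints: $pending"
    sleep 10
done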
Step 9: Verify stability before proceeding
# Check for pending compactions
nodetool compactionstats
# Verify gossip is stable
nodetool gossipinfo | grep STATUS
Wait for the node to stabilize (typically 1-2 minutes) before proceeding to the next node.
Step 10: Repeat for remaining nodes
Continue with each node until all have been restarted. Return to Step 2 for each subsequent node.
Wait Time Between Restarts¶
| Cluster Size | Recommended Wait |
|---|---|
| < 10 nodes | Until node shows UN |
| 10-50 nodes | UN + 1-2 minutes |
| 50+ nodes | UN + 2-5 minutes |
AxonOps automatically calculates and enforces appropriate wait times based on cluster size and current load metrics.
Rack-Aware Restart¶
When racks are configured to match the replication factor, entire racks may be restarted simultaneously while maintaining consistency.
Rack-Level Restart Prerequisites¶
This approach requires:
| Requirement | Example |
|---|---|
| Racks ≥ RF | RF=3 with 3 racks |
| Even node distribution | Same node count per rack |
| NetworkTopologyStrategy | Ensures one replica per rack |
Why This Works¶
With RF=3 and 3 racks, Cassandra places exactly one replica in each rack:
Token Range X:
├── Replica 1 → Rack1 (Node A)
├── Replica 2 → Rack2 (Node B)
└── Replica 3 → Rack3 (Node C)
If Rack1 is completely down:
- 2 replicas remain (Rack2 and Rack3)
- QUORUM (2 of 3) is still achievable
- All reads and writes at QUORUM succeed
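The one-replica-per-rack placement assumes keyspaces use NetworkTopologyStrategy. An illustrative definition (the keyspace and datacenter names here are examples):
# Example keyspace using NetworkTopologyStrategy with RF=3 in dc1
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS app_data
          WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"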
Capacity Considerations¶
Capacity Impact
When an entire rack is down, remaining nodes handle 50% more load (for RF=3, 3 racks):
- Normal: Each rack handles 1/3 of requests
- One rack down: Each remaining rack handles 1/2 of requests
- Throughput capacity must accommodate this increase
Verify capacity before rack-level restart:
# Check current load metrics
nodetool tpstats
nodetool proxyhistograms
# Ensure headroom exists for 50% load increase
Rack-Level Restart Procedure¶
Step 1: Verify all nodes are UN and identify rack to restart
nodetool status
# ALL nodes must show UN before proceeding
# Note nodes in target rack
# Datacenter: dc1
# UN 10.0.1.1 rack1 ← Target
# UN 10.0.1.2 rack1 ← Target
# UN 10.0.1.3 rack2
# UN 10.0.1.4 rack2
# UN 10.0.1.5 rack3
# UN 10.0.1.6 rack3
All nodes in other racks must show UN before taking down any rack. If any node outside the target rack is down, resolve that issue first.
Step 2: Drain all nodes in the rack
# On all rack1 nodes simultaneously
nodetool drain
Step 3: Stop all nodes in the rack
# On all rack1 nodes
sudo systemctl stop cassandra
Step 4: Make configuration changes
Apply changes to all nodes in the rack.
Step 5: Start all nodes in the rack
# On all rack1 nodes
sudo systemctl start cassandra
Step 6: Wait for rack to rejoin
# All rack1 nodes should show UN
nodetool status
Step 7: Wait for hint delivery
While the rack was down, other racks accumulated hints. Wait for delivery to complete:
# On nodes in OTHER racks, check hint delivery
nodetool tpstats | grep HintedHandoff
# Wait until Pending = 0 on all nodes
With rack-level restarts, hint volume is higher since all nodes in the rack were down simultaneously.
Step 8: Verify stability
# Ensure all nodes are serving requests
nodetool info
# Check for streaming completion
nodetool netstats
Step 9: Proceed to next rack
Wait for stability, then repeat for remaining racks.
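Steps 2 through 6 are applied to every node in the rack, so with SSH access they can be driven from a single host. A rough sketch, assuming passwordless SSH and sudo on the target hosts, using the rack1 addresses from the status output above:
# Drain and stop every node in the target rack, then start them again later
RACK_NODES="10.0.1.1 10.0.1.2"   # rack1 addresses from `nodetool status`

for host in $RACK_NODES; do
    ssh "$host" 'nodetool drain && sudo systemctl stop cassandra'
done

# ... apply configuration changes on each host here ...

for host in $RACK_NODES; do
    ssh "$host" 'sudo systemctl start cassandra'
done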
Rack Restart Timing¶
| Rack Size | Approximate Restart Time |
|---|---|
| 2-3 nodes | 2-5 minutes |
| 5-10 nodes | 5-10 minutes |
| 10+ nodes | 10-15 minutes |
AxonOps detects rack topology automatically and can perform rack-level rolling restarts when the cluster configuration supports it.
Restart Without Drain¶
In some scenarios, drain may be skipped:
| Scenario | Drain Required |
|---|---|
| Configuration change | Recommended |
| JVM restart | Optional |
| Emergency restart | Skip |
| Version upgrade | Required |
Skipping Drain¶
If drain is skipped:
# Stop without drain
sudo systemctl stop cassandra
# Without a drain, recent writes remain only in the commitlog
# The commitlog is replayed on the next start
Trade-offs:
| With Drain | Without Drain |
|---|---|
| Clean shutdown | Commitlog replay on start |
| Faster restart | Slightly slower restart |
| No replay needed | May take 1-2 minutes longer |
Monitoring During Rolling Restart¶
Key Metrics¶
# On remaining nodes, monitor:
# Client request latency
nodetool proxyhistograms
# Thread pool status
nodetool tpstats
# Pending operations
nodetool compactionstats
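These can be sampled in a loop on one or more of the remaining nodes while the restart runs. A simple sketch:
# Snapshot key metrics every 30 seconds during the rolling restart
while true; do
    date
    nodetool proxyhistograms     # client request latency percentiles
    nodetool tpstats             # thread pool backlog
    nodetool compactionstats     # pending compactions
    sleep 30
done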
Alerting Considerations¶
During rolling restarts:
- Expect latency increase (fewer nodes serving requests)
- Expect elevated load on remaining nodes
- Suppress alerts for expected node-down events
AxonOps automatically suppresses node-down alerts during scheduled rolling restarts and provides a unified dashboard for monitoring restart progress across all nodes.
Troubleshooting¶
Node Won't Start After Restart¶
# Check logs for errors
tail -100 /var/log/cassandra/system.log | grep -i error
# Common causes:
# - Configuration syntax error
# - Port already in use
# - Insufficient memory
Node Slow to Rejoin¶
# Check for commitlog replay
grep -i "replaying" /var/log/cassandra/system.log
# Check for schema sync
nodetool describecluster
Cluster Unstable After Restart¶
If issues occur:
- Stop the rolling restart
- Wait for cluster to stabilize
- Investigate the issue
- Resume from the last successful node
AxonOps tracks restart progress and allows operations to be paused, investigated, and resumed from any point.
Best Practices¶
Do¶
| Practice | Rationale |
|---|---|
| Verify cluster health before starting | Avoid compounding issues |
| Wait between nodes | Allow stabilization |
| Monitor throughout | Catch issues early |
| Restart during low-traffic periods | Minimize client impact |
| Keep configuration changes minimal | Easier troubleshooting |
Don't¶
| Anti-Pattern | Risk |
|---|---|
| Restart multiple non-rack-aligned nodes | May break QUORUM |
| Skip health checks between nodes | Miss cascading failures |
| Rush through restarts | Cluster instability |
| Restart during repairs | Repair failures |
AxonOps enforces these best practices automatically, preventing operators from accidentally violating safety constraints during rolling restart operations.
AxonOps Rolling Restart¶
AxonOps provides automated rolling restart with built-in safety checks:
- Pre-flight validation: Verifies cluster health before each node
- Automatic pacing: Waits for node stability and hint delivery before proceeding
- Progress tracking: Visual status of restart progress across all nodes
- Abort capability: Stop at any point if issues arise, resume later
- Rack awareness: Respects rack topology constraints automatically
- Scheduling: Schedule rolling restarts during maintenance windows
- Configuration deployment: Push configuration changes to nodes as part of the restart
See AxonOps Operations for configuration details.
Related Documentation¶
- Cluster Management Overview - Operation selection
- Adding Nodes - Node bootstrap procedures
- Maintenance - General maintenance procedures