Rolling Restart¶
A rolling restart cycles through cluster nodes, restarting each one while maintaining cluster availability. This procedure is used for configuration changes, JVM updates, and routine maintenance.
Prerequisites¶
The following conditions must be met before starting a rolling restart:
| Requirement | Verification |
|---|---|
| All nodes must show UN status | nodetool status |
| No topology changes in progress | nodetool netstats |
| No active repairs | nodetool netstats |
| Schema agreement | nodetool describecluster |
# Pre-flight verification
nodetool status # All nodes UN
nodetool describecluster # Single schema version
nodetool netstats # No active streaming
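For manual runs, the three checks can be combined into a single gate script. A minimal sketch, assuming nodetool is on the PATH of the node it runs on; the grep and awk patterns match the default output formats and are illustrative rather than exhaustive:
#!/usr/bin/env bash
# pre_flight.sh - refuse to start a rolling restart unless the cluster looks healthy
set -u

# 1. Every node line in `nodetool status` must begin with UN (Up/Normal)
not_un=$(nodetool status | awk '/^[DU][NLJM]/ && $1 != "UN" { print $2 }')
if [ -n "$not_un" ]; then
    echo "ERROR: nodes not in UN state: $not_un" >&2
    exit 1
fi

# 2. describecluster must list exactly one schema version (one "version: [hosts]" line)
versions=$(nodetool describecluster | grep -c ': \[')
if [ "$versions" -ne 1 ]; then
    echo "ERROR: schema disagreement ($versions schema versions reported)" >&2
    exit 1
fi

# 3. netstats must show no active streaming
if ! nodetool netstats | grep -q 'Not sending any streams'; then
    echo "ERROR: streaming appears to be in progress" >&2
    exit 1
fi

echo "Pre-flight checks passed"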
AxonOps performs these verification checks automatically before initiating any rolling restart operation.
Behavioral Contract¶
Guarantees¶
- Cluster remains available throughout the restart process
- No data loss occurs if procedure is followed correctly
- Client requests continue to be served (with potential latency increase)
Consistency Requirements¶
For the cluster to maintain consistency during restarts:
| Consistency Level | Minimum Replicas Required | Restart Constraint |
|---|---|---|
| ONE | 1 replica up | Up to RF-1 replicas of a range may be down |
| QUORUM (RF=3) | 2 replicas up | 1 node may be down at a time |
| LOCAL_QUORUM (RF=3) | 2 replicas up per DC | 1 node per DC may be down at a time |
| ALL | All replicas up | No restarts possible |
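The quorum values above follow from quorum = floor(RF/2) + 1, so the number of replicas of any token range that may be down at once is RF minus quorum. A quick shell illustration (not part of any Cassandra tooling):
# Quorum size and restart headroom for a given replication factor
rf=3
quorum=$(( rf / 2 + 1 ))        # floor(RF/2) + 1 = 2
may_be_down=$(( rf - quorum ))  # 1 replica of each range may be down
echo "RF=$rf quorum=$quorum, at most $may_be_down replica(s) down per range"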
Node-by-Node Restart¶
The standard approach for rolling restarts.
Procedure¶
Step 1: Select restart order
Restart order considerations:
| Approach | Use Case |
|---|---|
| Rack-by-rack | Maintains rack fault tolerance |
| Any order | Acceptable when only one node is down at a time |
Seed Nodes
Seed nodes do not require special treatment during rolling restarts. Seeds are only used during bootstrap for initial cluster discovery. Once nodes have joined the cluster, they maintain topology information through gossip and do not depend on seeds.
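For rack-by-rack ordering, the order can be derived from nodetool status itself. A small sketch that groups node addresses by rack, assuming the default output format in which the state is the first column, the address the second, and the rack the last:
# List nodes grouped by rack to plan a rack-by-rack restart order
nodetool status | awk '/^[DU][NLJM]/ { print $NF, $2 }' | sort
# Example output:
# rack1 10.0.1.1
# rack1 10.0.1.2
# rack2 10.0.1.3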
Step 2: Verify all other nodes are UN
Before taking any node down, confirm all other nodes are up and running:
nodetool status
All nodes except the one being restarted must show UN (Up, Normal). If any node shows DN or another state, resolve that issue before proceeding. Taking down a node while another is already down may violate consistency requirements.
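This check can be scripted so the procedure stops automatically if another node is already down. A sketch; RESTART_NODE is a placeholder for the address of the node about to be restarted:
# Abort unless every node other than the one being restarted is UN
RESTART_NODE=10.0.1.1   # placeholder: node about to be restarted
down=$(nodetool status | awk -v n="$RESTART_NODE" \
    '/^[DU][NLJM]/ && $2 != n && $1 != "UN" { print $2 }')
if [ -n "$down" ]; then
    echo "ERROR: other nodes not UN: $down" >&2
    exit 1
fi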
Step 3: Drain the node
Before stopping, drain pending writes to disk:
nodetool drain
This flushes all memtables to disk and stops the node from accepting writes; the node stops serving client requests until it is restarted.
Step 4: Stop Cassandra
sudo systemctl stop cassandra
Step 5: Make configuration changes (if applicable)
# Edit cassandra.yaml, jvm.options, etc.
sudo vim /etc/cassandra/cassandra.yaml
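Before restarting on the new configuration, the edited YAML can be sanity-checked for syntax errors. A sketch that relies on Python 3 with PyYAML being installed on the node; it only validates YAML syntax, not Cassandra's own option values:
# Parse cassandra.yaml to catch indentation or syntax mistakes before startup
python3 -c "import yaml; yaml.safe_load(open('/etc/cassandra/cassandra.yaml')); print('cassandra.yaml parses cleanly')"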
Step 6: Start Cassandra
sudo systemctl start cassandra
Step 7: Wait for node to rejoin
# Wait for UN status
watch -n 5 'nodetool status'
# Verify node is serving requests
nodetool info
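The wait can also be automated rather than watched. A sketch that polls until the restarted node reports UN, giving up after roughly ten minutes; RESTART_NODE is again a placeholder address:
# Poll every 5 seconds until the restarted node shows UN (max ~10 minutes)
RESTART_NODE=10.0.1.1   # placeholder address
for attempt in $(seq 1 120); do
    if nodetool status | awk -v n="$RESTART_NODE" \
        '$2 == n && $1 == "UN" { ok=1 } END { exit !ok }'; then
        echo "$RESTART_NODE is UN"
        break
    fi
    sleep 5
done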
Step 8: Wait for hint delivery
While the node was down, other nodes accumulated hints for it. These must be delivered before proceeding:
# Check hint delivery progress on OTHER nodes
nodetool tpstats | grep HintedHandoff
# Example output (a Pending value of 0 means no hints are waiting):
#                   Active  Pending  Completed  Blocked  All time blocked
# HintedHandoff          0        0          0        0                 0
Wait until Pending column shows 0 on all nodes.
# Alternative: check hints directory size
du -sh /var/lib/cassandra/hints/
Hint Delivery Time
Hint delivery typically completes within seconds for short outages. For longer outages or high write volumes, delivery may take several minutes. Proceeding before hints are delivered risks hint accumulation across multiple nodes. AxonOps monitors hint queue depth across all nodes and waits for delivery completion before proceeding.
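Because delivery time varies, the wait is easier to script than to eyeball. A sketch to run on each of the other nodes; it assumes the hint pool appears as HintedHandoff in tpstats (the pool name can differ between Cassandra versions):
# Block until this node reports zero pending hint deliveries
while true; do
    pending=$(nodetool tpstats | awk '/HintedHandoff/ { print $3 }')
    [ "${pending:-0}" -eq 0 ] && break
    echo "Pending hints: $pending"
    sleep 10
done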
Step 9: Verify stability before proceeding
# Check for pending compactions
nodetool compactionstats
# Verify gossip is stable
nodetool gossipinfo | grep STATUS
Wait for the node to stabilize (typically 1-2 minutes) before proceeding to the next node.
Step 10: Repeat for remaining nodes
Continue with each node until all have been restarted. Return to Step 2 for each subsequent node.
Wait Time Between Restarts¶
| Cluster Size | Recommended Wait |
|---|---|
| < 10 nodes | Until node shows UN |
| 10-50 nodes | UN + 1-2 minutes |
| 50+ nodes | UN + 2-5 minutes |
AxonOps automatically calculates and enforces appropriate wait times based on cluster size and current load metrics.
Rack-Aware Restart¶
When racks are configured to match the replication factor, entire racks may be restarted simultaneously while maintaining consistency.
Rack-Level Restart Prerequisites¶
This approach requires:
| Requirement | Example |
|---|---|
| Racks ≥ RF | RF=3 with 3 racks |
| Even node distribution | Same node count per rack |
| NetworkTopologyStrategy | Ensures one replica per rack |
Why This Works¶
With RF=3 and 3 racks, Cassandra places exactly one replica in each rack:
Token Range X:
├── Replica 1 → Rack1 (Node A)
├── Replica 2 → Rack2 (Node B)
└── Replica 3 → Rack3 (Node C)
If Rack1 is completely down:
- 2 replicas remain (Rack2 and Rack3)
- QUORUM (2 of 3) is still achievable
- All reads and writes at QUORUM succeed
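The one-replica-per-rack placement assumes keyspaces use NetworkTopologyStrategy. An illustrative definition (the keyspace and datacenter names here are examples):
# Example keyspace using NetworkTopologyStrategy with RF=3 in dc1
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS app_data
          WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"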
Capacity Considerations¶
Capacity Impact
When an entire rack is down, remaining nodes handle 50% more load (for RF=3, 3 racks):
- Normal: Each rack handles 1/3 of requests
- One rack down: Each remaining rack handles 1/2 of requests
- Throughput capacity must accommodate this increase
Verify capacity before rack-level restart:
# Check current load metrics
nodetool tpstats
nodetool proxyhistograms
# Ensure headroom exists for 50% load increase
Rack-Level Restart Procedure¶
Step 1: Verify all nodes are UN and identify rack to restart
nodetool status
# ALL nodes must show UN before proceeding
# Note nodes in target rack
# Datacenter: dc1
# UN 10.0.1.1 rack1 ← Target
# UN 10.0.1.2 rack1 ← Target
# UN 10.0.1.3 rack2
# UN 10.0.1.4 rack2
# UN 10.0.1.5 rack3
# UN 10.0.1.6 rack3
All nodes in other racks must show UN before taking down any rack. If any node outside the target rack is down, resolve that issue first.
Step 2: Drain all nodes in the rack
# On all rack1 nodes simultaneously
nodetool drain
Step 3: Stop all nodes in the rack
# On all rack1 nodes
sudo systemctl stop cassandra
Step 4: Make configuration changes
Apply changes to all nodes in the rack.
Step 5: Start all nodes in the rack
# On all rack1 nodes
sudo systemctl start cassandra
Step 6: Wait for rack to rejoin
# All rack1 nodes should show UN
nodetool status
Step 7: Wait for hint delivery
While the rack was down, other racks accumulated hints. Wait for delivery to complete:
# On nodes in OTHER racks, check hint delivery
nodetool tpstats | grep HintedHandoff
# Wait until Pending = 0 on all nodes
With rack-level restarts, hint volume is higher since all nodes in the rack were down simultaneously.
Step 8: Verify stability
# Ensure all nodes are serving requests
nodetool info
# Check for streaming completion
nodetool netstats
Step 9: Proceed to next rack
Wait for stability, then repeat for remaining racks.
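Steps 2 through 6 are applied to every node in the rack, so with SSH access they can be driven from a single host. A rough sketch, assuming passwordless SSH and sudo on the target hosts, using the rack1 addresses from the status output above:
# Drain and stop every node in the target rack, then start them again later
RACK_NODES="10.0.1.1 10.0.1.2"   # rack1 addresses from `nodetool status`

for host in $RACK_NODES; do
    ssh "$host" 'nodetool drain && sudo systemctl stop cassandra'
done

# ... apply configuration changes on each host here ...

for host in $RACK_NODES; do
    ssh "$host" 'sudo systemctl start cassandra'
done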
Rack Restart Timing¶
| Rack Size | Approximate Restart Time |
|---|---|
| 2-3 nodes | 2-5 minutes |
| 5-10 nodes | 5-10 minutes |
| 10+ nodes | 10-15 minutes |
AxonOps detects rack topology automatically and can perform rack-level rolling restarts when the cluster configuration supports it.
Restart Without Drain¶
In some scenarios, drain may be skipped:
| Scenario | Drain Required |
|---|---|
| Configuration change | Recommended |
| JVM restart | Optional |
| Emergency restart | Skip |
| Version upgrade | Required |
Skipping Drain¶
If drain is skipped:
# Stop without drain
sudo systemctl stop cassandra
# Without a drain, recent writes remain only in the commitlog
# The commitlog is replayed on the next start
Trade-offs:
| With Drain | Without Drain |
|---|---|
| Clean shutdown | Commitlog replay on start |
| Faster restart | Slightly slower restart |
| No replay needed | May take 1-2 minutes longer |
Monitoring During Rolling Restart¶
Key Metrics¶
# On remaining nodes, monitor:
# Client request latency
nodetool proxyhistograms
# Thread pool status
nodetool tpstats
# Pending operations
nodetool compactionstats
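These can be sampled in a loop on one or more of the remaining nodes while the restart runs. A simple sketch:
# Snapshot key metrics every 30 seconds during the rolling restart
while true; do
    date
    nodetool proxyhistograms     # client request latency percentiles
    nodetool tpstats             # thread pool backlog
    nodetool compactionstats     # pending compactions
    sleep 30
done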
Alerting Considerations¶
During rolling restarts:
- Expect latency increase (fewer nodes serving requests)
- Expect elevated load on remaining nodes
- Suppress alerts for expected node-down events
AxonOps automatically suppresses node-down alerts during scheduled rolling restarts and provides a unified dashboard for monitoring restart progress across all nodes.
Troubleshooting¶
Node Won't Start After Restart¶
# Check logs for errors
tail -100 /var/log/cassandra/system.log | grep -i error
# Common causes:
# - Configuration syntax error
# - Port already in use
# - Insufficient memory
Node Slow to Rejoin¶
# Check for commitlog replay
grep -i "replaying" /var/log/cassandra/system.log
# Check for schema sync
nodetool describecluster
Cluster Unstable After Restart¶
If issues occur:
- Stop the rolling restart
- Wait for cluster to stabilize
- Investigate the issue
- Resume from the last successful node
AxonOps tracks restart progress and allows operations to be paused, investigated, and resumed from any point.
Best Practices¶
Do¶
| Practice | Rationale |
|---|---|
| Verify cluster health before starting | Avoid compounding issues |
| Wait between nodes | Allow stabilization |
| Monitor throughout | Catch issues early |
| Restart during low-traffic periods | Minimize client impact |
| Keep configuration changes minimal | Easier troubleshooting |
Don't¶
| Anti-Pattern | Risk |
|---|---|
| Restart multiple non-rack-aligned nodes | May break QUORUM |
| Skip health checks between nodes | Miss cascading failures |
| Rush through restarts | Cluster instability |
| Restart during repairs | Repair failures |
AxonOps enforces these best practices automatically, preventing operators from accidentally violating safety constraints during rolling restart operations.
AxonOps Rolling Restart¶
AxonOps provides automated rolling restart with built-in safety checks:
- Pre-flight validation: Verifies cluster health before each node
- Automatic pacing: Waits for node stability and hint delivery before proceeding
- Progress tracking: Visual status of restart progress across all nodes
- Abort capability: Stop at any point if issues arise, resume later
- Rack awareness: Respects rack topology constraints automatically
- Scheduling: Schedule rolling restarts during maintenance windows
- Configuration deployment: Push configuration changes to nodes as part of the restart
See AxonOps Operations for configuration details.
Related Documentation¶
- Cluster Management Overview - Operation selection
- Adding Nodes - Node bootstrap procedures
- Maintenance - General maintenance procedures