Multi-Datacenter Operations¶
This guide covers procedures for adding and removing datacenters in a Cassandra cluster.
Adding a Datacenter¶
Expanding to a new datacenter provides geographic redundancy and reduced latency for regional users.
Prerequisites¶
| Requirement | Verification |
|---|---|
| Existing cluster healthy | All nodes UN |
| Network connectivity | New DC can reach existing DCs |
| Cross-DC latency acceptable | < 100ms recommended |
| Same Cassandra version | Match existing cluster |
| Hardware provisioned | Nodes ready in new DC |
Planning Considerations¶
Replication strategy:
The cluster must use NetworkTopologyStrategy for multi-DC deployments:
-- Check current replication
DESCRIBE KEYSPACE my_keyspace;
-- Must be NetworkTopologyStrategy, not SimpleStrategy
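To check this programmatically across all keyspaces, here is a minimal sketch using the Java driver (4.x, the same driver shown later in this guide) that flags any keyspace still on SimpleStrategy. The session setup and the DC name dc1 are assumptions to adapt:

```java
import com.datastax.oss.driver.api.core.CqlSession;

// Sketch: list keyspaces whose replication class is still SimpleStrategy.
// Assumes the client can reach the existing cluster; dc1 is an example name.
try (CqlSession session = CqlSession.builder()
        .withLocalDatacenter("dc1")
        .build()) {
    session.execute("SELECT keyspace_name, replication FROM system_schema.keyspaces")
        .forEach(row -> {
            String clazz = row.getMap("replication", String.class, String.class)
                              .get("class");
            if (clazz != null && clazz.endsWith("SimpleStrategy")) {
                System.out.println("Needs migration: " + row.getString("keyspace_name"));
            }
        });
}
```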
Node count:
- The new datacenter needs at least as many nodes as the replication factor (RF) planned for it
- Typically, match the node count of the existing DC
Network requirements:
| Port | Purpose |
|---|---|
| 7000 | Internode (gossip, streaming) |
| 7001 | Internode SSL |
| 9042 | Client connections |
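To verify these ports are reachable from a host in the new DC before starting any Cassandra process, a plain-Java socket probe can stand in for nc. A sketch; the hostname dc1-node1 and the 3-second timeout are placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Probe internode (7000) and client (9042) ports on an existing node.
public class PortCheck {
    public static void main(String[] args) {
        String host = "dc1-node1"; // placeholder: a node in the existing DC
        for (int port : new int[] {7000, 9042}) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 3000); // 3 s timeout
                System.out.println(host + ":" + port + " reachable");
            } catch (IOException e) {
                System.out.println(host + ":" + port + " UNREACHABLE: " + e.getMessage());
            }
        }
    }
}
```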
Procedure¶
Step 1: Update keyspace replication
Before adding any nodes, update all keyspaces to include the new datacenter:
-- User keyspaces
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3  -- New datacenter
};
-- System keyspaces (critical!)
ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3
};
ALTER KEYSPACE system_distributed WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3
};
ALTER KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3
};
Update system_auth First
The system_auth keyspace must be updated before adding nodes. Otherwise, authentication may fail for new nodes.
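With many keyspaces, the same ALTER statements can be scripted. A sketch with the Java driver; the keyspace list and RF values are examples to adapt to your cluster:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import java.util.List;

// Apply the new replication map to every keyspace that must span both DCs.
// Keyspace names, DC names, and RFs below are examples only.
try (CqlSession session = CqlSession.builder().withLocalDatacenter("dc1").build()) {
    List<String> keyspaces = List.of(
        "my_keyspace", "system_auth", "system_distributed", "system_traces");
    for (String ks : keyspaces) {
        session.execute("ALTER KEYSPACE " + ks + " WITH replication = "
            + "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}");
    }
}
```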
Step 2: Configure nodes in new datacenter
On each new node:
# cassandra.yaml
cluster_name: 'ProductionCluster' # Must match
num_tokens: 256 # Match existing
# Seeds from BOTH datacenters
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "dc1-node1,dc1-node2,dc2-node1"
# This node's address
listen_address: <node_ip>
rpc_address: <node_ip>
# Snitch for multi-DC
endpoint_snitch: GossipingPropertyFileSnitch
# Join without streaming; data arrives later via nodetool rebuild (Step 4)
auto_bootstrap: false
# cassandra-rackdc.properties
dc=dc2
rack=rack1
Step 3: Start nodes in new datacenter
Start the new nodes one at a time; with bootstrap disabled there is no data streaming to wait for:
# New DC nodes start with auto_bootstrap: false
# They join the ring but receive no data yet
# Start first node
sudo systemctl start cassandra
# Verify it joins (shows UN but with 0 load)
nodetool status
# Start remaining nodes
Bootstrap Disabled
For new datacenter nodes, set auto_bootstrap: false. Data will be populated via rebuild, not bootstrap.
Step 4: Rebuild data in new datacenter
On each node in the new datacenter, run rebuild:
# Rebuild from existing datacenter
nodetool rebuild dc1
# This streams all data for this node's token ranges from dc1
Rebuild one node at a time to avoid overwhelming the source datacenter.
# Monitor rebuild progress
nodetool netstats
# Watch for completion
tail -f /var/log/cassandra/system.log | grep -i rebuild
Step 5: Verify completion
# All nodes UN with data
nodetool status
# Example output:
# Datacenter: dc1
# UN 10.0.1.1 245.5 GB 256 ...
# UN 10.0.1.2 238.2 GB 256 ...
# UN 10.0.1.3 251.8 GB 256 ...
#
# Datacenter: dc2
# UN 10.0.2.1 243.1 GB 256 ... <-- Data present
# UN 10.0.2.2 240.7 GB 256 ...
# UN 10.0.2.3 248.3 GB 256 ...
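As a programmatic cross-check, the driver's cluster metadata can confirm that clients now see nodes in both datacenters. A sketch; the DC name dc1 is an example:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;
import java.util.stream.Collectors;

// Count the nodes the driver discovers in each datacenter.
try (CqlSession session = CqlSession.builder().withLocalDatacenter("dc1").build()) {
    session.getMetadata().getNodes().values().stream()
        .filter(n -> n.getDatacenter() != null)
        .collect(Collectors.groupingBy(Node::getDatacenter, Collectors.counting()))
        .forEach((dc, count) -> System.out.println(dc + ": " + count + " nodes"));
}
```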
Duration Estimates¶
| Data to Rebuild | Per Node |
|---|---|
| 100 GB | 1-2 hours |
| 500 GB | 4-8 hours |
| 1 TB | 8-16 hours |
Total time = per-node time × number of nodes in the new DC, since rebuilds run sequentially. For example, three nodes rebuilding 500 GB each take roughly 12-24 hours end to end.
Removing a Datacenter¶
This procedure consolidates the cluster by removing one datacenter entirely.
Prerequisites¶
| Requirement | Verification |
|---|---|
| All nodes in DC healthy | nodetool status shows UN |
| Data replicated elsewhere | Other DCs have copies |
| No clients using removed DC | Traffic shifted away |
| No LOCAL_* consistency from removed DC | Clients updated |
Data Loss Risk
Ensure all data is replicated to the remaining datacenters before removal. If any keyspace's only replicas live in the datacenter being removed, that data is lost once the DC is decommissioned.
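One way to make that check concrete: a sketch that scans system_schema.keyspaces and flags any keyspace whose replicas live only in the DC being removed. The DC name dc2 is an assumption:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import java.util.Map;

// Pre-flight check before removing dc2: flag keyspaces with no replicas
// outside the DC being removed. DC names here are examples.
try (CqlSession session = CqlSession.builder().withLocalDatacenter("dc1").build()) {
    session.execute("SELECT keyspace_name, replication FROM system_schema.keyspaces")
        .forEach(row -> {
            Map<String, String> repl =
                row.getMap("replication", String.class, String.class);
            boolean hasDc2 = repl.containsKey("dc2");
            boolean hasOtherDc = repl.keySet().stream()
                .anyMatch(k -> !k.equals("class") && !k.equals("dc2"));
            if (hasDc2 && !hasOtherDc) {
                System.out.println("ONLY in dc2 (would lose data): "
                    + row.getString("keyspace_name"));
            }
        });
}
```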
Procedure¶
Step 1: Redirect client traffic
Update clients to no longer contact the datacenter being removed:
// Java driver (4.x) - update contact points so dc2 is no longer used
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("dc1-node1", 9042))
    .addContactPoint(new InetSocketAddress("dc1-node2", 9042))
    // dc2 contact points removed
    .withLocalDatacenter("dc1")
    .build();
Step 2: Update keyspace replication
Remove the datacenter from all keyspaces:
-- User keyspaces
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
  -- dc2 removed
};
-- System keyspaces
ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};
ALTER KEYSPACE system_distributed WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};
ALTER KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};
Step 3: Decommission all nodes in the datacenter
Remove nodes one at a time:
# On each node in dc2
nodetool decommission
# Wait for completion before starting next
Step 4: Verify removal
# dc2 should not appear
nodetool status
# Only dc1 remains
# Datacenter: dc1
# UN 10.0.1.1 245.5 GB 256 ...
# UN 10.0.1.2 238.2 GB 256 ...
# UN 10.0.1.3 251.8 GB 256 ...
Step 5: Update seed lists
Remove dc2 seeds from all remaining nodes:
# cassandra.yaml on dc1 nodes
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "dc1-node1,dc1-node2"  # dc2 seeds removed
Changing Datacenter Names¶
Renaming a datacenter requires migrating to a new DC configuration.
Procedure¶
- Add nodes with new DC name (as if adding new datacenter)
- Rebuild data to new DC
- Update replication to include new DC name
- Redirect clients to new DC name
- Remove old DC
Complex Operation
DC renaming is effectively adding a new DC and removing the old one. Plan for significant downtime or accept temporary doubled hardware.
Cross-DC Consistency Considerations¶
Consistency Level Behavior¶
| Consistency Level | Multi-DC Behavior |
|---|---|
| `ONE` | Satisfied by any DC |
| `LOCAL_ONE` | Must be satisfied in coordinator's DC |
| `QUORUM` | Majority across ALL DCs |
| `LOCAL_QUORUM` | Majority in coordinator's DC only |
| `EACH_QUORUM` | Majority in EACH DC |
| `ALL` | All replicas in ALL DCs |
Recommended Settings¶
| Use Case | Write CL | Read CL |
|---|---|---|
| Strong local consistency | `LOCAL_QUORUM` | `LOCAL_QUORUM` |
| Global strong consistency | `QUORUM` | `QUORUM` |
| Availability priority | `LOCAL_ONE` | `LOCAL_ONE` |
| Cross-DC reads during DC failure | `QUORUM` | `QUORUM` |
LOCAL_QUORUM Recommendation
For most multi-DC deployments, LOCAL_QUORUM provides the best balance of consistency and availability. It ensures strong consistency within each DC while tolerating complete DC failure.
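As an illustration, here is how a single request can be issued at LOCAL_QUORUM with the Java driver. The keyspace, table, and id value are hypothetical:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import java.util.UUID;

// Read a row at LOCAL_QUORUM. my_keyspace.users and the id are examples.
try (CqlSession session = CqlSession.builder()
        .withLocalDatacenter("dc1")
        .build()) {
    UUID userId = UUID.randomUUID(); // placeholder id
    SimpleStatement stmt = SimpleStatement
        .newInstance("SELECT * FROM my_keyspace.users WHERE id = ?", userId)
        .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM);
    session.execute(stmt);
}
```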
Troubleshooting¶
Rebuild Stalled¶
Symptoms: nodetool netstats shows no progress
# Check source DC health
nodetool status
# Check network connectivity
nc -zv dc1-node1 7000
# Check logs
grep -i "stream\|rebuild" /var/log/cassandra/system.log | tail -50
Solutions:
- Verify cross-DC network connectivity
- Check source DC capacity (the existing DC may be overwhelmed by streaming)
- Increase streaming timeouts (e.g., streaming_keep_alive_period_in_secs in cassandra.yaml; the exact setting name varies by Cassandra version)
Authentication Failures in New DC¶
Symptoms: Nodes can't authenticate after joining
Cause: system_auth not replicated to new DC before nodes joined
Solution:
-- Update system_auth replication
ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3
};
# Then run a repair of system_auth on each node in the new DC
nodetool repair system_auth
Cross-DC Latency Issues¶
Symptoms: Client requests are slow when the coordinator is in a different DC
Solutions:
- Use `LOCAL_QUORUM` instead of `QUORUM`
- Configure the client with the correct local DC: `.withLocalDatacenter("dc1")`
- Review cross-DC network performance
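The first two solutions can be combined in the driver configuration: pin the local DC and make LOCAL_QUORUM the driver-wide default. A sketch; the DC name is an example:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

// Driver-wide default consistency plus a pinned local DC, so the driver
// prefers local coordinators and avoids cross-DC round trips.
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_QUORUM")
    .build();
CqlSession session = CqlSession.builder()
    .withConfigLoader(loader)
    .withLocalDatacenter("dc1") // example DC name
    .build();
```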
Related Documentation¶
- Cluster Management Overview - Operation selection
- Adding Nodes - Bootstrap procedures
- Removing Nodes - Decommission procedures
- Consistency Levels - Multi-DC consistency