
Driver Policies

Driver policies control how the application interacts with the Cassandra cluster during normal operation and failure scenarios. These policies are the primary mechanism through which developers configure failure handling behavior.


Developer Responsibility for Failure Handling

Unlike traditional databases where failure handling is largely abstracted away, Cassandra drivers expose failure scenarios directly to the application. The developer is responsible for configuring appropriate responses to failures.

This design is intentional: Cassandra's distributed architecture means that "failure" is nuanced. A node being slow is different from a node being down. A write timeout does not mean the write failed—it may have succeeded on some replicas. The driver cannot make assumptions about what the application considers acceptable behavior.

| Failure Type | What Happened | Driver's Question | Developer Must Decide |
|---|---|---|---|
| Read timeout | Some replicas didn't respond in time | Retry or fail? | Is stale data acceptable? Retry elsewhere? |
| Write timeout | Coordinator didn't get enough acknowledgments | Retry or fail? | Is a duplicate write acceptable? Is the operation idempotent? |
| Unavailable | Not enough replicas alive to satisfy CL | Retry or fail? | Is lower consistency acceptable? Wait and retry? |
| Node down | Node unreachable | Where to route? When to retry the connection? | Failover strategy? Recovery timing? |

Default policies exist but are generic. Production applications must evaluate each policy against their specific requirements for consistency, latency, and availability.
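
To make these decision points concrete, the following is a minimal Java sketch (assuming driver 4.x and an existing session and statement) of how such failures reach application code as typed exceptions once the retry policy gives up. The catch blocks are placeholders for the decisions listed in the table above, not recommended handling.

// Sketch (Java driver 4.x): failures surface as typed exceptions that the
// application must handle; "session" and "statement" are assumed to exist.
import com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException;
import com.datastax.oss.driver.api.core.servererrors.UnavailableException;
import com.datastax.oss.driver.api.core.servererrors.WriteTimeoutException;

try {
    session.execute(statement);
} catch (ReadTimeoutException e) {
    // Some replicas did not respond in time: retry elsewhere, or accept failure?
} catch (WriteTimeoutException e) {
    // The write may still have been applied on some replicas:
    // only safe to retry if the statement is idempotent.
} catch (UnavailableException e) {
    // Not enough replicas alive for the consistency level:
    // fail fast, wait and retry, or lower consistency if acceptable?
}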


Policy Overview

| Policy | Question It Answers | Default Behavior |
|---|---|---|
| Load Balancing | Which node should handle this request? | Round-robin across the local datacenter, token-aware |
| Retry | Should a failed request be retried? | Retry read timeouts once; don't retry write timeouts |
| Reconnection | How quickly to reconnect after node failure? | Exponential backoff (driver-specific defaults) |
| Speculative Execution | Should redundant requests be sent? | Disabled |

Default Policy Behavior

Understanding default behavior is essential before customizing policies.

Java Driver Defaults (v4.x)

| Policy | Configuration | Behavior |
|---|---|---|
| Load Balancing | basic.load-balancing-policy | Token-aware, prefers local DC, round-robin within replicas |
| Retry | DefaultRetryPolicy | Retry read timeout if enough replicas responded; never retry write timeout |
| Reconnection | ExponentialReconnectionPolicy | Base: 1 second, max: 60 seconds (verify for specific version) |
| Speculative Execution | None | Disabled; must be explicitly enabled |

Python Driver Defaults

| Policy | Default Implementation | Behavior |
|---|---|---|
| Load Balancing | TokenAwarePolicy(DCAwareRoundRobinPolicy()) | Token-aware wrapping DC-aware round-robin |
| Retry | RetryPolicy | Retry read timeout once on the same host; retry unavailable once on the next host |
| Reconnection | ExponentialReconnectionPolicy | Base: 1 second, max: 600 seconds |
| Speculative Execution | None | Disabled |

Failure Scenarios

Understanding common failure scenarios helps in selecting appropriate policies.

Scenario 1: Single Node Failure

[Diagram: single node failure scenario]

Policy involvement:

  • Load Balancing: Provides fallback nodes when primary fails
  • Retry: Determines if connection failure triggers retry
  • Reconnection: Schedules background reconnection to the failed node

Scenario 2: Read Timeout (Partial Response)

[Diagram: read timeout with partial replica response]

Policy involvement:

  • Retry: Decides whether to retry based on how many replicas responded
  • Speculative Execution: Could have sent parallel request to avoid timeout

Scenario 3: Write Timeout (Dangerous)

[Diagram: write timeout sequence]

Critical consideration: the write may have succeeded on some replicas even though the coordinator's acknowledgment was lost. Retrying non-idempotent writes risks data corruption.
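
Because only the application knows whether a write is safe to replay, idempotence should be declared per statement. A minimal sketch (Java driver 4.x; table names and values are illustrative):

// Sketch: mark statements idempotent only when replaying them is safe.
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

// Safe to replay: setting a column to an absolute value.
SimpleStatement safeWrite = SimpleStatement
    .newInstance("UPDATE accounts SET email = ? WHERE id = ?", "a@example.com", 42)
    .setIdempotent(true);

// Not safe to replay: counter increments (and conditional updates) must not be retried blindly.
SimpleStatement unsafeWrite = SimpleStatement
    .newInstance("UPDATE stats SET views = views + 1 WHERE page = ?", "home")
    .setIdempotent(false);   // matches the driver's default assumption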

Scenario 4: Network Partition

[Diagram: network partition scenario]

Policy involvement:

  • Load Balancing: Must route only to reachable nodes
  • Reconnection: Attempts to reconnect to partitioned nodes
  • Retry: Decides how to handle Unavailable exceptions when the CL cannot be met with reachable nodes
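
Whether a lower consistency level is acceptable during a partition is an application decision. The sketch below (Java driver 4.x; session, table, and values are illustrative) shows one possible fallback from LOCAL_QUORUM to LOCAL_ONE on an UnavailableException, appropriate only when stale or incomplete data can be tolerated:

// Sketch: application-level fallback to a weaker consistency level on Unavailable.
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import com.datastax.oss.driver.api.core.servererrors.UnavailableException;

SimpleStatement query = SimpleStatement
    .newInstance("SELECT * FROM users WHERE id = ?", 42)
    .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM);

ResultSet rows;
try {
    rows = session.execute(query);
} catch (UnavailableException e) {
    // Not enough replicas for LOCAL_QUORUM: retry once at LOCAL_ONE.
    rows = session.execute(query.setConsistencyLevel(DefaultConsistencyLevel.LOCAL_ONE));
}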

Multi-Datacenter Configuration

Multi-DC deployments require careful policy configuration to ensure correct behavior during normal operation and DC failures.

Local Datacenter Configuration

Always configure the local datacenter explicitly. This is the most critical setting for multi-DC deployments.

// Java - REQUIRED for multi-DC
CqlSession session = CqlSession.builder()
    .withLocalDatacenter("dc1")
    .build();

# Python - REQUIRED for multi-DC
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Wrapping in TokenAwarePolicy preserves token-aware routing (the driver default)
cluster = Cluster(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc1'))
)

Multi-DC Request Routing

[Diagram: multi-DC request routing]

DC Failover Behavior

| Configuration | Normal Operation | Local DC Down |
|---|---|---|
| LOCAL_QUORUM + local DC only | Routes to local DC | All requests fail |
| LOCAL_QUORUM + remote DC allowed | Routes to local DC | Still fails (LOCAL_* CLs only count local replicas) |
| QUORUM + remote DC allowed | May route anywhere | Continues if a global quorum is available |

Multi-DC Policy Configuration

Configure multi-DC failover via application.conf:

datastax-java-driver {
  basic.load-balancing-policy {
    local-datacenter = "dc1"
  }
  advanced.load-balancing-policy {
    # Allow failover to remote DC (requires non-LOCAL consistency levels)
    dc-failover.max-nodes-per-remote-dc = 2
  }
}

LOCAL consistency levels

Allowing remote DC nodes in the load balancing policy does not enable failover for LOCAL_* consistency levels. These levels only count replicas in the coordinator's datacenter regardless of where the request originates.

Consistency Level Implications

| Consistency Level | Multi-DC Behavior | DC Failure Impact |
|---|---|---|
| LOCAL_ONE | Local DC only | Fails if local DC down |
| LOCAL_QUORUM | Local DC only | Fails if local DC down |
| QUORUM | Global quorum | May succeed with one DC down |
| EACH_QUORUM | Quorum in every DC | Fails if any DC down |
| ALL | Every replica | Fails if any node down |

Recommendation: Use LOCAL_QUORUM for most operations. Configure the load balancing policy to allow remote-DC failover only when that is acceptable for the use case.
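
As a sketch of that recommendation (Java driver 4.x; session, table, and values are illustrative), keep LOCAL_QUORUM as the configured default and raise the consistency level only on the specific statements that are allowed to benefit from remote-DC failover:

// Sketch: the session default stays LOCAL_QUORUM (from application.conf);
// only requests that may legitimately fail over to a remote DC use QUORUM.
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

SimpleStatement criticalRead = SimpleStatement
    .newInstance("SELECT * FROM orders WHERE id = ?", 42)
    .setConsistencyLevel(DefaultConsistencyLevel.QUORUM);  // non-LOCAL, so dc-failover can apply

session.execute(criticalRead);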


Why Policies Matter

Default policies are designed for general use cases but may not match specific application requirements:

Load Balancing Examples

| Scenario | Default Behavior | Problem |
|---|---|---|
| Multi-DC deployment | May route to remote DC | High latency if local DC not configured |
| Heterogeneous hardware | Equal distribution | Overloads weaker nodes |
| Batch analytics | Token-aware routing | Optimal for OLTP, but analytics may prefer round-robin |

Retry Examples

| Scenario | Default Behavior | Problem |
|---|---|---|
| Non-idempotent writes | May retry on timeout | Potential duplicate writes |
| Overloaded cluster | Retry immediately | Amplifies load, worsens the situation |
| Read timeout | Retry same node | Node may still be slow |

Reconnection Examples

| Scenario | Default Behavior | Problem |
|---|---|---|
| Brief network blip | Exponential backoff | Slow recovery for transient issues |
| Node replacement | Standard reconnection | May attempt reconnection to a decommissioned node |
| Rolling restart | Backoff after each node | Cascading delays |

Policy Interactions

Policies do not operate in isolation—they interact during request execution:

[Diagram: policy interactions during request execution]

If speculative execution is enabled, requests are sent concurrently:

[Diagram: request flow with speculative execution enabled]
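
Note that the Java driver only sends speculative executions for statements marked idempotent, so enabling the policy is not enough on its own. A minimal sketch (driver 4.x; assumes a ConstantSpeculativeExecutionPolicy configuration like the one shown later on this page, with session and the query being illustrative):

// Sketch: speculative execution applies only to idempotent statements,
// so reads that are safe to duplicate should be marked explicitly.
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

SimpleStatement read = SimpleStatement
    .newInstance("SELECT * FROM users WHERE id = ?", 42)
    .setIdempotent(true);   // without this, no speculative execution is attempted

session.execute(read);   // a second attempt may be sent after the configured delay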


Configuration Approach

Explicit Configuration

Do not rely on defaults for production deployments. Configure each policy explicitly via application.conf (preferred) or programmatically:

# application.conf - Recommended approach for Java Driver 4.x
datastax-java-driver {
  basic {
    load-balancing-policy.local-datacenter = "dc1"
  }
  advanced {
    reconnection-policy {
      class = ExponentialReconnectionPolicy
      base-delay = 1 second
      max-delay = 5 minutes
    }
    speculative-execution-policy {
      class = ConstantSpeculativeExecutionPolicy
      max-executions = 2
      delay = 100 milliseconds
    }
  }
}

// Load configuration from application.conf
CqlSession session = CqlSession.builder()
    .withConfigLoader(DriverConfigLoader.fromClasspath("application.conf"))
    .build();
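
The programmatic alternative mentioned above can be built with DriverConfigLoader.programmaticBuilder(). The sketch below mirrors a subset of the application.conf example; option names come from DefaultDriverOption and should be verified against the driver version in use:

// Sketch: equivalent settings built programmatically (Java driver 4.x).
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import java.time.Duration;

DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "dc1")
    .withString(DefaultDriverOption.RECONNECTION_POLICY_CLASS, "ExponentialReconnectionPolicy")
    .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofSeconds(1))
    .withDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY, Duration.ofMinutes(5))
    .withString(DefaultDriverOption.SPECULATIVE_EXECUTION_POLICY_CLASS, "ConstantSpeculativeExecutionPolicy")
    .withInt(DefaultDriverOption.SPECULATIVE_EXECUTION_MAX, 2)
    .withDuration(DefaultDriverOption.SPECULATIVE_EXECUTION_DELAY, Duration.ofMillis(100))
    .build();

CqlSession session = CqlSession.builder()
    .withConfigLoader(loader)
    .build();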

Per-Statement Override

Some behavior can be overridden per statement. In Java driver 4.x the retry policy is not set directly on a statement; instead, the statement selects an execution profile that carries the desired policies:

// Select an execution profile for a specific query.
// The "no-retries" profile name is illustrative and must be defined in
// application.conf with the desired retry policy.
SimpleStatement statement = SimpleStatement.builder("SELECT * FROM users WHERE id = ?")
    .addPositionalValue(userId)
    .setExecutionProfileName("no-retries")
    .build();

This allows different behavior for different query types (e.g., strict no-retry for non-idempotent writes).


Policy Recommendations by Use Case

| Use Case | Load Balancing | Retry | Reconnection | Speculative Execution |
|---|---|---|---|---|
| OLTP (low latency) | Token-aware, local DC | Conservative (reads only) | Fast base (500 ms) | Enable for reads |
| Batch/Analytics | Round-robin or token-aware | Aggressive retry | Standard | Disable |
| Multi-DC Active-Active | Token-aware, local DC, failover enabled | Per-DC retry | Standard | Local DC only |
| Write-heavy | Token-aware | No retry for writes | Standard | Disable |
| Read-heavy | Token-aware | Retry reads | Standard | Enable |
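
When a single application mixes several of these use cases, execution profiles let different workloads run with different request settings against the same session. A sketch (Java driver 4.x; the profile names, timeouts, and consistency levels are illustrative):

// Sketch: per-workload execution profiles defined programmatically.
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import java.time.Duration;

DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "dc1")
    .startProfile("oltp")
        .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofMillis(500))
        .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_QUORUM")
    .endProfile()
    .startProfile("analytics")
        .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(30))
        .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_ONE")
    .endProfile()
    .build();

CqlSession session = CqlSession.builder().withConfigLoader(loader).build();

// Each query selects the profile that matches its workload.
session.execute(SimpleStatement.newInstance("SELECT * FROM users WHERE id = ?", 42)
    .setExecutionProfileName("oltp"));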

OLTP Application Configuration

# application-oltp.conf
datastax-java-driver {
  basic.load-balancing-policy.local-datacenter = "dc1"
  advanced {
    load-balancing-policy.slow-replica-avoidance = true
    reconnection-policy {
      class = ExponentialReconnectionPolicy
      base-delay = 500 milliseconds
      max-delay = 2 minutes
    }
    speculative-execution-policy {
      class = ConstantSpeculativeExecutionPolicy
      max-executions = 2
      delay = 50 milliseconds
    }
  }
}

Multi-DC Active-Active Configuration

# application-multi-dc.conf
datastax-java-driver {
  basic.load-balancing-policy.local-datacenter = "dc1"
  advanced {
    load-balancing-policy {
      # Allow failover to remote DC (requires non-LOCAL consistency levels)
      dc-failover.max-nodes-per-remote-dc = 2
    }
    reconnection-policy {
      class = ExponentialReconnectionPolicy
      base-delay = 1 second
      max-delay = 5 minutes
    }
    # No speculative execution across DCs (latency difference too high)
  }
}
