Maintained by AxonOps: production-grade documentation from engineers who operate distributed databases at scale

nodetool gettraceprobability

Displays the current probability of tracing CQL requests on the node.


Synopsis

nodetool [connection_options] gettraceprobability
See connection options for the available connection_options (such as host, port, and JMX credentials).

Description

nodetool gettraceprobability shows the current probability (0.0 to 1.0) that a given CQL request coordinated by this node will be traced. This setting controls Cassandra's probabilistic tracing feature, which samples a percentage of requests for detailed performance analysis. The value is held per node, so nodes in the same cluster can report different probabilities.

What is Request Tracing?

Request tracing in Cassandra records detailed timing information about how a CQL query is processed across the cluster. When a request is traced, Cassandra captures:

  • Coordinator activity - Time spent parsing, planning, and coordinating the query
  • Replica communication - Time to send requests to and receive responses from replica nodes
  • Per-replica processing - What each replica did (memtable reads, SSTable reads, bloom filter checks)
  • Latency breakdown - Microsecond-level timing for each operation phase

Trace data is written to the system_traces keyspace, which contains two tables:

Table                   Contents
----------------------  ----------------------------------------------------
system_traces.sessions  One row per traced request, with summary information
system_traces.events    Detailed events for each traced request

Why Probabilistic Tracing?

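Tracing has significant overhead: each traced request generates multiple writes to system_traces (a row in sessions plus a row in events for every trace event). Enabling tracing for all requests (probability 1.0) would:

  • Add many extra writes per request, substantially increasing write load
  • Consume significant disk space in system_traces until the trace data expires
  • Add CPU, disk I/O, and network load that degrades overall cluster performance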

Probabilistic tracing allows sampling a small percentage of requests to gather representative performance data without overwhelming the cluster. For example, with a probability of 0.001 (0.1%) set on every node of a cluster handling 100,000 requests/second, approximately 100 requests/second would be traced: enough for analysis without significant overhead.
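The estimate above is probability multiplied by request rate. As a minimal sketch, assuming the same probability is set on every coordinator node, the calculation can be scripted with awk; expected_traced_rate is a hypothetical helper, not a nodetool command:

```shell
#!/bin/sh
# Hypothetical helper: expected traced requests per second = probability x request rate.
# Assumes the same trace probability is set on every coordinator node.
expected_traced_rate() {
    prob=$1   # trace probability, 0.0 to 1.0
    rate=$2   # total client requests per second across the cluster
    awk -v p="$prob" -v r="$rate" 'BEGIN { printf "%g\n", p * r }'
}

expected_traced_rate 0.001 100000   # prints 100
```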


Examples

Basic Usage

nodetool gettraceprobability

Sample Output

Current trace probability: 0.0

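A value of 0.0 (the default) means no requests are traced probabilistically. Requests for which the client explicitly enables tracing, such as queries run after TRACING ON in cqlsh, are still traced.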
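When scripting around this output, taking the last field is more robust than matching a digit pattern. nodetool is a Java program, and Java's default formatting of double values switches to scientific notation below 0.001, so a probability of 0.0001 may be printed as 1.0E-4. The sketch below assumes the "Current trace probability: <value>" format shown above; parse_trace_probability and tracing_enabled are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: print the value from "Current trace probability: <value>".
# Taking the last field keeps scientific notation such as 1.0E-4 intact.
parse_trace_probability() {
    awk '/Current trace probability:/ { print $NF }'
}

# Hypothetical helper: exit status 0 if the probability read from stdin is above zero.
# The comparison is numeric, so 0, 0.0 and 0.00 all count as disabled.
tracing_enabled() {
    awk '/Current trace probability:/ { found = 1; p = $NF + 0 }
         END { exit !(found && p > 0) }'
}

# Examples with canned output instead of a live node:
echo "Current trace probability: 0.0" | parse_trace_probability   # prints 0.0
echo "Current trace probability: 1.0E-4" | tracing_enabled && echo "tracing is on"
```

Against a live node, pipe the real command output into the helpers, for example nodetool gettraceprobability | tracing_enabled.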


Understanding Probability Values

Value   Percentage  Meaning               Use Case
------  ----------  --------------------  ---------------------------------
0.0     0%          No tracing (default)  Normal production operation
0.0001  0.01%       1 in 10,000 requests  High-traffic production sampling
0.001   0.1%        1 in 1,000 requests   Production performance monitoring
0.01    1%          1 in 100 requests     Active troubleshooting
0.1     10%         1 in 10 requests      Development/testing
1.0     100%        All requests          Brief debugging only
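The Percentage and Meaning columns follow directly from the value: the percentage is the value times 100, and the "1 in N" rate is N = 1 / value. A small illustrative sketch (describe_probability is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: turn a trace probability into a percentage and a "1 in N" rate.
describe_probability() {
    awk -v p="$1" 'BEGIN {
        if (p + 0 <= 0) { print "0% (no probabilistic tracing)"; exit }
        printf "%g%% (about 1 in %.0f requests)\n", p * 100, 1 / p
    }'
}

describe_probability 0.001   # prints: 0.1% (about 1 in 1000 requests)
```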

Performance Impact

Values above 0.01 (1%) can noticeably impact performance on busy clusters. Values of 0.1 or higher should only be used briefly during active debugging sessions or in non-production environments.


Viewing Trace Data

Once tracing is enabled and requests are sampled, trace data can be queried from system_traces:

View Recent Trace Sessions

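-- started_at is not a key column, so filtering on it requires ALLOW FILTERING.
-- Timestamp arithmetic such as toTimestamp(now()) - 1h requires Cassandra 4.0 or later.
SELECT * FROM system_traces.sessions
WHERE started_at > toTimestamp(now()) - 1h
LIMIT 10
ALLOW FILTERING;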

View Events for a Specific Trace

-- First, get a session_id from sessions table
SELECT session_id, coordinator, request, started_at, duration
FROM system_traces.sessions LIMIT 5;

-- Then query events for that session (session_id is a UUID; paste it without quotes)
SELECT activity, source, source_elapsed, thread
FROM system_traces.events
WHERE session_id = <session_id_from_above>;

Example Trace Output

 activity                                          | source        | source_elapsed
---------------------------------------------------+---------------+----------------
 Parsing SELECT * FROM users WHERE id = ?          | 192.168.1.101 |             52
 Preparing statement                               | 192.168.1.101 |            118
 Determining replicas for query                    | 192.168.1.101 |            156
 Sending READ message to /192.168.1.102            | 192.168.1.101 |            203
 READ message received from /192.168.1.101         | 192.168.1.102 |             45
 Executing single-partition query on users         | 192.168.1.102 |            112
 Acquiring sstable references                      | 192.168.1.102 |            158
 Bloom filter allows skipping sstable 1            | 192.168.1.102 |            201
 Partition index with 1 entries found              | 192.168.1.102 |            289
 Seeking to partition indexed section              | 192.168.1.102 |            334
 Merging memtable contents                         | 192.168.1.102 |            412
 Read 1 live rows and 0 tombstone cells            | 192.168.1.102 |            498
 Enqueuing response to /192.168.1.101              | 192.168.1.102 |            534
 Processing response from /192.168.1.102           | 192.168.1.101 |           2341
 Request complete                                  | 192.168.1.101 |           2456
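In this output, source_elapsed is the number of microseconds elapsed on that source node since it began handling the request, so values are cumulative within one node and not comparable across nodes. Subtracting consecutive values from the same source gives the time spent in each step; here most of the coordinator's time falls between sending the READ message (203) and processing the response (2341), which is largely the replica's work plus network and queueing time. A rough sketch that computes these per-source deltas from saved cqlsh output, assuming exactly three pipe-separated columns (activity, source, source_elapsed) and no pipe characters inside the activity text; trace_step_deltas is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print time spent between consecutive trace events per source node.
# Input: cqlsh-style rows "activity | source | source_elapsed"; header and separator
# rows are skipped because their last column is not an integer.
trace_step_deltas() {
    awk -F'|' '
        # Data rows only: three columns with an integer in the last one
        NF == 3 && $3 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            activity = $1; source = $2; elapsed = $3 + 0
            gsub(/^[ \t]+|[ \t]+$/, "", activity)
            gsub(/^[ \t]+|[ \t]+$/, "", source)
            # First event from a source: delta is its own elapsed time
            delta = (source in last) ? elapsed - last[source] : elapsed
            last[source] = elapsed
            printf "%8d us  %-15s %s\n", delta, source, activity
        }'
}

# Two coordinator rows from the example: the second reports 2138 us (2341 - 203)
printf '%s\n' \
    ' Sending READ message to /192.168.1.102 | 192.168.1.101 | 203' \
    ' Processing response from /192.168.1.102 | 192.168.1.101 | 2341' \
    | trace_step_deltas
```

For a saved trace, run trace_step_deltas < trace.txt, where trace.txt is a hypothetical file holding the cqlsh output.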

Use Cases

Verify Tracing is Disabled

Before performance testing, ensure tracing isn't adding overhead:

nodetool gettraceprobability
# Expected output: Current trace probability: 0.0

Check if Debugging Session is Active

Check whether someone has enabled probabilistic tracing for troubleshooting:

nodetool gettraceprobability
# Any value above 0.0 means probabilistic tracing is active on this node

Audit Cluster Configuration

Include in cluster health checks:

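#!/bin/bash
# Check trace probability on all nodes reported as Up/Normal (UN).
# Assumes passwordless SSH to each node address and nodetool on each node's PATH.

for node in $(nodetool status | grep "^UN" | awk '{print $2}'); do
    # Take the last field instead of matching digits: Java prints small values in
    # scientific notation (0.0001 becomes 1.0E-4), which [0-9]+\.[0-9]+ would cut to 1.0
    prob=$(ssh "$node" nodetool gettraceprobability 2>/dev/null \
        | awk '/Current trace probability:/ {print $NF}')
    # Numeric comparison, so 0, 0.0 and 0.00 are all treated as disabled
    if [ -z "$prob" ]; then
        echo "ERROR: could not read trace probability from $node"
    elif awk -v p="$prob" 'BEGIN { exit !(p + 0 > 0) }'; then
        echo "WARNING: $node has trace probability $prob"
    fi
done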

Trace Probability and Performance

The relationship between trace probability and overhead:

Probability   Overhead  system_traces Growth  Recommended Duration
------------  --------  --------------------  --------------------
0.0           None      None                  Indefinite (default)
0.0001-0.001  Minimal   Slow                  Days to weeks
0.001-0.01    Low       Moderate              Hours to days
0.01-0.1      Moderate  Fast                  Minutes to hours
0.1-1.0       High      Very fast             Minutes only
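The bands in the table above can be applied mechanically, for example in a health-check script. A minimal sketch; treating each range as including its upper bound (so 0.001 counts as Minimal) is an assumption, since the table's ranges share endpoints, and overhead_band is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: map a trace probability to the overhead bands in the table.
# Assumption: each band includes its upper bound.
overhead_band() {
    awk -v p="$1" 'BEGIN {
        p += 0
        if (p <= 0)          print "None"
        else if (p <= 0.001) print "Minimal"
        else if (p <= 0.01)  print "Low"
        else if (p <= 0.1)   print "Moderate"
        else                 print "High"
    }'
}

overhead_band 0.005   # prints: Low
```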

Cleaning Up Trace Data

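Query trace data is written to system_traces with a TTL of 24 hours by default. The TTL is applied to each trace write from the tracetype_query_ttl setting in cassandra.yaml (trace_type_query_ttl in Cassandra 4.1 and later), so changing a table's default_time_to_live does not shorten it. For extended tracing sessions, consider:

  • Lowering the TTL: set tracetype_query_ttl (for example, to one hour) in cassandra.yaml on each node, then restart the node
  • Manually truncating: TRUNCATE system_traces.sessions; TRUNCATE system_traces.events;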
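Because each trace row expires after the TTL, a steady tracing rate keeps roughly (traced requests per second times TTL in seconds) session rows live at any time, and proportionally more event rows. A back-of-the-envelope sketch; the events-per-trace figure is an assumption you supply (the example trace above has 15 events), expired data awaiting compaction is ignored, and estimate_trace_rows is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: rough steady-state row counts in system_traces.
# Live rows ~= rows written per second x TTL; ignores expired data not yet compacted away.
estimate_trace_rows() {
    traced_per_sec=$1     # traced requests per second (probability x request rate)
    events_per_trace=$2   # assumed average number of events per traced request
    ttl_seconds=$3        # trace TTL; 86400 (24 hours) by default for queries
    awk -v r="$traced_per_sec" -v e="$events_per_trace" -v t="$ttl_seconds" 'BEGIN {
        printf "sessions rows: %.0f\n", r * t
        printf "events rows:   %.0f\n", r * e * t
    }'
}

estimate_trace_rows 100 15 86400
```

With 100 traced requests per second, 15 events per trace, and the default 24-hour TTL, this gives about 8.6 million session rows and 130 million event rows.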

Comparing with CQL TRACING

Cassandra offers two tracing mechanisms:

Feature      Probabilistic Tracing                     CQL TRACING ON
-----------  ----------------------------------------  -----------------------
Scope        Per node: requests that node coordinates  Single cqlsh session
Control      nodetool settraceprobability              TRACING ON/OFF in cqlsh
Sampling     Percentage-based                          All queries in session
Use case     Production monitoring                     Interactive debugging
Persistence  system_traces tables                      system_traces tables

-- CQL session-level tracing (alternative to probabilistic)
TRACING ON;
SELECT * FROM my_keyspace.my_table WHERE id = 123;
TRACING OFF;

Command              Relationship
-------------------  -----------------------------------------
settraceprobability  Set the trace probability
proxyhistograms      View coordinator-level latency histograms
tablehistograms      View per-table latency histograms