Skip to content
Maintained by AxonOps — production-grade documentation from engineers who operate distributed databases at scale

AxonOps Table Dashboard Metrics Mapping

This document maps the metrics used in the AxonOps Table dashboard.

Dashboard Overview

The Table dashboard provides comprehensive table-level metrics including coordinator and replica statistics, latency metrics, throughput, data distribution, and performance indicators. It helps identify table-specific issues and optimization opportunities.

Metrics Mapping

Data Distribution Metrics

Dashboard Metric Description Attributes
cas_Table_LiveDiskSpaceUsed Live data size per table keyspace, scope (table), function=Count, dc, rack, host_id

Coordinator Metrics (Table-level)

Dashboard Metric Description Attributes
cas_Table_CoordinatorReadLatency Read latency at coordinator level keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id
cas_Table_CoordinatorWriteLatency Write latency at coordinator level keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id
cas_Table_CoordinatorScanLatency Range scan latency at coordinator keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id

Replica Metrics (Local)

Dashboard Metric Description Attributes
cas_Table_ReadLatency Local read latency keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id
cas_Table_WriteLatency Local write latency keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id
cas_Table_RangeLatency Local range query latency keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id

Partition and SSTable Metrics

Dashboard Metric Description Attributes
cas_Table_MeanPartitionSize Average partition size keyspace, scope (table), dc, rack, host_id
cas_Table_MaxPartitionSize Maximum partition size keyspace, scope (table), dc, rack, host_id
cas_Table_EstimatedPartitionCount Estimated number of partitions keyspace, scope (table), dc, rack, host_id
cas_Table_LiveSSTableCount Number of live SSTables keyspace, scope (table), dc, rack, host_id
cas_Table_SSTablesPerReadHistogram SSTables accessed per read keyspace, scope (table), function (percentiles), dc, rack, host_id

Performance Indicators

Dashboard Metric Description Attributes
cas_Table_TombstoneScannedHistogram Tombstones scanned per query keyspace, scope (table), function (percentiles), dc, rack, host_id
cas_Table_SpeculativeRetries Speculative retry attempts keyspace, scope (table), function=Count, axonfunction (rate), dc, rack, host_id
cas_Table_BloomFilterFalseRatio Bloom filter false positive ratio keyspace, scope (table), dc, rack, host_id
cas_Table_BloomFilterDiskSpaceUsed Disk space used by bloom filters keyspace, scope (table), dc, rack, host_id

Memory Metrics

Dashboard Metric Description Attributes
cas_Table_AllMemtablesHeapSize Heap memory used by memtables keyspace, scope (table), dc, rack, host_id
cas_Table_AllMemtablesOffHeapSize Off-heap memory used by memtables keyspace, scope (table), dc, rack, host_id

JVM Garbage Collection Metrics

Dashboard Metric Description Attributes
jvm_GarbageCollector_G1_Young_Generation G1 young generation GC function (CollectionTime/CollectionCount), axonfunction (rate), dc, rack, host_id
jvm_GarbageCollector_Shenandoah_Cycles Shenandoah GC cycles function (CollectionCount), axonfunction (rate), dc, rack, host_id
jvm_GarbageCollector_Shenandoah_Pauses Shenandoah GC pauses function (CollectionTime), axonfunction (rate), dc, rack, host_id
jvm_GarbageCollector_ZGC ZGC garbage collector function (CollectionTime/CollectionCount), axonfunction (rate), dc, rack, host_id

Query Examples

Tables Overview Section

// Table Data Size Distribution (Pie Chart)
sum by (keyspace,scope) (cas_Table_LiveDiskSpaceUsed{function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'})

// Coordinator Table Reads Distribution (Pie Chart)
sum by (keyspace,scope) (cas_Table_CoordinatorReadLatency{axonfunction='rate',dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function='Count'})

// Coordinator Table Writes Distribution (Pie Chart)
sum by (keyspace, scope) (cas_Table_CoordinatorWriteLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function=~'Count'})

Coordinator Table Statistics

// Max Coordinator Read Latency
max(cas_Table_CoordinatorReadLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function='$percentile',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Max Coordinator Write Latency
max(cas_Table_CoordinatorWriteLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function='$percentile',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Max Coordinator Range Read Latency
max(cas_Table_CoordinatorScanLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function=~'$percentile',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Total Coordinator Reads/sec
sum (cas_Table_CoordinatorReadLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function=~'Count',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Average Coordinator Range Reads/sec
avg(cas_Table_CoordinatorScanLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function=~'Count',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Total Coordinator Writes/sec
sum (cas_Table_CoordinatorWriteLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function='Count',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

Table Replica Statistics

// Average Replica Read Latency
avg(cas_Table_ReadLatency{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Average Replica Range Read Latency
avg(cas_Table_RangeLatency{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Average Replica Write Latency
avg(cas_Table_WriteLatency{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Total Replica Reads/sec
sum(cas_Table_ReadLatency{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Total Replica Range Reads/sec
sum(cas_Table_RangeLatency{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Total Replica Writes/sec
sum(cas_Table_WriteLatency{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

Latency Causes

// Average Mean Partition Size
avg(cas_Table_MeanPartitionSize{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',scope=~'$scope',scope!=''}) by ($groupBy,keyspace,scope)

// Total Estimated Partitions
sum(cas_Table_EstimatedPartitionCount{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

// Max Partition Size
max(cas_Table_MaxPartitionSize{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',scope=~'$scope',scope!=''}) by ($groupBy,keyspace,scope)

// SSTables Per Read
cas_Table_SSTablesPerReadHistogram{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Max Live SSTables
max(cas_Table_LiveSSTableCount{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',scope=~'$scope',scope!=''}) by ($groupBy,keyspace,scope)

// Tombstones Scanned
cas_Table_TombstoneScannedHistogram{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Count|Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Speculative Retries
cas_Table_SpeculativeRetries{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Bloom Filter False Ratio
cas_Table_BloomFilterFalseRatio{scope=~'$scope',scope!='',keyspace=~'$keyspace',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Max Bloom Filter Disk
max(cas_Table_BloomFilterDiskSpaceUsed{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

Memory Statistics

// Total Table Heap Memory
sum(cas_Table_AllMemtablesHeapSize{dc=~'$dc',rack='$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

// Total Table Off-Heap Memory
sum(cas_Table_AllMemtablesOffHeapSize{dc=~'$dc',rack='$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

// GC Duration - G1 YoungGen
jvm_GarbageCollector_G1_Young_Generation{axonfunction='rate',function='CollectionTime',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// GC Count per sec - G1 YoungGen
jvm_GarbageCollector_G1_Young_Generation{axonfunction='rate',function='CollectionCount',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

Panel Organization

Tables Overview

  • Table Data Size % Distribution - Pie chart showing relative data size per table

  • Coordinator Table Reads % Distribution - Read request distribution by table

  • Coordinator Table Writes % Distribution - Write request distribution by table

Coordinator Table Statistics

  • Max Coordinator Table Read Latency - Maximum read latency at coordinator

  • Max Coordinator Table Write Latency - Maximum write latency at coordinator

  • Max Coordinator Table Range Read Latency - Maximum range query latency

  • Total Coordinator Table Reads/Sec - Read throughput at coordinator

  • Average Coordinator Table Range Reads/Sec - Range query throughput

  • Total Coordinator Table Writes/Sec - Write throughput at coordinator

Table Replica Statistics

  • Average Replica Read Latency - Local read latency

  • Average Replica Range Read Latency - Local range query latency

  • Average Replica Write Latency - Local write latency

  • Total Replica Reads/sec - Local read throughput

  • Total Replica Table Range Reads/sec - Local range query throughput

  • Total Replica Writes/sec - Local write throughput

Latency Causes

  • Average Mean Table Partition Size - Average partition size indicator

  • Total Estimated Table Partitions Count - Partition count per table

  • Max Table Partition Size - Largest partition (hotspot indicator)

  • SSTables Per Read - SSTable access efficiency

  • Max Live SSTables per Table - SSTable count (compaction indicator)

  • Tombstones Scanned per Table - Tombstone impact on reads

  • SpeculativeRetries By Node For Table Reads - Retry attempts

  • Bloom Filter False Positive Ratio - Filter efficiency

  • Max Table Bloom Filter Disk - Bloom filter storage

Memory Statistics

  • Total Table Heap Memory - Memtable heap usage

  • Total Table Off-Heap Memory - Memtable off-heap usage

  • GC duration - G1 YoungGen - Young generation GC time

  • GC count per sec - G1 YoungGen - Young generation GC frequency

  • GC duration - Shenandoah - Shenandoah GC time

  • GC Count per sec - Shenandoah - Shenandoah GC frequency

  • GC duration - ZGC - ZGC time

  • GC Count per sec - ZGC - ZGC frequency

Filters

  • data center (dc) - Filter by data center

  • rack - Filter by rack

  • node (host_id) - Filter by specific node

  • groupBy - Dynamic grouping (dc, rack, host_id)

  • percentile - Select latency percentile (50th, 75th, 95th, 98th, 99th, 999th)

  • keyspace - Filter by keyspace

  • table (scope) - Filter by table

Understanding Table Metrics

Coordinator vs Replica Metrics

  • Coordinator: Metrics from the node coordinating the request

  • Replica: Metrics from nodes storing the data

  • Coordinator latency includes network and replica time

  • Replica latency is local operation time only

Performance Indicators

Partition Size:

  • Large partitions (>100MB) cause performance issues
  • Monitor max size for hotspot detection
  • Mean size indicates data distribution

SSTable Metrics:

  • High SSTable count impacts read performance
  • More SSTables = more files to check
  • Indicates compaction strategy effectiveness

Tombstones:

  • Deleted data markers
  • High tombstone counts slow reads
  • Indicates need for compaction or TTL review

Bloom Filters:

  • Probabilistic data structure for SSTable lookups
  • False positive ratio should be <0.1
  • Higher ratios mean unnecessary SSTable reads

Speculative Retries:

  • Proactive retry mechanism for slow reads
  • High rates indicate inconsistent performance
  • May need tuning or investigation

Best Practices

Table Design

  • Partition Size: Keep <100MB, ideally <10MB

  • Even Distribution: Avoid hotspots

  • Appropriate TTL: Manage tombstone creation

  • Compression: Choose based on workload

Performance Monitoring

Latency Percentiles:

  • p50: Median performance
  • p99: Tail latency
  • Large p50-p99 gap indicates issues

Throughput Balance:

  • Even distribution across tables
  • Identify heavy tables
  • Plan capacity accordingly

Resource Usage:

  • Monitor memory per table
  • Track GC impact
  • Balance heap/off-heap usage

Troubleshooting

High Latency:

  • Check partition sizes
  • Review SSTable counts
  • Monitor tombstones
  • Verify bloom filter efficiency

Memory Issues:

  • Check memtable sizes
  • Monitor GC frequency
  • Review flush thresholds

Throughput Problems:

  • Analyze coordinator distribution
  • Check speculative retries
  • Review consistency levels

Units and Display

  • Latency: microseconds

  • Throughput: ops/sec (rps/wps)

  • Size: bytes (binary units)

  • Ratio: decimal/percentage

  • Count: short (absolute numbers)

  • GC: milliseconds/count per sec

Legend Format:

  • Overview: $keyspace $scope
  • Details: $groupBy - $keyspace $scope
  • Node-specific: $dc - $host_id

Notes

  • The scope!='' filter excludes empty table names
  • function!='Min|Max' excludes extreme values for percentiles
  • Coordinator metrics show client-facing performance
  • Replica metrics show storage-layer performance
  • GC metrics help correlate latency with JVM behavior
  • Some queries use special operations like diff or specific rack filtering