Maintained by AxonOps — production-grade documentation from engineers who operate distributed databases at scale

AxonOps Kafka ZooKeeper Dashboard Metrics Mapping

Overview

The Kafka ZooKeeper Dashboard monitors the health and performance of the ZooKeeper ensemble that Kafka uses for cluster coordination (in non-KRaft mode). It tracks connections, request latency, node statistics, and session management so that problems with ZooKeeper can be spotted before they affect the Kafka cluster.

Metrics Mapping

| Dashboard Metric | Description | Attributes |
|---|---|---|
| ZooKeeper Health Metrics | | |
| zk_NumAliveConnections | Number of active client connections | port={port} |
| zk_NodeCount | Total number of znodes | port={port} |
| zk_WatchCount | Total number of watches | port={port} |
| zk_OutstandingRequests | Number of queued requests | port={port} |
| Request Latency Metrics | | |
| zk_MinRequestLatency | Minimum request latency | port={port} |
| zk_AvgRequestLatency | Average request latency | port={port} |
| zk_MaxRequestLatency | Maximum request latency | port={port} |
| Packet Metrics | | |
| zk_PacketsSent | Number of packets sent | port={port} |
| zk_PacketsReceived | Number of packets received | port={port} |
| Kafka-Reported ZooKeeper Metrics | | |
| kaf_ZooKeeperClientMetrics_ZooKeeperRequestLatencyMs | ZooKeeper request latency from Kafka perspective | - |
| kaf_SessionExpireListener_ZooKeeperExpiresPerSec | Rate of ZooKeeper session expirations | - |
| kaf_SessionExpireListener_ZooKeeperAuthFailuresPerSec | Rate of ZooKeeper authentication failures | - |
| kaf_SessionExpireListener_ZooKeeperSyncConnectsPerSec | Rate of ZooKeeper connections | - |
| kaf_SessionExpireListener_ZooKeeperDisconnectsPerSec | Rate of ZooKeeper disconnections | - |

Query Examples

Health Check Metrics

// Alive connections
zk_NumAliveConnections{rack=~'$rack',host_id=~'$host_id'}

// Total znode count
sum(zk_NodeCount{host_id=~'$host_id',type='kafka',node_type='zookeeper'})

// Total watch count
sum(zk_WatchCount{host_id=~'$host_id',type='kafka',node_type='zookeeper'})

// Outstanding requests
sum(zk_OutstandingRequests{host_id=~'$host_id',type='kafka',node_type='zookeeper'})

Request Latency

// Minimum request latency
zk_MinRequestLatency{host_id=~'$host_id',type='kafka',node_type='zookeeper'}

// Average request latency
zk_AvgRequestLatency{host_id=~'$host_id',node_type='zookeeper',type='kafka'}

// Maximum request latency
zk_MaxRequestLatency{host_id=~'$host_id',node_type='zookeeper',type='kafka'}

// Kafka-reported ZooKeeper latency
kaf_ZooKeeperClientMetrics_ZooKeeperRequestLatencyMs{rack=~'$rack',host_id=~'$host_id'}

Traffic Metrics

// Packets sent rate
sum(zk_PacketsSent{host_id=~'$host_id', axonfunction='rate', type='kafka',node_type='zookeeper'})

// Packets received rate
sum(zk_PacketsReceived{host_id=~'$host_id', axonfunction='rate', type='kafka',node_type='zookeeper'})

// Znode creation rate (net rate of change of the znode count, averaged across nodes)
avg(zk_NodeCount{host_id=~'$host_id', axonfunction='rate',type='kafka',node_type='zookeeper'})

Connection Management

// Session expiration rate
kaf_SessionExpireListener_ZooKeeperExpiresPerSec{axonfunction='rate',rack=~'$rack',host_id=~'$host_id'}

// Authentication failure rate
kaf_SessionExpireListener_ZooKeeperAuthFailuresPerSec{axonfunction='rate',rack=~'$rack',host_id=~'$host_id'}

// Connection rate
kaf_SessionExpireListener_ZooKeeperSyncConnectsPerSec{axonfunction='rate',rack=~'$rack',host_id=~'$host_id'}

// Disconnection rate
kaf_SessionExpireListener_ZooKeeperDisconnectsPerSec{axonfunction='rate',rack=~'$rack',host_id=~'$host_id'}

Panel Organization

Overview Section

  • Empty row for spacing/organization

Health Check

  • Alive Connections
  • Outstanding Requests
  • Number of Watchers
  • Number of ZNodes

Request Latency

  • Packets (sent/received rates)
  • Znode Creation Rate
  • Request Latency - Minimum
  • Request Latency - Average
  • Request Latency - Maximum
  • Kafka Reported Request Latency

Connections

  • Zookeeper expired connections per sec
  • Zookeeper auth failures per sec
  • Zookeeper disconnect per sec
  • Zookeeper connections per sec

Filters

  • host_id: Filter by specific ZooKeeper node
  • rack: Filter by rack location

Best Practices

Health Monitoring

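  • Track alive connections over time for capacity planning
  • Outstanding requests should stay near zero; a sustained backlog means the server cannot keep up with incoming requests
  • A high watch count adds memory use and notification work on the server and may impact performance
  • Monitor znode count growth; a steadily rising count increases memory use and snapshot size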

Latency Analysis

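  • Average latency should stay well below the configured tickTime
  • A high maximum latency alongside a low average points to intermittent stalls, such as slow disk syncs or garbage collection pauses
  • Compare ZooKeeper-reported and Kafka-reported latency; a large gap suggests the delay sits in the network or on the broker side rather than in ZooKeeper itself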
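
The latency guidance above can be turned into a simple automated check. The following is a minimal sketch, assuming the latency values (in milliseconds) have already been read from the dashboard queries. The function name `assess_latency`, the 50% headroom factor, and the 100 ms gap threshold are illustrative assumptions, not AxonOps or ZooKeeper defaults.

```python
def assess_latency(zk_avg_ms, zk_max_ms, kafka_ms, tick_time_ms,
                   headroom=0.5, gap_ms=100.0):
    """Return a list of findings for one ZooKeeper node.

    headroom and gap_ms are hypothetical thresholds; tune them per cluster.
    """
    findings = []
    # Average latency should stay well below tickTime.
    if zk_avg_ms >= tick_time_ms * headroom:
        findings.append("average latency is approaching tickTime")
    # A maximum at or above tickTime points to intermittent stalls.
    if zk_max_ms >= tick_time_ms:
        findings.append("maximum latency reached or exceeded tickTime")
    # Client-side latency far above server-side latency hints at
    # network or broker-side delay rather than ZooKeeper itself.
    if kafka_ms - zk_avg_ms >= gap_ms:
        findings.append(
            "Kafka-reported latency is well above ZooKeeper-reported latency"
        )
    return findings
```

An empty list means none of the three conditions fired for that node.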

Connection Management

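  • Monitor session expirations; they usually mean a client failed to heartbeat within its session timeout
  • Auth failures indicate misconfigured credentials or unauthorized connection attempts
  • A high disconnect rate suggests network issues between the brokers and the ensemble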
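
When investigating session expirations, it helps to know which timeout the server actually grants. ZooKeeper clamps the timeout a client requests to the range between minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime. The sketch below illustrates that clamping; the function name `negotiated_session_timeout` is hypothetical and the code is an illustration, not ZooKeeper's own implementation.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the server's bounds.

    When min_ms / max_ms are not given, fall back to the ZooKeeper
    defaults of 2x and 20x tickTime.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))
```

For example, with a tickTime of 2000 ms, a broker requesting 60000 ms would be granted 40000 ms, so raising the client-side timeout alone may not have the intended effect.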

Performance Tuning

  • Adjust tickTime based on latency requirements; it also sets the default bounds for session timeouts
  • Monitor packet rates for network saturation
  • Balance connections across ensemble members
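
One way to judge whether connections are balanced is to compare the busiest member against the ensemble average. The following is a minimal sketch, assuming per-host values of zk_NumAliveConnections have been collected into a dictionary; the function name `connection_imbalance` and the host names in the usage note are hypothetical.

```python
def connection_imbalance(connections_by_host):
    """Return the ratio of the busiest member to the ensemble mean.

    1.0 means perfectly balanced; larger values mean one member
    carries a disproportionate share of client connections.
    """
    counts = list(connections_by_host.values())
    total = sum(counts)
    if not counts or total == 0:
        return 1.0  # nothing connected, so nothing to balance
    mean = total / len(counts)
    return max(counts) / mean
```

For instance, `connection_imbalance({"zk1": 30, "zk2": 0, "zk3": 0})` yields 3.0, meaning a single member holds three times its fair share.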

Troubleshooting

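  • High outstanding requests: Check ZooKeeper server performance (CPU, disk sync latency, garbage collection)
  • Session expirations: Review session timeout settings (zookeeper.session.timeout.ms on the brokers, tickTime on the ensemble)
  • Auth failures: Check SASL/ACL configurations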
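
During troubleshooting, the dashboard values can be cross-checked directly against a node using ZooKeeper's `mntr` four-letter command, which returns one tab-separated key and value per line. The following is a minimal sketch, assuming `mntr` is allowed by `4lw.commands.whitelist` on the server and that the client port is 2181; the helper names `parse_mntr` and `fetch_mntr` are hypothetical.

```python
import socket


def parse_mntr(text):
    """Parse `mntr` output (one key<TAB>value pair per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # ignore anything that is not key<TAB>value
        value = value.strip()
        for cast in (int, float):
            try:
                stats[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            stats[key] = value  # non-numeric, e.g. zk_server_state
    return stats


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send `mntr` to one ZooKeeper node and return the parsed stats."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closes the connection after replying
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", "replace"))
```

Reading `fetch_mntr(host)["zk_outstanding_requests"]` for each ensemble member gives a point-in-time value to compare with the Outstanding Requests panel.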

Capacity Planning

  • Monitor znode growth rate
  • Track connection count trends
  • Plan for watch count scaling
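
Growth trends can be turned into a rough forecast. The sketch below uses a simple linear extrapolation between the first and last sample to estimate how long until a count (znodes, connections, or watches) reaches a chosen ceiling. The function name `days_until_limit` and the ceiling itself are assumptions for illustration; real growth is rarely perfectly linear.

```python
def days_until_limit(samples, limit):
    """Estimate days until `limit` is reached by linear extrapolation.

    samples: chronological list of (day, count) tuples, at least two,
             with distinct day values for the first and last entry.
    Returns None when the count is flat or shrinking.
    """
    (first_day, first_count), (last_day, last_count) = samples[0], samples[-1]
    rate_per_day = (last_count - first_count) / (last_day - first_day)
    if rate_per_day <= 0:
        return None  # no growth, so the limit is never reached
    return max(0.0, (limit - last_count) / rate_per_day)
```

With samples of 1000 znodes on day 0 and 2000 on day 10, a ceiling of 5000 would be reached in about 30 days at the current rate.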

ZooKeeper Ensemble Health

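  • Ensure all ensemble members are responsive
  • Monitor for leader elections; frequent elections indicate an unstable ensemble
  • Check fsync latency on the ZooKeeper transaction log directory (dataLogDir, or dataDir if no separate log directory is configured)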
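
The ensemble checks above can be summarized programmatically once each member's role is known, for example from the `zk_server_state` value that ZooKeeper's `mntr` command reports. The following is a minimal sketch, assuming the roles have already been gathered into a dictionary with `None` for members that did not respond; the function name `check_ensemble` and the host names in the usage note are hypothetical.

```python
def check_ensemble(states):
    """Return a list of issues for an ensemble.

    states: dict mapping host -> role ("leader", "follower", ...)
            or None when the member did not respond.
    """
    issues = []
    unresponsive = sorted(h for h, s in states.items() if s is None)
    if unresponsive:
        issues.append("unresponsive members: " + ", ".join(unresponsive))
    leaders = [h for h, s in states.items() if s == "leader"]
    if len(leaders) != 1:
        issues.append(f"expected exactly one leader, found {len(leaders)}")
    # Quorum needs a strict majority of the configured members.
    responsive = len(states) - len(unresponsive)
    if responsive <= len(states) // 2:
        issues.append("quorum lost: no majority of members is responding")
    return issues
```

Calling `check_ensemble({"zk1": "leader", "zk2": None, "zk3": "follower"})` reports only the unresponsive member, since two of three members still form a quorum.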