Kafka Monitoring
Comprehensive monitoring guide for Apache Kafka clusters.
Monitoring Overview

Key Metrics Categories
Cluster Health
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.controller:ActiveControllerCount | Active controller count (1 on the active controller, 0 elsewhere) | Sum across the cluster ≠ 1 |
| kafka.server:UnderReplicatedPartitions | Under-replicated partitions | > 0 |
| kafka.controller:OfflinePartitionsCount | Offline partitions | > 0 |
| kafka.server:UnderMinIsrPartitionCount | Partitions below min.insync.replicas | > 0 |
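ActiveControllerCount is reported per node, so the health condition applies to the cluster-wide sum rather than to any single broker. The bash sketch below assumes the per-node values have already been collected (for example over JMX). The function name check_active_controller is hypothetical.

```bash
# Hypothetical helper: evaluate ActiveControllerCount across all nodes.
# Each node reports 1 if it is the active controller and 0 otherwise,
# so a healthy cluster sums to exactly 1.
check_active_controller() {
  local total=0
  local value
  for value in "$@"; do
    total=$((total + value))
  done
  if [ "$total" -eq 1 ]; then
    echo "OK: exactly one active controller"
    return 0
  fi
  # 0 means no controller; more than 1 can mean a split brain or stale samples during failover
  echo "CRITICAL: $total active controllers (expected 1)"
  return 2
}

# Example: three brokers, the second one is the active controller
check_active_controller 0 1 0
```

The return code follows the same 0/1/2 convention as the health check script further down, so the helper can be used in the same kind of monitoring wrapper.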
Throughput
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:MessagesInPerSec | Messages per second | Per broker/topic |
| kafka.server:BytesInPerSec | Bytes in per second | Per broker/topic |
| kafka.server:BytesOutPerSec | Bytes out per second | Per broker/topic |
| kafka.server:TotalProduceRequestsPerSec | Produce requests | Per broker |
| kafka.server:TotalFetchRequestsPerSec | Fetch requests | Per broker |
Latency
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.network:TotalTimeMs,request=Produce | Produce latency | P99 > 100ms |
| kafka.network:TotalTimeMs,request=FetchConsumer | Fetch latency | P99 > 100ms |
| kafka.network:RequestQueueTimeMs | Queue time | > 10ms |
| kafka.network:ResponseQueueTimeMs | Response queue time | > 10ms |
Consumer Lag
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| Consumer lag | Records behind | Growing continuously |
| Lag growth rate | Lag increase rate | Positive for an extended period |
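"Growing continuously" can be made concrete by checking a short series of lag samples. This bash sketch assumes integer lag samples taken at a fixed interval, oldest first (for example from kafka-consumer-groups.sh). The name lag_is_growing is hypothetical.

```bash
# Hypothetical helper: succeed (exit status 0) only if every lag sample is
# strictly greater than the previous one. Samples are integers, oldest first.
lag_is_growing() {
  local prev="" sample
  [ "$#" -ge 2 ] || return 1       # a trend needs at least two samples
  for sample in "$@"; do
    if [ -n "$prev" ] && [ "$sample" -le "$prev" ]; then
      return 1                     # lag stayed flat or shrank at least once
    fi
    prev=$sample
  done
  return 0
}
```

For example, lag_is_growing 50 80 120 200 succeeds, while a single flat interval (50 80 80 200) makes it fail. Requiring strict growth is deliberately conservative. A Prometheus rule can express the same idea with deriv() over a window.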
JMX Configuration
Enable JMX
```bash
# Broker startup
# Authentication and SSL are disabled here: use only on a trusted network
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.port=9999"
bin/kafka-server-start.sh config/server.properties
```
JMX Exporter
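```yaml
# jmx-exporter.yml
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  # Broker metrics (meters expose a Count attribute)
  - pattern: kafka.server<type=(.+), name=(.+), topic=(.+)><>Count
    name: kafka_server_$1_$2_total
    labels:
      topic: "$3"
    type: COUNTER
  - pattern: kafka.server<type=(.+), name=(.+)><>Count
    name: kafka_server_$1_$2_total
    type: COUNTER
  # Broker gauges (for example ReplicaManager UnderReplicatedPartitions) expose Value
  - pattern: kafka.server<type=(.+), name=(.+)><>Value
    name: kafka_server_$1_$2
    type: GAUGE
  # Request metrics
  - pattern: kafka.network<type=RequestMetrics, name=(.+), request=(.+)><>Count
    name: kafka_network_request_$1_total
    labels:
      request: "$2"
    type: COUNTER
  # Histograms such as TotalTimeMs expose 50thPercentile ... 999thPercentile
  - pattern: kafka.network<type=RequestMetrics, name=(.+), request=(.+)><>(\d+)thPercentile
    name: kafka_network_request_$1
    labels:
      request: "$2"
      quantile: "0.$3"
    type: GAUGE
  # Controller metrics
  - pattern: kafka.controller<type=(.+), name=(.+)><>Value
    name: kafka_controller_$1_$2
    type: GAUGE
```
With lowercaseOutputName: true, these rules build names from the lowercased MBean type and name keys, for example kafka_controller_kafkacontroller_offlinepartitionscount or kafka_server_replicamanager_underreplicatedpartitions. The alert rules and dashboard queries below use snake_case names such as kafka_controller_offline_partitions_count, so either rename the metrics in the exporter rules or adjust the queries so that the two agree.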
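To check which Prometheus name an MBean receives, and therefore what a PromQL query must reference, the mapping done by a Value rule whose name template is the domain followed by $1_$2 (such as the kafka.controller rule) can be approximated in bash. This is a sketch under the assumption that the MBean has exactly the keys type= and name=. The function mbean_to_prom_name is hypothetical and does not reproduce every jmx_exporter naming rule.

```bash
# Hypothetical helper: approximate the name produced by a
# "<domain><type=(.+), name=(.+)><>Value" rule with lowercaseOutputName: true.
# Only MBeans whose keys are exactly type= and name= are handled.
mbean_to_prom_name() {
  local mbean=$1
  local domain=${mbean%%:*}   # e.g. kafka.controller
  local props=${mbean#*:}     # e.g. type=KafkaController,name=OfflinePartitionsCount
  local type name
  type=$(printf '%s\n' "$props" | sed -n 's/.*type=\([^,]*\).*/\1/p')
  name=$(printf '%s\n' "$props" | sed -n 's/.*name=\([^,]*\).*/\1/p')
  # Dots in the domain become underscores, then the whole name is lowercased
  printf '%s_%s_%s\n' "${domain//./_}" "$type" "$name" | tr '[:upper:]' '[:lower:]'
}

# Prints kafka_controller_kafkacontroller_offlinepartitionscount
mbean_to_prom_name 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
```

Comparing this output with the names used in alert expressions is a quick way to catch rules that can never fire because they reference a series the exporter does not emit.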
Critical Alerts
| Alert | Condition | Action |
| --- | --- | --- |
| Offline Partitions | OfflinePartitionsCount > 0 | Investigate broker failures |
| No Controller | Sum of ActiveControllerCount across the cluster != 1 | Check controller election |
| Under Min ISR | UnderMinIsrPartitionCount > 0 | Check broker health |
Warning Level
| Alert | Condition | Action |
| --- | --- | --- |
| Under-Replicated | UnderReplicatedPartitions > 0 for 5min | Check replication lag |
| High Produce Latency | P99 > 100ms | Check disk I/O, network |
| Consumer Lag Growing | Lag increasing continuously | Scale consumers (up to the partition count) or speed up processing |
| Disk Usage High | > 80% used | Add storage or adjust retention |
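The "Disk Usage High" condition can be checked on each broker host by parsing POSIX df -P output. This bash sketch reads df output on stdin and uses the 80% level from the table as its default. The function name disk_usage_warnings is hypothetical.

```bash
# Hypothetical helper: read `df -P` output on stdin and print a warning for
# every filesystem whose Capacity column exceeds the threshold (default 80).
disk_usage_warnings() {
  local threshold=${1:-80}
  awk -v limit="$threshold" '
    NR == 1 { next }                    # skip the df header line
    {
      used = $5
      sub(/%$/, "", used)               # "85%" -> "85"
      if (used + 0 > limit + 0)
        print "WARNING: " $6 " at " used "%"
    }
  '
}
```

A typical call pipes df -P for the directories listed in log.dirs into disk_usage_warnings 80. The path /var/lib/kafka is only an example, and mount points containing spaces are not handled.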
Sample Alert Rules
```yaml
# alert-rules.yml
groups:
  - name: kafka-critical
    rules:
      - alert: KafkaOfflinePartitions
        expr: kafka_controller_offline_partitions_count > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Kafka has offline partitions"
      # Only the active controller reports 1, so sum across all nodes
      - alert: KafkaNoActiveController
        expr: sum(kafka_controller_active_controller_count) != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Kafka cluster does not have exactly one active controller"
      - alert: KafkaUnderReplicatedPartitions
        expr: kafka_server_replica_manager_under_replicated_partitions > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Kafka has under-replicated partitions"
      # Lag is a gauge, so use deriv() rather than rate(); the lag metric
      # name depends on which lag exporter is deployed
      - alert: KafkaConsumerLagGrowing
        expr: deriv(kafka_consumer_group_lag[5m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Consumer lag is continuously growing"
```
Consumer Lag Monitoring
Using kafka-consumer-groups
```bash
# Check lag for all groups
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --all-groups

# Check specific group
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --group my-consumer-group
```
Output Interpretation
```text
GROUP     TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
my-group  my-topic  0          1000            1050            50
my-group  my-topic  1          2000            2000            0
my-group  my-topic  2          1500            1600            100
```
| Column | Description |
| --- | --- |
| CURRENT-OFFSET | Consumer's committed offset |
| LOG-END-OFFSET | Latest offset in partition |
| LAG | LOG-END-OFFSET - CURRENT-OFFSET |
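Per-partition LAG values are usually summed per group and topic before alerting. The following bash sketch reads kafka-consumer-groups.sh --describe output on stdin and finds columns by their header names. The real output also contains CONSUMER-ID, HOST, and CLIENT-ID columns and repeats the header for each group, and both cases are handled. The function name total_lag is hypothetical.

```bash
# Hypothetical helper: sum the LAG column per group and topic from
# `kafka-consumer-groups.sh --describe` output read on stdin.
total_lag() {
  awk '
    $1 == "GROUP" {                      # header row: map column names to positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    ("LAG" in col) && $(col["LAG"]) ~ /^[0-9]+$/ {
      # Blank lines, notices, and "-" (no committed offset) fail the regex and are skipped
      sum[$(col["GROUP"]) " " $(col["TOPIC"])] += $(col["LAG"])
    }
    END { for (key in sum) print key, sum[key] }
  '
}
```

Piping the describe command into total_lag prints one line per group and topic. For the sample output above, that line is my-group my-topic 150. When there are several group/topic pairs, the output order is not defined.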
Dashboard Panels
Cluster Overview
| Panel | Metrics |
| --- | --- |
| Active Controller | kafka_controller_active_controller_count |
| Online Brokers | Count of responding brokers |
| Offline Partitions | kafka_controller_offline_partitions_count |
| Under-Replicated | kafka_server_replica_manager_under_replicated_partitions |
Throughput
| Panel | Metrics |
| --- | --- |
| Messages In/s | kafka_server_broker_topic_metrics_messages_in_total rate |
| Bytes In/s | kafka_server_broker_topic_metrics_bytes_in_total rate |
| Bytes Out/s | kafka_server_broker_topic_metrics_bytes_out_total rate |
| Requests/s | kafka_network_request_total rate |
Latency
| Panel | Metrics |
| --- | --- |
| Produce P99 | kafka_network_request_total_time_ms{request="Produce", quantile="0.99"} |
| Fetch P99 | kafka_network_request_total_time_ms{request="FetchConsumer", quantile="0.99"} |
| Queue Time | kafka_network_request_queue_time_ms |
Resources
| Panel | Metrics |
| --- | --- |
| CPU Usage | Host CPU metrics |
| Memory Usage | Host memory metrics |
| Disk Usage | kafka_log_size per partition |
| Network I/O | Host network metrics |
Health Check Script
```bash
#!/bin/bash
# kafka-health-check.sh
# Exit codes: 0 = OK, 1 = warning, 2 = critical
BOOTSTRAP_SERVER=${1:-"localhost:9092"}
echo "=== Kafka Health Check ==="

# Check broker connectivity
echo -n "Broker connectivity: "
if kafka-broker-api-versions.sh --bootstrap-server "$BOOTSTRAP_SERVER" > /dev/null 2>&1; then
  echo "OK"
else
  echo "FAILED"
  exit 2
fi

# Check offline partitions (count partition lines only)
OFFLINE=$(kafka-topics.sh --bootstrap-server "$BOOTSTRAP_SERVER" \
  --describe --unavailable-partitions 2>/dev/null | grep -c "Partition:")
echo "Offline partitions: $OFFLINE"
if [ "$OFFLINE" -gt 0 ]; then
  echo "CRITICAL: Offline partitions detected"
  exit 2
fi

# Check under-replicated partitions
UNDER_REP=$(kafka-topics.sh --bootstrap-server "$BOOTSTRAP_SERVER" \
  --describe --under-replicated-partitions 2>/dev/null | grep -c "Partition:")
echo "Under-replicated partitions: $UNDER_REP"
if [ "$UNDER_REP" -gt 0 ]; then
  echo "WARNING: Under-replicated partitions detected"
  exit 1
fi

echo "=== All checks passed ==="
exit 0
```
Replication Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | ISR shrink rate | > 0 during normal operation |
| kafka.server:type=ReplicaManager,name=IsrExpandsPerSec | ISR expansion rate | Should follow shrinks |
| kafka.server:type=ReplicaManager,name=FailedIsrUpdatesPerSec | Failed ISR update rate | > 0 |
| kafka.server:type=ReplicaManager,name=LeaderCount | Leader replicas per broker | Uneven distribution |
| kafka.server:type=ReplicaManager,name=PartitionCount | Partitions per broker | Uneven distribution |
| kafka.server:type=ReplicaManager,name=OfflineReplicaCount | Offline replicas | > 0 |
| kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica | Max follower lag (messages) | Should stay proportional to the maximum produce batch size |
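"Uneven distribution" has no fixed threshold. One way to quantify it is the largest percentage deviation of any broker's LeaderCount (or PartitionCount) from the per-broker mean. This bash sketch takes one value per broker. The function name leader_skew_pct is hypothetical, and the acceptable skew is a judgment call that the table above does not specify.

```bash
# Hypothetical helper: largest deviation from the per-broker mean, in percent,
# for a list of LeaderCount (or PartitionCount) values, one per broker.
leader_skew_pct() {
  printf '%s\n' "$@" | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (sum == 0) { print "0.0"; exit }
      mean = sum / NR
      max = 0
      for (i = 1; i <= NR; i++) {
        d = (v[i] - mean) / mean * 100
        if (d < 0) d = -d               # absolute deviation
        if (d > max) max = d
      }
      printf "%.1f\n", max
    }
  '
}
```

For example, leader_skew_pct 100 100 130 prints 18.2, because the mean is 110 and the third broker is 20 leaders (about 18.2%) above it. A persistently high value usually calls for a preferred leader election or a partition reassignment.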
Request Processing Metrics
Request Time Breakdown
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.network:type=RequestMetrics,name=TotalTimeMs | Total request time | Sum of all phases |
| kafka.network:type=RequestMetrics,name=RequestQueueTimeMs | Time waiting in request queue | High values indicate overload |
| kafka.network:type=RequestMetrics,name=LocalTimeMs | Time processing at leader | Disk I/O bound |
| kafka.network:type=RequestMetrics,name=RemoteTimeMs | Time waiting for followers | Non-zero with acks=all |
| kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs | Time in response queue | Network thread saturation |
| kafka.network:type=RequestMetrics,name=ResponseSendTimeMs | Time sending response | Network bandwidth |
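Because TotalTimeMs is split into these phases, the phase that takes the largest share indicates where to look first, for example followers when RemoteTimeMs dominates or disks when LocalTimeMs dominates. This bash sketch assumes you pass the Mean attribute of each phase for one request type, because percentiles of individual phases do not add up to the TotalTimeMs percentile. The function name dominant_phase is hypothetical.

```bash
# Hypothetical helper: report the phase with the largest share of request time.
# Arguments (mean ms): RequestQueueTimeMs LocalTimeMs RemoteTimeMs
#                      ResponseQueueTimeMs ResponseSendTimeMs
dominant_phase() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" -v e="$5" 'BEGIN {
    n = split("RequestQueueTimeMs LocalTimeMs RemoteTimeMs ResponseQueueTimeMs ResponseSendTimeMs", names, " ")
    t[1] = a; t[2] = b; t[3] = c; t[4] = d; t[5] = e
    total = 0; best = 1
    for (i = 1; i <= n; i++) {
      total += t[i]
      if (t[i] + 0 > t[best] + 0) best = i
    }
    if (total == 0) { print "none"; exit }
    printf "%s %.0f%%\n", names[best], t[best] / total * 100
  }'
}
```

For example, dominant_phase 1 2 20 1 1 prints RemoteTimeMs 80%, which for a Produce request with acks=all points at slow followers rather than at the leader's disk.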
Request Handler Utilization
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | Network thread idle ratio | < 0.3 |
| kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent | Request handler idle ratio | < 0.3 |
| kafka.network:type=RequestChannel,name=RequestQueueSize | Pending requests | Growing continuously |
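Despite the "Percent" in their names, these idle metrics are ratios between 0 and 1, which is why the threshold above is 0.3. Shell arithmetic is integer-only, so this bash sketch delegates the floating-point comparison to awk. The function name idle_check is hypothetical.

```bash
# Hypothetical helper: compare an idle ratio (0..1) with a threshold (default 0.3).
# Returns 1 and prints a warning when the thread pool is too busy.
idle_check() {
  local label=$1 value=$2 threshold=${3:-0.3}
  # awk exits 0 (true) when value < threshold
  if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 < t + 0) }'; then
    echo "WARNING: $label idle ratio $value < $threshold"
    return 1
  fi
  echo "OK: $label idle ratio $value"
  return 0
}
```

For example, idle_check network 0.25 prints a warning, while idle_check handler 0.8 reports OK.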
Purgatory Metrics
Purgatory holds requests waiting for conditions to be met (e.g., acks from replicas).
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce | Pending produce requests | Non-zero with acks=-1 |
| kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch | Pending fetch requests | Depends on the consumer's fetch.max.wait.ms |
Log and Storage Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs | Log flush rate and time | Disk performance indicator |
| kafka.log:type=LogManager,name=OfflineLogDirectoryCount | Offline log directories | Should be 0 |
| kafka.log:type=Log,name=Size,topic=X,partition=Y | Partition size in bytes | Per-partition storage |
| kafka.log:type=Log,name=NumLogSegments,topic=X,partition=Y | Segment count per partition | Segment management |
| kafka.log:type=Log,name=LogStartOffset,topic=X,partition=Y | First available offset | Retention tracking |
| kafka.log:type=Log,name=LogEndOffset,topic=X,partition=Y | Latest offset | Progress tracking |
Controller Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs | Leader election rate | Non-zero during failures |
| kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec | Unclean elections | > 0 (potential data loss) |
| kafka.controller:type=KafkaController,name=TopicsToDeleteCount | Pending topic deletions | Should decrease |
| kafka.controller:type=KafkaController,name=ReplicasToDeleteCount | Pending replica deletions | Should decrease |
| kafka.controller:type=ControllerEventManager,name=EventQueueSize | Controller event queue | Growing continuously |
| kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs | Event wait time | High latency |
KRaft Monitoring
KRaft clusters expose Raft consensus metrics on both controllers and brokers.
Quorum State Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=raft-metrics,name=current-state | Node state | leader, follower, candidate, observer |
| kafka.server:type=raft-metrics,name=current-leader | Current leader ID | -1 indicates unknown |
| kafka.server:type=raft-metrics,name=current-epoch | Current quorum epoch | Increments on elections |
| kafka.server:type=raft-metrics,name=high-watermark | Committed log offset | -1 if unknown |
| kafka.server:type=raft-metrics,name=log-end-offset | End of Raft log | Replication progress |
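Two of these gauges become more useful when they are sampled over time. Epoch increments reveal election churn, and a current-leader of -1 means that the node does not know the leader. This bash sketch assumes samples collected periodically from a single node. A candidate bumps the epoch when it starts an election, so the count can include unsuccessful attempts as well as completed elections. Both function names are hypothetical.

```bash
# Hypothetical helper: total epoch increments across current-epoch samples
# (oldest first). Jumps of more than one between samples are counted in full.
count_epoch_changes() {
  local prev="" changes=0 epoch
  for epoch in "$@"; do
    if [ -n "$prev" ] && [ "$epoch" -gt "$prev" ]; then
      changes=$((changes + epoch - prev))
    fi
    prev=$epoch
  done
  echo "$changes"
}

# Hypothetical helper: succeed if a current-leader sample is a real node ID (-1 means unknown)
leader_known() {
  [ "$1" -ge 0 ]
}
```

For example, count_epoch_changes 5 5 6 6 8 prints 3. A count that keeps rising while the cluster is otherwise stable is worth investigating together with election-latency-avg.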
Quorum Performance Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.server:type=raft-metrics,name=commit-latency-avg | Average commit latency | Increasing trend |
| kafka.server:type=raft-metrics,name=commit-latency-max | Maximum commit latency | Spikes |
| kafka.server:type=raft-metrics,name=election-latency-avg | Average election time | Extended elections |
| kafka.server:type=raft-metrics,name=fetch-records-rate | Record fetch rate | Replication throughput |
| kafka.server:type=raft-metrics,name=append-records-rate | Record append rate | Write throughput |
Group Coordinator Monitoring
The group coordinator manages consumer group membership and offset storage.
Partition State Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=group-coordinator-metrics,name=num-partitions,state=loading | Loading partitions | Should be transient |
| kafka.server:type=group-coordinator-metrics,name=num-partitions,state=active | Active partitions | Normal operation |
| kafka.server:type=group-coordinator-metrics,name=num-partitions,state=failed | Failed partitions | Should be 0 |
Consumer Group State Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=group-coordinator-metrics,name=consumer-group-count,state=stable | Stable groups | Normal state |
| kafka.server:type=group-coordinator-metrics,name=consumer-group-count,state=empty | Empty groups | No active members |
| kafka.server:type=group-coordinator-metrics,name=consumer-group-count,state=assigning | Groups assigning partitions | Rebalance in progress |
| kafka.server:type=group-coordinator-metrics,name=consumer-group-count,state=reconciling | Groups reconciling | Incremental rebalance |
| kafka.server:type=group-coordinator-metrics,name=consumer-group-rebalance-rate | Rebalance frequency | High rate indicates instability |
Offset Management Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=group-coordinator-metrics,name=offset-commit-rate | Offset commit rate | Consumer activity |
| kafka.server:type=group-coordinator-metrics,name=offset-expiration-rate | Offset expiration rate | Inactive consumers |
| kafka.server:type=GroupMetadataManager,name=NumOffsets | Total committed offsets | Storage overhead |
| kafka.server:type=GroupMetadataManager,name=NumGroups | Total consumer groups | Group management |
Tiered Storage Monitoring
For clusters with tiered storage enabled, monitor remote storage operations.
Remote Storage Throughput
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=BrokerTopicMetrics,name=RemoteFetchBytesPerSec | Bytes read from remote | Cold read volume |
| kafka.server:type=BrokerTopicMetrics,name=RemoteFetchRequestsPerSec | Remote fetch requests | Cold read frequency |
| kafka.server:type=BrokerTopicMetrics,name=RemoteCopyBytesPerSec | Bytes copied to remote | Upload throughput |
| kafka.server:type=BrokerTopicMetrics,name=RemoteCopyRequestsPerSec | Copy requests to remote | Upload frequency |
| kafka.server:type=BrokerTopicMetrics,name=RemoteDeleteRequestsPerSec | Delete requests | Retention cleanup |
Remote Storage Lag
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.server:type=BrokerTopicMetrics,name=RemoteCopyLagBytes | Bytes pending upload | Growing continuously |
| kafka.server:type=BrokerTopicMetrics,name=RemoteCopyLagSegments | Segments pending upload | > configured threshold |
| kafka.server:type=BrokerTopicMetrics,name=RemoteDeleteLagBytes | Bytes pending deletion | Growing continuously |
| kafka.server:type=BrokerTopicMetrics,name=RemoteDeleteLagSegments | Segments pending deletion | > configured threshold |
Remote Storage Errors
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.server:type=BrokerTopicMetrics,name=RemoteFetchErrorsPerSec | Remote read errors | > 0 |
| kafka.server:type=BrokerTopicMetrics,name=RemoteCopyErrorsPerSec | Remote write errors | > 0 |
| kafka.server:type=BrokerTopicMetrics,name=RemoteDeleteErrorsPerSec | Remote delete errors | > 0 |
Remote Storage Thread Pool
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderTaskQueueSize | Read task queue | Growing continuously |
| org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderAvgIdlePercent | Read thread utilization | < 0.3 |
| kafka.log.remote:type=RemoteLogManager,name=RemoteLogManagerTasksAvgIdlePercent | Copy thread utilization | < 0.3 |
Producer Client Metrics
Client-side metrics for monitoring producer applications.
Throughput Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.producer:type=producer-metrics,name=record-send-rate | Records sent per second | Production rate |
| kafka.producer:type=producer-metrics,name=byte-rate | Bytes sent per second | Bandwidth usage |
| kafka.producer:type=producer-metrics,name=compression-rate-avg | Compression ratio | < 1.0 indicates compression |
| kafka.producer:type=producer-metrics,name=record-size-avg | Average record size | Sizing validation |
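compression-rate-avg is the average ratio of compressed batch size to uncompressed batch size, so its complement gives the approximate fraction of bytes saved. The function name in this small bash sketch is hypothetical.

```bash
# Hypothetical helper: convert compression-rate-avg (compressed / uncompressed)
# into the approximate percentage of bytes saved by compression.
compression_savings_pct() {
  awk -v r="$1" 'BEGIN { printf "%.1f\n", (1 - r) * 100 }'
}
```

For example, compression_savings_pct 0.25 prints 75.0. A value of 1.0 (0.0% saved) means batches are effectively uncompressed.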
Latency Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.producer:type=producer-metrics,name=request-latency-avg | Average request latency | Increasing trend |
| kafka.producer:type=producer-metrics,name=request-latency-max | Maximum request latency | Spikes |
| kafka.producer:type=producer-metrics,name=record-queue-time-avg | Time in buffer | High indicates backpressure |
| kafka.producer:type=producer-metrics,name=produce-throttle-time-avg | Throttle time | > 0 indicates quota hit |
Buffer Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.producer:type=producer-metrics,name=buffer-available-bytes | Available buffer space | Approaching 0 |
| kafka.producer:type=producer-metrics,name=buffer-total-bytes | Total buffer size | Configuration reference |
| kafka.producer:type=producer-metrics,name=bufferpool-wait-ratio | Time waiting for buffer | > 0 indicates memory pressure |
| kafka.producer:type=producer-metrics,name=batch-size-avg | Average batch size | Tuning indicator |
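buffer-available-bytes is easier to alert on once it is expressed relative to buffer-total-bytes, which reflects the producer's buffer.memory setting (32 MiB by default). When the buffer is exhausted, send() blocks for up to max.block.ms. The function name in this bash sketch is hypothetical.

```bash
# Hypothetical helper: producer buffer utilization in percent from
# buffer-available-bytes ($1) and buffer-total-bytes ($2).
buffer_utilization_pct() {
  awk -v avail="$1" -v total="$2" 'BEGIN {
    if (total + 0 <= 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", (1 - avail / total) * 100
  }'
}
```

For example, buffer_utilization_pct 8388608 33554432 prints 75.0 (8 MiB free out of 32 MiB).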
Error Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.producer:type=producer-metrics,name=record-error-rate | Record error rate | > 0 |
| kafka.producer:type=producer-metrics,name=record-retry-rate | Record retry rate | High rate |
Consumer Client Metrics
Client-side metrics for monitoring consumer applications.
Throughput Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=records-consumed-rate | Records consumed per second | Consumption rate |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=bytes-consumed-rate | Bytes consumed per second | Bandwidth usage |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=fetch-rate | Fetch request rate | Request frequency |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=records-per-request-avg | Records per fetch | Efficiency indicator |
Lag Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=records-lag-max | Maximum partition lag | Growing continuously |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=records-lag,partition=X | Per-partition lag | Above threshold |
| kafka.consumer:type=consumer-fetch-manager-metrics,name=records-lead-min | Minimum lead (distance from the log start offset) | Approaching 0 (unread records at risk of deletion by retention) |
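Lag in records says little about how long recovery will take. A rough estimate divides the lag by the net drain rate: seconds = lag / (consume_rate - produce_rate). This bash sketch assumes that both rates stay constant, for example records-consumed-rate for the consumer group and the topic's MessagesInPerSec for producers. The function name catch_up_seconds is hypothetical.

```bash
# Hypothetical helper: estimated seconds until lag ($1) reaches zero, given the
# consume rate ($2) and produce rate ($3) in records per second.
catch_up_seconds() {
  awk -v lag="$1" -v c="$2" -v p="$3" 'BEGIN {
    net = c - p                       # records drained per second
    if (net <= 0) { print "never"; exit }
    printf "%.0f\n", lag / net
  }'
}
```

For example, catch_up_seconds 60000 1500 1000 prints 120. A result of never means the consumers are not faster than the producers, so the lag cannot shrink.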
Rebalance Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.consumer:type=consumer-coordinator-metrics,name=rebalance-total | Total rebalances | High count |
| kafka.consumer:type=consumer-coordinator-metrics,name=rebalance-rate-per-hour | Rebalance frequency | > 1-2 per hour |
| kafka.consumer:type=consumer-coordinator-metrics,name=rebalance-latency-avg | Average rebalance time | > configured session timeout |
| kafka.consumer:type=consumer-coordinator-metrics,name=assigned-partitions | Assigned partition count | Uneven distribution |
Heartbeat Metrics
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.consumer:type=consumer-coordinator-metrics,name=heartbeat-rate | Heartbeats per second | Below expected rate |
| kafka.consumer:type=consumer-coordinator-metrics,name=heartbeat-response-time-max | Max heartbeat response time | Approaching session timeout |
| kafka.consumer:type=consumer-coordinator-metrics,name=last-heartbeat-seconds-ago | Time since last heartbeat | Approaching session timeout |
Commit Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.consumer:type=consumer-coordinator-metrics,name=commit-rate | Commit rate | Commit frequency |
| kafka.consumer:type=consumer-coordinator-metrics,name=commit-latency-avg | Average commit latency | Performance indicator |
Kafka Streams Metrics
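For Kafka Streams applications, monitor stream processing performance. Some of the task-level metrics below (such as stream-task-metrics process-rate) and the state store metrics are recorded only when metrics.recording.level is set to DEBUG; the default is INFO.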
Thread Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.streams:type=stream-thread-metrics,name=state | Thread state | RUNNING, PARTITIONS_ASSIGNED, etc. |
| kafka.streams:type=stream-thread-metrics,name=commit-rate | Commits per second | Processing frequency |
| kafka.streams:type=stream-thread-metrics,name=poll-rate | Polls per second | Input rate |
| kafka.streams:type=stream-thread-metrics,name=process-rate | Records processed per second | Processing throughput |
Processing Latency
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.streams:type=stream-thread-metrics,name=process-latency-avg | Average processing time | Increasing trend |
| kafka.streams:type=stream-thread-metrics,name=commit-latency-avg | Average commit time | High latency |
| kafka.streams:type=stream-thread-metrics,name=poll-latency-avg | Average poll time | High latency |
| kafka.streams:type=stream-thread-metrics,name=punctuate-latency-avg | Average punctuate time | High latency |
Task Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.streams:type=stream-thread-metrics,name=task-created-rate | Task creation rate | Rebalance activity |
| kafka.streams:type=stream-thread-metrics,name=task-closed-rate | Task close rate | Rebalance activity |
| kafka.streams:type=stream-task-metrics,name=process-rate | Per-task processing rate | Task-level throughput |
| kafka.streams:type=stream-task-metrics,name=dropped-records-rate | Dropped record rate | Data loss indicator |
State Store Metrics
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.streams:type=stream-state-metrics,name=put-rate | State store write rate | Write throughput |
| kafka.streams:type=stream-state-metrics,name=get-rate | State store read rate | Read throughput |
| kafka.streams:type=stream-state-metrics,name=flush-rate | State store flush rate | Persistence frequency |
| kafka.streams:type=stream-state-metrics,name=restore-rate | State restoration rate | Recovery progress |
Quota Metrics
Monitor client quota enforcement.
| Metric | Description | Notes |
| --- | --- | --- |
| kafka.server:type=Produce,user=X,client-id=Y,name=throttle-time | Producer throttle time | > 0 indicates quota exceeded |
| kafka.server:type=Fetch,user=X,client-id=Y,name=throttle-time | Consumer throttle time | > 0 indicates quota exceeded |
| kafka.server:type=Request,user=X,client-id=Y,name=throttle-time | Request throttle time | > 0 indicates quota exceeded |
Security Metrics
Monitor authentication and authorization.
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| kafka.server:type=socket-server-metrics,name=successful-authentication-rate | Successful auth rate | Reference baseline |
| kafka.server:type=socket-server-metrics,name=failed-authentication-rate | Failed auth rate | > 0 |
| kafka.network:type=SocketServer,name=ExpiredConnectionsKilledCount | Connections killed (auth expiry) | > 0 with re-auth enabled |
Kafka 4.2 Metrics Changes
Metric Naming Convention (KIP-1100)
Kafka 4.2 renames some metrics from earlier versions so that they follow the kafka.COMPONENT naming convention. After upgrading, check dashboards and alerting rules for queries that no longer match any metric.
New Metrics in Kafka 4.2
| Metric | KIP | Description |
| --- | --- | --- |
| kafka.controller:AvgIdleRatio | KIP-1190 | Controller thread idle ratio. Low values indicate the controller is under heavy load |
| kafka.server:AvgIdleRatio (MetadataLoader) | KIP-1229 | MetadataLoader thread idle ratio. Monitors metadata processing capacity |
| kafka.server:RequestHandlerAvgIdlePercent | KIP-1207 | Fixed in KRaft combined mode to report accurately (previously incorrect in combined controller+broker nodes) |
| Feature level metrics | KIP-1180 | Generic metrics for finalized and supported feature levels across the cluster |
| client-id tag on AppInfo | KIP-1120 | AppInfo metrics now include a client-id tag for distinguishing between client instances |
| application-id tag on Streams state | KIP-1221 | Kafka Streams client state metric now includes an application-id tag |
| Share partition lag | KIP-1226 | Lag metrics for share group partition consumption progress |