Skip to content
Maintained by AxonOps — production-grade documentation from engineers who operate distributed databases at scale

Kafka on Kubernetes

Production deployment guide for Apache Kafka on Kubernetes clusters.


Architecture Overview

uml diagram


StatefulSet Deployment

Namespace and ConfigMap

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kafka
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
  namespace: kafka
data:
  server.properties: |
    # KRaft mode
    process.roles=broker,controller
    node.id=${KAFKA_NODE_ID}
    [email protected]:9093,[email protected]:9093,[email protected]:9093

    # Listeners
    listeners=PLAINTEXT://:9092,CONTROLLER://:9093
    advertised.listeners=PLAINTEXT://${HOSTNAME}.kafka-headless.kafka.svc.cluster.local:9092
    controller.listener.names=CONTROLLER
    inter.broker.listener.name=PLAINTEXT

    # Log configuration
    log.dirs=/var/kafka/data
    num.partitions=3
    default.replication.factor=3
    min.insync.replicas=2

    # Performance
    num.network.threads=8
    num.io.threads=16
    socket.send.buffer.bytes=102400
    socket.receive.buffer.bytes=102400
    socket.request.max.bytes=104857600

Headless Service

# service-headless.yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
  namespace: kafka
  labels:
    app: kafka
spec:
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
  ports:
    - name: client
      port: 9092
      targetPort: 9092
    - name: controller
      port: 9093
      targetPort: 9093
  selector:
    app: kafka

Bootstrap Service

# service-bootstrap.yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-bootstrap
  namespace: kafka
  labels:
    app: kafka
spec:
  type: ClusterIP
  ports:
    - name: client
      port: 9092
      targetPort: 9092
  selector:
    app: kafka

StatefulSet

# statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kafka
spec:
  serviceName: kafka-headless
  replicas: 3
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: kafka
              topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - us-central1-a
                      - us-central1-b
                      - us-central1-c
      terminationGracePeriodSeconds: 300
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
        runAsNonRoot: true
      containers:
        - name: kafka
          image: apache/kafka:3.7.0
          ports:
            - name: client
              containerPort: 9092
            - name: controller
              containerPort: 9093
          env:
            - name: KAFKA_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: KAFKA_HEAP_OPTS
              value: "-Xms4g -Xmx4g"
          command:
            - bash
            - -c
            - |
              export KAFKA_NODE_ID=${HOSTNAME##*-}
              /opt/kafka/bin/kafka-storage.sh format -t $(cat /var/kafka/data/cluster-id 2>/dev/null || kafka-storage.sh random-uuid | tee /var/kafka/data/cluster-id) -c /etc/kafka/server.properties --ignore-formatted
              exec /opt/kafka/bin/kafka-server-start.sh /etc/kafka/server.properties
          resources:
            requests:
              memory: "6Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          volumeMounts:
            - name: data
              mountPath: /var/kafka/data
            - name: config
              mountPath: /etc/kafka
          readinessProbe:
            tcpSocket:
              port: 9092
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: 9092
            initialDelaySeconds: 60
            periodSeconds: 20
      volumes:
        - name: config
          configMap:
            name: kafka-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ssd
        resources:
          requests:
            storage: 100Gi

Storage Classes

AWS EBS

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"
  throughput: "500"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Azure Disk

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  diskIOPSReadWrite: "6000"
  diskMBpsReadWrite: "500"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

GCP Persistent Disk

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Pod Disruption Budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
  namespace: kafka
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: kafka

Resource Recommendations

Pod Resources

Component CPU Request CPU Limit Memory Request Memory Limit
Development 500m 1 2Gi 4Gi
Production 2 4 6Gi 8Gi
High Throughput 4 8 12Gi 16Gi

JVM Heap Settings

Memory Limit Heap Size Recommended
4Gi 2g Development
8Gi 4g Production
16Gi 8g High throughput

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafka-network-policy
  namespace: kafka
spec:
  podSelector:
    matchLabels:
      app: kafka
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Inter-broker communication
    - from:
        - podSelector:
            matchLabels:
              app: kafka
      ports:
        - port: 9092
        - port: 9093
    # Client access from application namespaces
    - from:
        - namespaceSelector:
            matchLabels:
              kafka-client: "true"
      ports:
        - port: 9092
  egress:
    # Inter-broker communication
    - to:
        - podSelector:
            matchLabels:
              app: kafka
      ports:
        - port: 9092
        - port: 9093
    # DNS resolution
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP

External Access

NodePort Service

apiVersion: v1
kind: Service
metadata:
  name: kafka-external
  namespace: kafka
spec:
  type: NodePort
  ports:
    - name: client
      port: 9092
      nodePort: 30092
  selector:
    app: kafka

LoadBalancer (Cloud)

apiVersion: v1
kind: Service
metadata:
  name: kafka-lb
  namespace: kafka
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - name: client
      port: 9092
      targetPort: 9092
  selector:
    app: kafka

Ingress (with SNI)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-ingress
  namespace: kafka
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: kafka.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kafka-bootstrap
                port:
                  number: 9092

Helm Chart

values.yaml

# values.yaml
replicaCount: 3

image:
  repository: apache/kafka
  tag: "3.7.0"
  pullPolicy: IfNotPresent

resources:
  requests:
    memory: "6Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

jvmOptions:
  heapSize: "4g"

persistence:
  enabled: true
  storageClass: ssd
  size: 100Gi

config:
  numPartitions: 3
  defaultReplicationFactor: 3
  minInsyncReplicas: 2

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: kafka
        topologyKey: kubernetes.io/hostname

podDisruptionBudget:
  minAvailable: 2

metrics:
  enabled: true
  port: 9404

Monitoring

ServiceMonitor (Prometheus)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka
  namespace: kafka
spec:
  selector:
    matchLabels:
      app: kafka
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

JMX Exporter Sidecar

# Add to StatefulSet containers
- name: jmx-exporter
  image: bitnami/jmx-exporter:latest
  ports:
    - name: metrics
      containerPort: 9404
  args:
    - "9404"
    - "/etc/jmx-exporter/config.yaml"
  volumeMounts:
    - name: jmx-config
      mountPath: /etc/jmx-exporter

Rolling Updates

Update Strategy

spec:
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady

Performing Updates

# Update image
kubectl set image statefulset/kafka kafka=apache/kafka:3.8.0 -n kafka

# Monitor rollout
kubectl rollout status statefulset/kafka -n kafka

# Rollback if needed
kubectl rollout undo statefulset/kafka -n kafka

Rolling Update Considerations

Rolling updates should be performed during low-traffic periods. Each broker restart causes partition leadership changes and potential consumer rebalancing.