Kafka Cost Comparison 2026
Teams evaluating Kafka usually end up comparing three operating models. The first is to run open Apache Kafka® yourself. The second is to use Amazon MSK. The third is to move further up the managed spectrum with Confluent Cloud.
On paper, managed Kafka looks like the easier decision. In practice, the economics are more nuanced. Managed services reduce part of the infrastructure work, but they do not remove the day-2 work that platform teams still own: topic design, ACL governance, consumer lag, connector failures, schema evolution, incident response, and observability. That is why the useful comparison is not only “which line item is cheaper?” but also “what do you actually get for the money?”
This article uses public pricing published by AWS and Confluent as of March 18, 2026. It is written to help engineering teams reason about the trade-offs, not to pretend there is one number that fits every Kafka estate.
What This Comparison Assumes
The cost section is more useful if it follows realistic Kafka estate sizes rather than one isolated cluster. This comparison therefore uses three cluster shapes:
- 3 brokers: a small production cluster
- 9 brokers: a regional shared platform
- 30 brokers: a larger central streaming platform using higher-capacity brokers
To make those shapes concrete, each one is paired with an approximate sustained throughput band. Those throughput bands are not vendor promises. They are planning figures derived from AWS’s published sustained throughput guidance for MSK Express brokers, then used as a like-for-like reference point for open Kafka on comparable broker sizing:
- 3 brokers: about 45 MBps ingress / 90 MBps egress
- 9 brokers: about 140 MBps ingress / 280 MBps egress
- 30 brokers on `express.m7g.2xlarge`: about 1.875 GBps ingress / 3.75 GBps egress

The first two rows use `express.m7g.large` planning guidance. The 30-broker row uses `express.m7g.2xlarge`, because that is a more realistic class if you are actually building a larger shared Kafka platform rather than a fleet of many small brokers.
Storage is scaled at roughly the same density as the AWS 3-broker pricing example:
- 3 brokers: about 1.5 TB average retained storage
- 9 brokers: about 4.5 TB average retained storage
- 30 brokers: about 15.2 TB average retained storage
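Those storage figures are AWS's 3-broker pricing example scaled at a constant per-broker density. A minimal sketch of that scaling (the ~505 GB/broker density is derived from the article's figures, not an AWS number):

```python
# Storage is scaled linearly at the density implied by AWS's 3-broker
# pricing example (~1,516 GB across 3 brokers).
GB_PER_BROKER = 1516.13 / 3  # ~505.4 GB retained per broker (derived, not an AWS figure)

def retained_storage_gb(brokers: int) -> float:
    """Approximate retained cluster storage in GB for a given broker count."""
    return brokers * GB_PER_BROKER

for brokers in (3, 9, 30):
    print(f"{brokers} brokers: ~{retained_storage_gb(brokers) / 1000:.1f} TB retained")
```

This is only the scaling rule used by the comparison; a real estate's density depends on the retention decisions discussed later in the article.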
For the self-hosted case, the reference model is Kafka on Kubernetes with Strimzi, plus AxonOps as the operational control plane for monitoring, alerting, topics, ACLs, Schema Registry, Connect, and secure message viewing. The raw infrastructure estimate uses AWS’s published EKS worker-node pricing examples plus gp3 storage pricing. If you already have Kubernetes capacity, existing VMs, or bare metal, the self-hosted number drops further. The broader operating model behind that stack is covered in more detail in Running Kafka at Scale Without a Platform Team.
The Cost Shape Is Different In Each Model
Self-hosted Kafka is mostly an infrastructure bill. Amazon MSK adds managed Kafka pricing on top of broker and storage capacity. Confluent Cloud uses elastic units, storage, and traffic-based pricing. Those models feel similar at a glance, but they diverge quickly as estates get larger.
The other important difference is that managed Kafka pricing does not remove the need for a control plane. Even if AWS or Confluent operate the broker layer, your engineers still need a way to see cluster health, track consumer lag, manage topics and ACLs, and understand what changed during an incident.
Scenario Comparison: 3, 9, and 30 Brokers
The table below compares the steady-state Kafka platform bill before optional extras such as private networking, managed connectors, cross-region replication, or premium support. For MSK Express and Confluent Cloud in particular, real traffic charges can materially increase the bill beyond these baseline figures.
| Scenario | Self-Hosted Kafka (Strimzi + AxonOps model) | Amazon MSK Express | Confluent Cloud |
|---|---|---|---|
| 3 brokers, ~45 MBps in / 90 MBps out | ~$571/month | ~$1,045/month before ingest charges | ~$1,216/month on Standard before ingress/egress charges |
| 9 brokers, ~140 MBps in / 280 MBps out | ~$1,567/month | ~$3,135/month before ingest charges | ~$3,649/month on Standard before ingress/egress charges |
| 30 brokers on `m7g.2xlarge` class, ~1.875 GBps in / ~3.75 GBps out | ~$8,820/month on comparable 8 vCPU workers | ~$37,257/month before ingest charges | ~$42,093 to ~$53,773/month on Enterprise before ingress/egress charges |
| Scaling trigger | Add infrastructure as needed | More brokers, more storage, more ingest fees | More eCKUs, more storage, more ingress/egress fees |
| Pricing shape | Linear and predictable | Linear until traffic fees accelerate it | Can steepen quickly as capacity and traffic rise |
| Retention cost behavior | Direct disk cost, with freedom to choose the storage architecture | Direct `GB-month` charge at the MSK storage rate | Direct `GB-month` charge added on top of the eCKU platform fee |
These figures are intentionally about the Kafka platform bill rather than a full organizational TCO model. They do not include application engineering effort, support contracts, or optional third-party services around the cluster. That is useful here because it shows the economic shape before the surrounding platform decisions are added on top.
Retention Changes The Bill Faster Than Many Teams Expect
Retention is worth separating from broker count because the two are related but not identical. A Kafka estate can have a modest broker footprint and still carry a large storage bill if topics need long rewind windows, high replication, or compaction with large working sets. The reverse is also true: a busy cluster with short-lived topics may need substantial broker throughput without retaining very much data at all.
That is why storage should be treated as its own planning axis. The variables are straightforward:
- average retained bytes per topic
- replication factor
- cleanup policy, especially compacted topics
- retention window in hours or days
- whether the platform needs a long consumer replay window for recovery or audit
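Those variables multiply together, which is why retention can grow the bill faster than broker count does. A rough planning sketch, assuming sustained ingress is held for the whole retention window (it ignores compaction, compression, and per-topic variation, so treat the output as an upper-bound planning figure):

```python
def retained_storage_tb(ingress_mbps: float, retention_days: float,
                        replication_factor: int = 3) -> float:
    """Upper-bound estimate of retained storage in TB: sustained ingress
    held for the full retention window, multiplied by the replication
    factor. Uses decimal MB/TB; ignores compaction and compression."""
    seconds = retention_days * 24 * 3600
    bytes_retained = ingress_mbps * 1e6 * seconds * replication_factor
    return bytes_retained / 1e12

# Illustrative: 45 MBps sustained ingress kept for 7 days at RF=3
print(f"~{retained_storage_tb(45, 7):.1f} TB retained")
```

At seven days and RF=3, even the modest 45 MBps scenario retains roughly 82 TB, far above the ~1.5 TB density used in the 3-broker baseline. That gap is exactly why the retention window, rather than broker count, tends to dominate the storage axis.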
In the cost model here, storage is not hidden inside the compute estimate. It is priced separately, which makes the retention effect easier to see.
| Additional retained cluster storage | Self-Hosted Kafka | Amazon MSK Express | Confluent Cloud |
|---|---|---|---|
| Extra 10 TB retained | ~$800/month on gp3 | ~$1,000/month | ~$800/month |
| Extra 50 TB retained | ~$4,000/month on gp3 | ~$5,000/month | ~$4,000/month |
| What changes operationally | You can move to denser nodes, rework the disk layout, or adopt instance-store options | Storage bill rises, while the managed-service premium remains | Storage bill rises, while the eCKU platform bill remains |
The interesting point is not only that MSK storage is priced higher than the gp3 baseline used in the self-hosted estimate. It is also that, although Confluent’s published storage rate happens to match that gp3 baseline numerically, the economics are not equivalent. In a self-hosted estate, storage is still an engineering choice. You can change the node shape, disk layout, storage class, retention architecture, or even move to denser instance-store patterns if they make sense for the workload. In Confluent Cloud, the storage line sits on top of the eCKU platform bill and follows Confluent’s service model rather than your own infrastructure design choices.
Some teams will prefer gp3 because it is simple. Others will use denser EBS layouts, existing Kubernetes storage classes, or instance-store NVMe where that makes sense. With managed services, the storage rate is part of the service contract. With self-hosted Kafka, storage remains an engineering choice.
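The deltas in the table above are simple rate-times-volume arithmetic, using the per-GB-month rates quoted in this article:

```python
# $/GB-month rates as quoted in this article's model
RATES = {"self-hosted gp3": 0.08, "MSK Express": 0.10, "Confluent Cloud": 0.08}

def monthly_storage_delta(extra_tb: float, rate_per_gb_month: float) -> float:
    """Monthly cost of additional retained storage (1 TB = 1,000 GB here)."""
    return extra_tb * 1000 * rate_per_gb_month

for name, rate in RATES.items():
    print(f"Extra 10 TB on {name}: ~${monthly_storage_delta(10, rate):,.0f}/month")
```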
A Note On Confluent’s “Cheaper Than Kafka” Claim
Confluent’s pricing estimator currently presents claims such as being materially cheaper than self-managed Apache Kafka at high throughput. Those claims should be read carefully.
The screenshot used while researching this post shows a 1 GBps write-throughput scenario, 7 days of retention, a throughput price of $0.022/GB, and a headline saying “Total Savings vs Apache Kafka 59%”. It also explicitly says that this estimate includes a 56% discount off list.
That detail changes how the claim should be read. A claim built on a large negotiated discount is not a neutral market baseline. It is a sales scenario. It may be perfectly real for a particular enterprise deal, but it is not the same thing as public pricing, and it is not the same thing as a generally true statement that Confluent Cloud is cheaper than self-hosted Kafka.
There is another issue as well. Confluent controls the assumptions on both sides of that estimator:
- the Confluent discount assumption
- the self-managed Kafka architecture being compared against
- the operational labor and overprovisioning assumptions inside the comparison
That means the result is directionally interesting, but not authoritative. If the self-managed baseline assumes overly expensive compute, conservative overprovisioning, or a heavier operations burden than your team actually has, the saving will look larger than it really is. If your estate already runs Kubernetes, has spare capacity, uses denser storage layouts, or already has a platform team operating Kafka alongside other systems, the self-hosted economics usually look materially better than the marketing claim suggests.
The fair way to read Confluent’s estimator is not as proof that Confluent Cloud is broadly cheaper than Kafka. It is better read as proof that Confluent is willing to discount heavily in large competitive deals. That is useful commercial information, but it is not the same thing as a durable architectural conclusion.
This gets more significant as estates get older. A platform that starts with seven days of retention often grows into thirty, sixty, or ninety days once replay, forensics, compliance, or downstream unreliability become real operational concerns. At that point, the storage line item stops being background noise. It becomes one of the main reasons teams reassess whether they still want to pay a managed-service premium on top of the retained data itself.
How the self-hosted figures were calculated
The self-hosted baseline uses one general-purpose worker per broker, plus gp3 storage and a single EKS control plane. For the 3-broker and 9-broker rows, that worker is `m5a.xlarge`, using AWS’s published EKS pricing example. For the 30-broker row, the worker is modeled as a comparable 8 vCPU / 32 GiB node, which is the same class as `m7g.2xlarge`:
- 3 brokers: `3 × $0.172/hour × 730` + `1,516.13 GB × $0.08` + `$73 EKS` = ~$570.97/month
- 9 brokers: `9 × $0.172/hour × 730` + `4,548.39 GB × $0.08` + `$73 EKS` = ~$1,566.91/month
- 30 brokers on comparable 8 vCPU workers: `30 × $0.344/hour × 730` + `15,161.30 GB × $0.08` + `$73 EKS` = ~$8,819.50/month
If Kafka is going onto Kubernetes capacity you already operate, or onto existing VMs, the EKS control-plane fee disappears from each row.
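The three rows above can be reproduced directly; the rates are the article’s planning figures rather than a quote:

```python
HOURS_PER_MONTH = 730
GP3_PER_GB_MONTH = 0.08
EKS_CONTROL_PLANE = 73.0  # drops to zero on existing Kubernetes capacity

def self_hosted_monthly(brokers: int, node_hourly: float, storage_gb: float,
                        eks_fee: float = EKS_CONTROL_PLANE) -> float:
    """Worker-node hours + gp3 storage + one EKS control plane."""
    return (brokers * node_hourly * HOURS_PER_MONTH
            + storage_gb * GP3_PER_GB_MONTH
            + eks_fee)

print(f"3 brokers:  ~${self_hosted_monthly(3, 0.172, 1516.13):,.2f}/month")
print(f"9 brokers:  ~${self_hosted_monthly(9, 0.172, 4548.39):,.2f}/month")
print(f"30 brokers: ~${self_hosted_monthly(30, 0.344, 15161.30):,.2f}/month")
```

Passing `eks_fee=0` models the existing-Kubernetes case described above.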
How the MSK figures were calculated
For Amazon MSK Express, AWS publishes `express.m7g.large` broker pricing of $0.408/hour and storage at $0.10/GB-month. The 3-broker and 9-broker rows use that published rate directly. For the 30-broker `express.m7g.2xlarge` row, the broker-hour estimate is scaled by the family size multiplier from `large` to `2xlarge`, which is a 4× step in the same instance family:
- 3 brokers: `3 × $0.408/hour × 730` + `1,516.13 GB × $0.10` = ~$1,045.13/month
- 9 brokers: `9 × $0.408/hour × 730` + `4,548.39 GB × $0.10` = ~$3,135.40/month
- 30 brokers on `express.m7g.2xlarge`: `30 × $1.632/hour × 730` + `15,161.30 GB × $0.10` = ~$37,256.93/month
Those figures do not include MSK Express ingest charges, which AWS prices separately at $0.01/GB. That is important here because MSK’s own throughput guidance is one of the inputs behind the scenario sizing.
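The MSK rows can be reproduced the same way, and the excluded ingest fee is easy to illustrate. The 45 MBps figure below is the 3-broker scenario’s sustained ingress, converted with decimal GB, so treat it as a planning sketch rather than a bill:

```python
HOURS_PER_MONTH = 730
MSK_STORAGE_PER_GB_MONTH = 0.10
MSK_INGEST_PER_GB = 0.01  # charged separately; excluded from the table rows

def msk_monthly(brokers: int, broker_hourly: float, storage_gb: float) -> float:
    """Baseline MSK Express bill: broker-hours plus GB-month storage."""
    return brokers * broker_hourly * HOURS_PER_MONTH + storage_gb * MSK_STORAGE_PER_GB_MONTH

print(f"3 brokers:  ~${msk_monthly(3, 0.408, 1516.13):,.2f}/month")
print(f"9 brokers:  ~${msk_monthly(9, 0.408, 4548.39):,.2f}/month")
print(f"30 brokers: ~${msk_monthly(30, 1.632, 15161.30):,.2f}/month")

# Illustration of the excluded ingest fee: 45 MBps sustained for a month
ingest_gb = 45 * 1e6 * HOURS_PER_MONTH * 3600 / 1e9  # ~118,260 GB ingested
print(f"45 MBps sustained ingest adds ~${ingest_gb * MSK_INGEST_PER_GB:,.0f}/month")
```

Note that on the 3-broker scenario, sustained ingest at the guidance throughput would add more than the broker-hour line itself, which is why the table flags “before ingest charges” explicitly.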
How the Confluent figures were calculated
Confluent Cloud requires a different translation because there are no brokers to count. Confluent documents throughput per eCKU and cluster ceilings by tier:
- Standard: `25 MBps ingress / 75 MBps egress` per eCKU, up to `10 eCKU`
- Enterprise: `60 MBps ingress / 180 MBps egress` per eCKU, up to `32 eCKU`
That means:
- the 3-broker scenario maps cleanly to 2 Standard eCKUs
- the 9-broker scenario maps to 6 Standard eCKUs
- the 30-broker `m7g.2xlarge` scenario does not fit Standard’s `250 MBps` ingress ceiling and pushes right up against Enterprise’s maximum `1,920 MBps` ingress ceiling, so it has to move to 32 Enterprise eCKUs
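That mapping is a ceiling division against the per-eCKU throughput figures, checked against each tier’s cluster ceiling. A sketch using the documented limits quoted above:

```python
import math

# Documented per-eCKU throughput (MBps) and cluster ceilings by tier
TIERS = {
    "Standard":   {"ingress": 25, "egress": 75,  "max_eckus": 10},
    "Enterprise": {"ingress": 60, "egress": 180, "max_eckus": 32},
}

def eckus_needed(tier: str, ingress_mbps: float, egress_mbps: float):
    """Smallest eCKU count covering both directions, or None if the
    workload exceeds the tier's cluster ceiling."""
    t = TIERS[tier]
    needed = max(math.ceil(ingress_mbps / t["ingress"]),
                 math.ceil(egress_mbps / t["egress"]))
    return needed if needed <= t["max_eckus"] else None

print(eckus_needed("Standard", 45, 90))        # 3-broker scenario
print(eckus_needed("Standard", 140, 280))      # 9-broker scenario
print(eckus_needed("Standard", 1875, 3750))    # exceeds the Standard ceiling
print(eckus_needed("Enterprise", 1875, 3750))  # 30-broker scenario
```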
Using published eCKU-hour and storage prices, but excluding variable ingress and egress charges, gives:
- 3-broker equivalent: `2 × $0.75 × 730` + `1,516.13 GB × $0.08` = ~$1,216.29/month
- 9-broker equivalent: `6 × $0.75 × 730` + `4,548.39 GB × $0.08` = ~$3,648.87/month
- 30-broker equivalent: `32 × $1.75 × 730` + `15,161.30 GB × $0.08` = ~$42,092.90/month at the lower published Enterprise eCKU rate, or `32 × $2.25 × 730` + `15,161.30 GB × $0.08` = ~$53,772.90/month at the upper published rate
These rows also exclude Confluent’s variable ingress and egress charges, which are published separately from the eCKU-hour price.
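The Confluent rows follow the same shape as the other two models; the variable ingress and egress charges are deliberately left out, matching the table:

```python
HOURS_PER_MONTH = 730
CONFLUENT_STORAGE_PER_GB_MONTH = 0.08

def confluent_monthly(eckus: int, ecku_hourly: float, storage_gb: float) -> float:
    """Baseline Confluent Cloud bill: eCKU-hours plus GB-month storage.
    Excludes variable ingress/egress charges."""
    return eckus * ecku_hourly * HOURS_PER_MONTH + storage_gb * CONFLUENT_STORAGE_PER_GB_MONTH

print(f"2 Standard eCKUs:     ~${confluent_monthly(2, 0.75, 1516.13):,.2f}/month")
print(f"6 Standard eCKUs:     ~${confluent_monthly(6, 0.75, 4548.39):,.2f}/month")
print(f"32 Enterprise (low):  ~${confluent_monthly(32, 1.75, 15161.30):,.2f}/month")
print(f"32 Enterprise (high): ~${confluent_monthly(32, 2.25, 15161.30):,.2f}/month")
```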
Managed Kafka Does Not Remove Day-2 Operations
The price gap only tells part of the story. What teams often discover after the first few months is that managed Kafka removes some broker lifecycle work, but it does not remove Kafka operations.
With Amazon MSK, AWS runs the service, but your team still owns topic governance, client behavior, consumer lag, quotas, ACLs, schema lifecycle, and the mechanics of incident response. AWS also documents that MSK monitoring can be pushed into Prometheus-compatible tooling through open monitoring, and notes that cross-Availability-Zone data transfer charges apply when you use that path. MSK gives you a managed Kafka service, not a complete Kafka control plane.
Confluent Cloud moves further into platform territory, but the trade-off is a denser pricing and networking model. Public pricing is easy to read at the top level, yet real production use still depends on cluster tier, traffic direction, storage growth, and networking choices. Confluent’s own documentation also makes clear that public versus private networking is a deployment choice, and that you cannot switch a cluster between the two after it has been provisioned.
Self-hosting deserves to be viewed differently in 2026 than it was a few years ago. The old argument against it was operational burden. That argument weakens considerably once Kafka is being reconciled by Strimzi and operated through a proper control plane.
Operational Comparison
| Operational question | Self-Hosted Kafka + Strimzi + AxonOps | Amazon MSK | Confluent Cloud |
|---|---|---|---|
| Infrastructure portability | Any cloud, on-premises, or hybrid | AWS only | Confluent-managed service on supported clouds |
| Broker configuration control | Full control | Supported subset through MSK configurations | Service-defined operating model |
| Topics, ACLs, Schema Registry, Connect | One operational surface | Still requires surrounding tooling | Inside Confluent's platform model |
| Monitoring and alerting | 5-second metrics, logs, service checks, consumer lag, routing | CloudWatch and Prometheus-style options, but no unified control plane | Metrics API and platform views, but still tied to Confluent Cloud |
| Vendor lock-in profile | Low | Medium | High |
| What you are really buying | Open Kafka plus an operating model you own | Managed Kafka broker layer on AWS | Managed Kafka plus deeper attachment to Confluent's commercial platform |
That is why the self-hosted option has become more compelling. You are no longer choosing between “fully managed” and “hand-built shell scripts.” A modern self-hosted Kafka estate can use Strimzi for Kubernetes-native lifecycle management and AxonOps for the operational plane engineers actually use each day.
AxonOps already provides the Kafka surface most teams end up needing anyway: high-resolution monitoring and alerting, topic and ACL management, Schema Registry management, Kafka Connect operations, and a broader Kafka control plane that stays with you across clouds and environments.
Where Self-Hosting Starts To Win Clearly
The interesting thing about the numbers above is not that self-hosting is dramatically cheaper at tiny scale. It is that the cost curve remains understandable as the environment grows.
With self-hosted Kafka, more traffic usually means more infrastructure. That is not free, but it is legible. You can see the broker count, the storage footprint, the network shape, and the operational tooling you chose. You can also decide where to run it. If your organization already has Kubernetes, spare VM capacity, or a preference for on-premises infrastructure, the economics usually get even better.
MSK is often a reasonable middle ground for AWS-first teams that want open Kafka semantics without running the broker layer themselves. The problem is that MSK does not remove the need for platform tooling around Kafka. The premium you pay buys broker management, not unified day-2 operations.
Confluent Cloud is strongest when the organization wants a broader commercial streaming platform and is comfortable paying for that operating model. If what you need is just Kafka, plus strong operational visibility and governance, the price can climb faster than expected because the billing model follows usage dimensions rather than simple infrastructure units.
The Practical Self-Hosted Stack In 2026
For many teams, the most pragmatic answer is not “run Kafka by hand.” It is:
- Strimzi to declaratively run Kafka on Kubernetes
- AxonOps to provide the control plane
- open Apache Kafka as the runtime you keep portable
That combination changes the decision materially. Strimzi handles the reconciliation layer. AxonOps handles the operational layer. Your team keeps control over versioning, networking, and deployment architecture, while also getting the observability and governance surface that would otherwise push you back toward a managed service. If you want the practical workflow behind that separation, Running Kafka at Scale Without a Platform Team walks through it directly.
This is also where the cost story becomes more favorable to self-hosting. The managed-service premium is the budget line you can redirect into proper tooling and still keep an open operating model. In other words, the comparison should not be “MSK versus raw Kafka binaries.” It should be “MSK or Confluent versus self-hosted Kafka with the control plane you actually want.”
Conclusions
Amazon MSK and Confluent Cloud both solve real problems. They reduce parts of the broker-management burden and can be sensible choices for teams that want to offload infrastructure responsibility quickly.
Even so, the case for self-hosting Kafka is stronger than it used to be. The infrastructure cost is often lower, the cost shape is more predictable, the lock-in risk is lower, and the operating model is now far more mature than it was when “managed Kafka” first became the default answer.
For teams that want open Apache Kafka, infrastructure portability, and a serious operational surface, self-hosted Kafka with Strimzi and AxonOps is now the strongest all-round option. It gives you the control of self-hosting without going back to the bad old days of fragmented dashboards, ad-hoc topic scripts, and blind incident response.
If you want to talk through your current Kafka operating model, contact us or book time with an AxonOps expert.