Kafka Cost Comparison 2026
Teams evaluating Kafka usually end up comparing three operating models. The first is to run open Apache Kafka® yourself. The second is to use Amazon MSK. The third is to move further up the managed spectrum with Confluent Cloud.
On paper, managed Kafka looks like the easier decision. In practice, the economics are more nuanced. Managed services reduce part of the infrastructure work, but they do not remove the day-2 work that platform teams still own: topic design, ACL governance, consumer lag, connector failures, schema evolution, incident response, and observability. That is why the useful comparison is not only “which line item is cheaper?” but also “what do you actually get for the money?”
This article uses public pricing published by AWS and Confluent as of March 18, 2026. It is written to help engineering teams reason about the trade-offs, not to pretend there is one number that fits every Kafka estate.
What This Comparison Assumes
The cost section is more useful if it follows realistic Kafka estate sizes rather than one isolated cluster. This comparison therefore uses three cluster shapes:
- 3 brokers: a small production cluster
- 9 brokers: a regional shared platform
- 30 brokers: a larger central streaming platform using higher-capacity brokers
To make those shapes concrete, each one is paired with an approximate sustained throughput band. Those throughput bands are not vendor promises. They are planning figures derived from AWS’s published sustained throughput guidance for MSK Express brokers, then used as a like-for-like reference point for open Kafka on comparable broker sizing:
- 3 brokers: about 45 MBps ingress / 90 MBps egress
- 9 brokers: about 140 MBps ingress / 280 MBps egress
- 30 brokers on `express.m7g.2xlarge`: about 1.875 GBps ingress / 3.75 GBps egress

The first two rows use `express.m7g.large` planning guidance. The 30-broker row uses `express.m7g.2xlarge`, because that is a more realistic class if you are actually building a larger shared Kafka platform rather than a fleet of many small brokers.
Storage is scaled at roughly the same density as the AWS 3-broker pricing example:
- 3 brokers: about 1.5 TB average retained storage
- 9 brokers: about 4.5 TB average retained storage
- 30 brokers: about 15.2 TB average retained storage
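Those storage figures are AWS's 3-broker pricing example scaled at a constant per-broker density. A minimal sketch of that scaling (the ~505 GB/broker density is derived from the article's figures, not an AWS number):

```python
# Storage is scaled linearly at the density implied by AWS's 3-broker
# pricing example (~1,516 GB across 3 brokers).
GB_PER_BROKER = 1516.13 / 3  # ~505.4 GB retained per broker (derived, not an AWS figure)

def retained_storage_gb(brokers: int) -> float:
    """Approximate retained cluster storage in GB for a given broker count."""
    return brokers * GB_PER_BROKER

for brokers in (3, 9, 30):
    print(f"{brokers} brokers: ~{retained_storage_gb(brokers) / 1000:.1f} TB retained")
```

This is only the scaling rule used by the comparison; a real estate's density depends on the retention decisions discussed later in the article.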
For the self-hosted case, the reference model is Kafka on Kubernetes with Strimzi, plus AxonOps as the operational control plane for monitoring, alerting, topics, ACLs, Schema Registry, Connect, and secure message viewing. The raw infrastructure estimate uses AWS’s published EKS worker-node pricing examples plus gp3 storage pricing. If you already have Kubernetes capacity, existing VMs, or bare metal, the self-hosted number drops further. The broader operating model behind that stack is covered in more detail in Running Kafka at Scale Without a Platform Team.
The Cost Shape Is Different In Each Model
Self-hosted Kafka is mostly an infrastructure bill. Amazon MSK adds managed Kafka pricing on top of broker and storage capacity. Confluent Cloud uses elastic units, storage, and traffic-based pricing. Those models feel similar at a glance, but they diverge quickly as estates get larger.
The other important difference is that managed Kafka pricing does not remove the need for a control plane. Even if AWS or Confluent operate the broker layer, your engineers still need a way to see cluster health, track consumer lag, manage topics and ACLs, and understand what changed during an incident.
Scenario Comparison: 3, 9, and 30 Brokers
The table below compares the steady-state Kafka platform bill before optional extras such as private networking, managed connectors, cross-region replication, or premium support. For MSK Express and Confluent Cloud in particular, real traffic charges can materially increase the bill beyond these baseline figures.
| Scenario | Self-Hosted Kafka (Strimzi + AxonOps model) | Amazon MSK Express | Confluent Cloud |
|---|---|---|---|
| 3 brokers, ~45 MBps in / 90 MBps out | ~$571/month | ~$1,045/month before ingest charges | ~$1,216/month on Standard before ingress/egress charges |
| 9 brokers, ~140 MBps in / 280 MBps out | ~$1,567/month | ~$3,135/month before ingest charges | ~$3,649/month on Standard before ingress/egress charges |
| 30 brokers on `m7g.2xlarge` class, ~1.875 GBps in / ~3.75 GBps out | ~$8,820/month on comparable 8 vCPU workers | ~$37,257/month before ingest charges | ~$42,093 to ~$53,773/month on Enterprise before ingress/egress charges |
| Scaling trigger | Add infrastructure as needed | More brokers, more storage, more ingest fees | More eCKUs, more storage, more ingress/egress fees |
| Pricing shape | Linear and predictable | Linear until traffic fees accelerate it | Can steepen quickly as capacity and traffic rise |
| Retention cost behavior | Direct disk cost, with freedom to choose the storage architecture | Direct `GB-month` charge at the MSK storage rate | Direct `GB-month` charge added on top of the eCKU platform fee |
These figures are intentionally about the Kafka platform bill rather than a full organizational TCO model. They do not include application engineering effort, support contracts, or optional third-party services around the cluster. That is useful here because it shows the economic shape before the surrounding platform decisions are added on top.
Retention Changes The Bill Faster Than Many Teams Expect
Retention is worth separating from broker count because the two are related but not identical. A Kafka estate can have a modest broker footprint and still carry a large storage bill if topics need long rewind windows, high replication, or compaction with large working sets. The reverse is also true: a busy cluster with short-lived topics may need substantial broker throughput without retaining very much data at all.
That is why storage should be treated as its own planning axis. The variables are straightforward:
- average retained bytes per topic
- replication factor
- cleanup policy, especially compacted topics
- retention window in hours or days
- whether the platform needs a long consumer replay window for recovery or audit
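Those variables multiply together, which is why retention can grow the bill faster than broker count does. A rough planning sketch, assuming sustained ingress is held for the whole retention window (it ignores compaction, compression, and per-topic variation, so treat the output as an upper-bound planning figure):

```python
def retained_storage_tb(ingress_mbps: float, retention_days: float,
                        replication_factor: int = 3) -> float:
    """Upper-bound estimate of retained storage in TB: sustained ingress
    held for the full retention window, multiplied by the replication
    factor. Uses decimal MB/TB; ignores compaction and compression."""
    seconds = retention_days * 24 * 3600
    bytes_retained = ingress_mbps * 1e6 * seconds * replication_factor
    return bytes_retained / 1e12

# Illustrative: 45 MBps sustained ingress kept for 7 days at RF=3
print(f"~{retained_storage_tb(45, 7):.1f} TB retained")
```

At seven days and RF=3, even the modest 45 MBps scenario retains roughly 82 TB, far above the ~1.5 TB density used in the 3-broker baseline. That gap is exactly why the retention window, rather than broker count, tends to dominate the storage axis.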
In the cost model here, storage is not hidden inside the compute estimate. It is priced separately, which makes the retention effect easier to see.
| Additional retained cluster storage | Self-Hosted Kafka | Amazon MSK Express | Confluent Cloud |
|---|---|---|---|
| Extra 10 TB retained | ~$800/month on gp3 | ~$1,000/month | ~$800/month |
| Extra 50 TB retained | ~$4,000/month on gp3 | ~$5,000/month | ~$4,000/month |
| What changes operationally | You can move to denser nodes, rework the disk layout, or adopt instance-store options | Storage bill rises, while the managed-service premium remains | Storage bill rises, while the eCKU platform bill remains |
The interesting point is not only that MSK storage is priced higher than the gp3 baseline used in the self-hosted estimate. It is also that, although Confluent’s published storage rate happens to match that gp3 baseline numerically, the economics are not equivalent. In a self-hosted estate, storage is still an engineering choice. You can change the node shape, disk layout, storage class, retention architecture, or even move to denser instance-store patterns if they make sense for the workload. In Confluent Cloud, the storage line sits on top of the eCKU platform bill and follows Confluent’s service model rather than your own infrastructure design choices.
Some teams will prefer gp3 because it is simple. Others will use denser EBS layouts, existing Kubernetes storage classes, or instance-store NVMe where that makes sense. With managed services, the storage rate is part of the service contract. With self-hosted Kafka, storage remains an engineering choice.
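The deltas in the table above are simple rate-times-volume arithmetic, using the per-GB-month rates quoted in this article:

```python
# $/GB-month rates as quoted in this article's model
RATES = {"self-hosted gp3": 0.08, "MSK Express": 0.10, "Confluent Cloud": 0.08}

def monthly_storage_delta(extra_tb: float, rate_per_gb_month: float) -> float:
    """Monthly cost of additional retained storage (1 TB = 1,000 GB here)."""
    return extra_tb * 1000 * rate_per_gb_month

for name, rate in RATES.items():
    print(f"Extra 10 TB on {name}: ~${monthly_storage_delta(10, rate):,.0f}/month")
```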
A Note On Confluent’s “Cheaper Than Kafka” Claim
Confluent’s pricing estimator currently presents claims such as being materially cheaper than self-managed Apache Kafka at high throughput. Those claims should be read carefully.
The screenshot used while researching this post shows a 1 GBps write-throughput scenario, 7 days of retention, a throughput price of $0.022/GB, and a headline saying “Total Savings vs Apache Kafka 59%”. It also explicitly says that this estimate includes a 56% discount off list.
That detail changes how the claim should be read. A claim built on a large negotiated discount is not a neutral market baseline. It is a sales scenario. It may be perfectly real for a particular enterprise deal, but it is not the same thing as public pricing, and it is not the same thing as a generally true statement that Confluent Cloud is cheaper than self-hosted Kafka.
There is another issue as well. Confluent controls the assumptions on both sides of that estimator:
- the Confluent discount assumption
- the self-managed Kafka architecture being compared against
- the operational labor and overprovisioning assumptions inside the comparison
That means the result is directionally interesting, but not authoritative. If the self-managed baseline assumes overly expensive compute, conservative overprovisioning, or a heavier operations burden than your team actually has, the saving will look larger than it really is. If your estate already runs Kubernetes, has spare capacity, uses denser storage layouts, or already has a platform team operating Kafka alongside other systems, the self-hosted economics usually look materially better than the marketing claim suggests.
The fair way to read Confluent’s estimator is not as proof that Confluent Cloud is broadly cheaper than Kafka. It is better read as proof that Confluent is willing to discount heavily in large competitive deals. That is useful commercial information, but it is not the same thing as a durable architectural conclusion.
This gets more significant as estates get older. A platform that starts with seven days of retention often grows into thirty, sixty, or ninety days once replay, forensics, compliance, or downstream unreliability become real operational concerns. At that point, the storage line item stops being background noise. It becomes one of the main reasons teams reassess whether they still want to pay a managed-service premium on top of the retained data itself.
How the self-hosted figures were calculated
The self-hosted baseline uses one general-purpose worker per broker, plus gp3 storage and a single EKS control plane. For the 3-broker and 9-broker rows, that worker is `m5a.xlarge`, using AWS’s published EKS pricing example. For the 30-broker row, the worker is modeled as a comparable 8 vCPU / 32 GiB node, which is the same class as `m7g.2xlarge`:
- 3 brokers: `3 × $0.172/hour × 730` + `1,516.13 GB × $0.08` + `$73 EKS` = ~$570.97/month
- 9 brokers: `9 × $0.172/hour × 730` + `4,548.39 GB × $0.08` + `$73 EKS` = ~$1,566.91/month
- 30 brokers on comparable 8 vCPU workers: `30 × $0.344/hour × 730` + `15,161.30 GB × $0.08` + `$73 EKS` = ~$8,819.50/month
If Kafka is going onto Kubernetes capacity you already operate, or onto existing VMs, the EKS control-plane fee disappears from each row.
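The three rows above can be reproduced directly; the rates are the article’s planning figures rather than a quote:

```python
HOURS_PER_MONTH = 730
GP3_PER_GB_MONTH = 0.08
EKS_CONTROL_PLANE = 73.0  # drops to zero on existing Kubernetes capacity

def self_hosted_monthly(brokers: int, node_hourly: float, storage_gb: float,
                        eks_fee: float = EKS_CONTROL_PLANE) -> float:
    """Worker-node hours + gp3 storage + one EKS control plane."""
    return (brokers * node_hourly * HOURS_PER_MONTH
            + storage_gb * GP3_PER_GB_MONTH
            + eks_fee)

print(f"3 brokers:  ~${self_hosted_monthly(3, 0.172, 1516.13):,.2f}/month")
print(f"9 brokers:  ~${self_hosted_monthly(9, 0.172, 4548.39):,.2f}/month")
print(f"30 brokers: ~${self_hosted_monthly(30, 0.344, 15161.30):,.2f}/month")
```

Passing `eks_fee=0` models the existing-Kubernetes case described above.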
How the MSK figures were calculated
For Amazon MSK Express, AWS publishes `express.m7g.large` broker pricing of $0.408/hour and storage at $0.10/GB-month. The 3-broker and 9-broker rows use that published rate directly. For the 30-broker `express.m7g.2xlarge` row, the broker-hour estimate is scaled by the family size multiplier from `large` to `2xlarge`, which is a 4× step in the same instance family:
- 3 brokers: `3 × $0.408/hour × 730` + `1,516.13 GB × $0.10` = ~$1,045.13/month
- 9 brokers: `9 × $0.408/hour × 730` + `4,548.39 GB × $0.10` = ~$3,135.40/month
- 30 brokers on `express.m7g.2xlarge`: `30 × $1.632/hour × 730` + `15,161.30 GB × $0.10` = ~$37,256.93/month
Those figures do not include MSK Express ingest charges, which AWS prices separately at $0.01/GB. That is important here because MSK’s own throughput guidance is one of the inputs behind the scenario sizing.
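The MSK rows can be reproduced the same way, and the excluded ingest fee is easy to illustrate. The 45 MBps figure below is the 3-broker scenario’s sustained ingress, converted with decimal GB, so treat it as a planning sketch rather than a bill:

```python
HOURS_PER_MONTH = 730
MSK_STORAGE_PER_GB_MONTH = 0.10
MSK_INGEST_PER_GB = 0.01  # charged separately; excluded from the table rows

def msk_monthly(brokers: int, broker_hourly: float, storage_gb: float) -> float:
    """Baseline MSK Express bill: broker-hours plus GB-month storage."""
    return brokers * broker_hourly * HOURS_PER_MONTH + storage_gb * MSK_STORAGE_PER_GB_MONTH

print(f"3 brokers:  ~${msk_monthly(3, 0.408, 1516.13):,.2f}/month")
print(f"9 brokers:  ~${msk_monthly(9, 0.408, 4548.39):,.2f}/month")
print(f"30 brokers: ~${msk_monthly(30, 1.632, 15161.30):,.2f}/month")

# Illustration of the excluded ingest fee: 45 MBps sustained for a month
ingest_gb = 45 * 1e6 * HOURS_PER_MONTH * 3600 / 1e9  # ~118,260 GB ingested
print(f"45 MBps sustained ingest adds ~${ingest_gb * MSK_INGEST_PER_GB:,.0f}/month")
```

Note that on the 3-broker scenario, sustained ingest at the guidance throughput would add more than the broker-hour line itself, which is why the table flags “before ingest charges” explicitly.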
How the Confluent figures were calculated
Confluent Cloud requires a different translation because there are no brokers to count. Confluent documents throughput per eCKU and cluster ceilings by tier:
- Standard: `25 MBps ingress / 75 MBps egress` per eCKU, up to `10 eCKU`
- Enterprise: `60 MBps ingress / 180 MBps egress` per eCKU, up to `32 eCKU`
That means:
- the 3-broker scenario maps cleanly to 2 Standard eCKUs
- the 9-broker scenario maps to 6 Standard eCKUs
- the 30-broker `m7g.2xlarge` scenario does not fit Standard’s `250 MBps` ingress ceiling and pushes right up against Enterprise’s maximum `1,920 MBps` ingress ceiling, so it has to move to 32 Enterprise eCKUs
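That mapping is a ceiling division against the per-eCKU throughput figures, checked against each tier’s cluster ceiling. A sketch using the documented limits quoted above:

```python
import math

# Documented per-eCKU throughput (MBps) and cluster ceilings by tier
TIERS = {
    "Standard":   {"ingress": 25, "egress": 75,  "max_eckus": 10},
    "Enterprise": {"ingress": 60, "egress": 180, "max_eckus": 32},
}

def eckus_needed(tier: str, ingress_mbps: float, egress_mbps: float):
    """Smallest eCKU count covering both directions, or None if the
    workload exceeds the tier's cluster ceiling."""
    t = TIERS[tier]
    needed = max(math.ceil(ingress_mbps / t["ingress"]),
                 math.ceil(egress_mbps / t["egress"]))
    return needed if needed <= t["max_eckus"] else None

print(eckus_needed("Standard", 45, 90))        # 3-broker scenario
print(eckus_needed("Standard", 140, 280))      # 9-broker scenario
print(eckus_needed("Standard", 1875, 3750))    # exceeds the Standard ceiling
print(eckus_needed("Enterprise", 1875, 3750))  # 30-broker scenario
```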
Using published eCKU-hour and storage prices, but excluding variable ingress and egress charges, gives:
- 3-broker equivalent: `2 × $0.75 × 730` + `1,516.13 GB × $0.08` = ~$1,216.29/month
- 9-broker equivalent: `6 × $0.75 × 730` + `4,548.39 GB × $0.08` = ~$3,648.87/month
- 30-broker equivalent: `32 × $1.75 × 730` + `15,161.30 GB × $0.08` = ~$42,092.90/month at the lower published Enterprise eCKU rate, or `32 × $2.25 × 730` + `15,161.30 GB × $0.08` = ~$53,772.90/month at the upper published rate
These rows also exclude Confluent’s variable ingress and egress charges, which are published separately from the eCKU-hour price.
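The Confluent rows follow the same shape as the other two models; the variable ingress and egress charges are deliberately left out, matching the table:

```python
HOURS_PER_MONTH = 730
CONFLUENT_STORAGE_PER_GB_MONTH = 0.08

def confluent_monthly(eckus: int, ecku_hourly: float, storage_gb: float) -> float:
    """Baseline Confluent Cloud bill: eCKU-hours plus GB-month storage.
    Excludes variable ingress/egress charges."""
    return eckus * ecku_hourly * HOURS_PER_MONTH + storage_gb * CONFLUENT_STORAGE_PER_GB_MONTH

print(f"2 Standard eCKUs:     ~${confluent_monthly(2, 0.75, 1516.13):,.2f}/month")
print(f"6 Standard eCKUs:     ~${confluent_monthly(6, 0.75, 4548.39):,.2f}/month")
print(f"32 Enterprise (low):  ~${confluent_monthly(32, 1.75, 15161.30):,.2f}/month")
print(f"32 Enterprise (high): ~${confluent_monthly(32, 2.25, 15161.30):,.2f}/month")
```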
Managed Kafka Does Not Remove Day-2 Operations
The price gap only tells part of the story. What teams often discover after the first few months is that managed Kafka removes some broker lifecycle work, but it does not remove Kafka operations.
With Amazon MSK, AWS runs the service, but your team still owns topic governance, client behavior, consumer lag, quotas, ACLs, schema lifecycle, and the mechanics of incident response. AWS also documents that MSK monitoring can be pushed into Prometheus-compatible tooling through open monitoring, and notes that cross-Availability-Zone data transfer charges apply when you use that path. MSK gives you a managed Kafka service, not a complete Kafka control plane.
Confluent Cloud moves further into platform territory, but the trade-off is a denser pricing and networking model. Public pricing is easy to read at the top level, yet real production use still depends on cluster tier, traffic direction, storage growth, and networking choices. Confluent’s own documentation also makes clear that public versus private networking is a deployment choice, and that you cannot switch a cluster between the two after it has been provisioned.
Self-hosting deserves to be viewed differently in 2026 than it was a few years ago. The old argument against it was operational burden. That argument weakens considerably once Kafka is being reconciled by Strimzi and operated through a proper control plane.
Operational Comparison
| Operational question | Self-Hosted Kafka + Strimzi + AxonOps | Amazon MSK | Confluent Cloud |
|---|---|---|---|
| Infrastructure portability | Any cloud, on-premises, or hybrid | AWS only | Confluent-managed service on supported clouds |
| Broker configuration control | Full control | Supported subset through MSK configurations | Service-defined operating model |
| Topics, ACLs, Schema Registry, Connect | One operational surface | Still requires surrounding tooling | Inside Confluent's platform model |
| Monitoring and alerting | 5-second metrics, logs, service checks, consumer lag, routing | CloudWatch and Prometheus-style options, but no unified control plane | Metrics API and platform views, but still tied to Confluent Cloud |
| Vendor lock-in profile | Low | Medium | High |
| What you are really buying | Open Kafka plus an operating model you own | Managed Kafka broker layer on AWS | Managed Kafka plus deeper attachment to Confluent's commercial platform |
That is why the self-hosted option has become more compelling. You are no longer choosing between “fully managed” and “hand-built shell scripts.” A modern self-hosted Kafka estate can use Strimzi for Kubernetes-native lifecycle management and AxonOps for the operational plane engineers actually use each day.
AxonOps already provides the Kafka surface most teams end up needing anyway: high-resolution monitoring and alerting, topic and ACL management, Schema Registry management, Kafka Connect operations, and a broader Kafka control plane that stays with you across clouds and environments.
Where Self-Hosting Starts To Win Clearly
The interesting thing about the numbers above is not that self-hosting is dramatically cheaper at tiny scale. It is that the cost curve remains understandable as the environment grows.
With self-hosted Kafka, more traffic usually means more infrastructure. That is not free, but it is legible. You can see the broker count, the storage footprint, the network shape, and the operational tooling you chose. You can also decide where to run it. If your organization already has Kubernetes, spare VM capacity, or a preference for on-premises infrastructure, the economics usually get even better.
MSK is often a reasonable middle ground for AWS-first teams that want open Kafka semantics without running the broker layer themselves. The problem is that MSK does not remove the need for platform tooling around Kafka. The premium you pay buys broker management, not unified day-2 operations.
Confluent Cloud is strongest when the organization wants a broader commercial streaming platform and is comfortable paying for that operating model. If what you need is just Kafka, plus strong operational visibility and governance, the price can climb faster than expected because the billing model follows usage dimensions rather than simple infrastructure units.
The Practical Self-Hosted Stack In 2026
For many teams, the most pragmatic answer is not “run Kafka by hand.” It is:
- Strimzi to declaratively run Kafka on Kubernetes
- AxonOps to provide the control plane
- open Apache Kafka as the runtime you keep portable
That combination changes the decision materially. Strimzi handles the reconciliation layer. AxonOps handles the operational layer. Your team keeps control over versioning, networking, and deployment architecture, while also getting the observability and governance surface that would otherwise push you back toward a managed service. If you want the practical workflow behind that separation, Running Kafka at Scale Without a Platform Team walks through it directly.
This is also where the cost story becomes more favorable to self-hosting. The managed-service premium is the budget line you can redirect into proper tooling and still keep an open operating model. In other words, the comparison should not be “MSK versus raw Kafka binaries.” It should be “MSK or Confluent versus self-hosted Kafka with the control plane you actually want.”
Conclusions
Amazon MSK and Confluent Cloud both solve real problems. They reduce parts of the broker-management burden and can be sensible choices for teams that want to offload infrastructure responsibility quickly.
Even so, the case for self-hosting Kafka is stronger than it used to be. The infrastructure cost is often lower, the cost shape is more predictable, the lock-in risk is lower, and the operating model is now far more mature than it was when “managed Kafka” first became the default answer.
For teams that want open Apache Kafka, infrastructure portability, and a serious operational surface, self-hosted Kafka with Strimzi and AxonOps is now the strongest all-round option. It gives you the control of self-hosting without going back to the bad old days of fragmented dashboards, ad-hoc topic scripts, and blind incident response.
If you want to talk through your current Kafka operating model, contact us or book time with an AxonOps expert.