ApacheCon North America 2022 Highlights

Our highlights from attending ApacheCon 2022

I returned from New Orleans, Louisiana (NOLA) a few days ago with a nasty cold that did not help with the jet lag, but I still enjoyed ApacheCon North America 2022. Attendance was a bit lower than I anticipated, but the talks were top quality.

One of the most important things for me was meeting some of the most talented developers contributing to Apache Software Foundation projects face-to-face and being able to chat with them one-to-one. It was great to see the Apache community thriving!

Keynotes

As you would expect, the keynotes were amongst the best talks. I particularly enjoyed listening to David Nalley, President of the Apache Software Foundation, talk about dealing with the discovery of the Log4Shell vulnerability in 2021, or, as he described it, the moment the world realised open source was everywhere.

Another one I thoroughly enjoyed was “Security and Performance Implications of QUIC” by Paul Vixie. We have started experimenting with QUIC on some of our internal projects and, to be honest, we vastly underestimated its security implications.

What’s hot

There are so many fantastic Apache projects nowadays that it’s difficult to say what’s hot, as everyone has different interests and requirements. My focus was on databases and data streaming, where Cassandra, Spark, Kafka and Pulsar stood out.

Cassandra: it’s still the world’s best distributed database. If you don’t believe me, just ask Apple, Netflix, Bloomberg and others why they’re running thousands of nodes successfully.
Spark: there were several talks about Apache Spark; it’s still very popular.
Kafka: everyone seems to be running Apache Kafka. It is used by over 80% of the Fortune 100. Enough said.
Pulsar: if you’re not using Kafka yet, check out Pulsar before you decide. You may change your mind.
KEDA: because I’ve been using KEDA to autoscale Apache Kafka consumers and producers, I went to see Daniel Oh’s talk about it, and it was pretty good. Kubernetes + Kafka + KEDA is a great combination.
Stargate: there was an excellent talk about the Stargate v2.0 implementation. I haven’t yet seen many companies using it, but it’s worth looking into if you want to improve access to your cluster and make it easier for your developers to get at the data.
Kubernetes: not part of ApacheCon, but present in many ways. The appetite for running everything in Kubernetes keeps growing.

Cassandra Accord

This was probably the most exciting announcement from the Cassandra community in a long time! Patrick McFadin announced that work was under way to add ACID support to Apache Cassandra.

ACID stands for atomicity, consistency, isolation and durability:

Atomicity. In a transaction involving two or more discrete pieces of information, either all pieces are committed, or none are.

Consistency. A transaction either creates a new and valid state of data or, if any failure occurs, returns all data to its state before the transaction was started.

Isolation. A transaction in process and not yet committed must remain isolated from any other transaction.

Durability. Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state.

This is major news with massive repercussions, because Cassandra is currently not ACID compliant.

Current versions of Cassandra have no way to perform complex RDBMS-style ACID transactions with locking and rollbacks. Instead, Cassandra has a tunable consistency model in which the user decides how strong or eventual the consistency needs to be for each read and write. Lightweight transactions allow users to perform simple compare-and-set operations, but these are limited in scope and cannot be used for complex operations with the commit/rollback pattern provided by RDBMSs.
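To make the tunable consistency model a little more concrete, here is a minimal Python sketch of the usual replica-overlap rule. It is an illustration, not Cassandra code: with replication factor RF, a read answered by R replicas is guaranteed to include at least one replica holding the latest acknowledged write (acknowledged by W replicas) when R + W > RF. The helper names are hypothetical, and only a handful of single-datacenter consistency levels are modelled; in Cassandra itself you pick R and W per request through consistency levels such as ONE, QUORUM and ALL.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level.

    Hypothetical, simplified helper: single-datacenter levels only.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # a majority of the replicas
        "ALL": rf,
    }
    if level not in levels:
        raise ValueError(f"unsupported consistency level: {level}")
    required = levels[level]
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but RF is {rf}")
    return required


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set
    (R + W > RF), so a read always reaches at least one up-to-date replica."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    rf = 3
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        verdict = "strong" if is_strongly_consistent(read_level, write_level, rf) else "eventual"
        print(f"RF={rf} read={read_level} write={write_level}: {verdict}")
```

Even when R + W > RF, each read or write in this model is still an independent operation: nothing groups several statements into one all-or-nothing unit with commit and rollback, which is the gap described above.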

With the introduction of the new Accord consensus protocol, Apache Cassandra would become ACID compliant. This would open the floodgates for many more industries to adopt Apache Cassandra, such as the financial sector.

Our contribution

Whilst at KubeCon earlier this year, I also attended the Data on Kubernetes Day. There, I talked to many people trying to run their databases in Kubernetes. Postgres and MySQL were probably the most talked about, but Cassandra was also featured.

I realised then that whilst most people were discussing operators and setups, no one was talking about the elephant in the room: the storage.

That’s why I submitted a talk on the importance of selecting the right storage for your databases in terms of performance and cost.

Hayato and I spent weeks researching and testing the storage offered by the three major cloud providers (Google, AWS and Azure) and put a presentation together. We titled it “Storage considerations when running Apache Cassandra on Kubernetes with k8ssandra”.

We had our 45 minutes of fame on day 4 as part of the Cassandra track. As it was the last day of ApacheCon and nearly the last talk, we didn’t fill the room, but the audience included very experienced Cassandra hands from companies large and small, including Netflix, Apple and Bloomberg.

Despite a glitch with my laptop, which refused to open the presentation in full screen, it went well. My reading of the room was that the audience appreciated our findings and were as surprised as we had been by the cost implications of running a large database in the public cloud.

Our talk had two parts. First, we discussed the pros and cons of running Apache Cassandra in Kubernetes. In the second part, we presented our findings from benchmarking the remote storage options offered by the public cloud providers AWS, GCP and Azure.

Our methodology was quite simple: we built a Kubernetes cluster on each of the cloud providers and deployed Apache Cassandra using K8ssandra, integrated with AxonOps for running repairs and monitoring. We then generated load against it with NoSQLBench to obtain meaningful metrics.

AxonOps provided all the information we needed for the comparison. We could look at all sorts of metrics, but we focused on IOPS, reads and writes per second, and disk I/O.

Our presentation

You can download the slides to our presentation “Storage considerations when running Apache Cassandra on Kubernetes with k8ssandra” from ApacheCon’s website: https://apachecon.com/acna2022/slides/04_Rua_Storage-Considerations-When.pdf

Final words

It was great to meet the Apache community, especially the Apache Cassandra guys, who are doing a fantastic job. Overall, I think ApacheCon North America was a success, and I’d like to thank all the organizers for their hard work and commitment to the community.

I hope to see many of you in San Jose, California, for the Cassandra Summit 2023!
