Since the Apache Kudu project made its debut in 2015, a few common questions have kept coming up at every presentation:
- Is Kudu an open source version of Google’s Spanner system?
- Is Kudu NoSQL or SQL?
- Why does Kudu have a relational data model? Isn’t SQL dead?
A few of these questions are addressed in the Kudu FAQ, but I found them interesting enough to give a talk on these subjects at Strata Data Conference NYC 2017.
Preparing this talk was particularly interesting, since Google recently released Spanner to the public in SaaS form as Google Cloud Spanner. This meant that I was able to compare Kudu vs Spanner not just qualitatively based on some academic papers, but quantitatively as well.
To summarize the key points of the presentation:
- Despite the growing popularity of “NoSQL” from 2009 through 2013, SQL has once again become the access mechanism of choice for the majority of analytic applications. NoSQL has become “Not Only SQL”.
- Spanner and Kudu share a lot of common features. However:
  - Spanner offers a superior feature set and performance for Online Transactional Processing (OLTP) workloads, including ACID transactions and secondary indexing.
  - Kudu offers a superior feature set and performance for Online Analytical Processing (OLAP) and Hybrid Transactional/Analytic Processing (HTAP) workloads, including more complete SQL support and orders of magnitude better performance on large queries.
For more details and for the full benchmark numbers, check out the slide deck below:
Questions or comments? Join the Apache Kudu Community to discuss.