Optimized joins & filtering with Bloom filter predicate in Kudu

Posted 15 Jan 2021 by Bankim Bhavsar

Note: This is a cross-post from the Cloudera Engineering Blog: "Optimized joins & filtering with Bloom filter predicate in Kudu".

Cloudera’s CDP Runtime version 7.1.5 maps to Apache Kudu 1.13 and the upcoming Apache Impala 4.0.

Introduction

In database systems, one of the most effective ways to improve performance is to avoid unnecessary work, such as network transfers and reads from disk. One way Apache Kudu achieves this is by supporting column predicates on scanners. Pushing column predicates down to Kudu lets it skip reading column values for filtered-out rows and reduces network I/O between Kudu and a client such as the distributed query engine Apache Impala. See the documentation on runtime filtering in Impala for details.

CDP Runtime 7.1.5 and CDP Public Cloud added support for Bloom filter column predicate pushdown in Kudu and the associated integration in Impala.
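
To make the idea concrete, here is a minimal sketch of how a Bloom filter predicate can cut down the rows that leave storage during a join. It is a textbook Bloom filter in plain Python, not Kudu's or Impala's implementation, and the names BloomFilter, scan_with_bloom_predicate, small_table_keys and large_table are hypothetical, chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter (illustrative; not Kudu's actual implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def scan_with_bloom_predicate(rows, column, bloom):
    """Hypothetical stand-in for a storage-side scan that applies the filter."""
    return [row for row in rows if bloom.might_contain(row[column])]


# Small side of the join: the query engine builds the filter from its keys.
small_table_keys = [3, 7, 42]
bloom = BloomFilter()
for k in small_table_keys:
    bloom.add(k)

# Large side of the join: most rows are dropped before leaving "storage".
large_table = [{"id": i, "payload": f"row-{i}"} for i in range(1000)]
candidates = scan_with_bloom_predicate(large_table, "id", bloom)

# False positives are possible, so the engine still performs the exact join.
wanted = set(small_table_keys)
joined = [row for row in candidates if row["id"] in wanted]
```

A Bloom filter never produces false negatives, so every matching row survives the scan. It can let a few non-matching rows through, which is why the exact join still runs on the reduced candidate set.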

Apache Kudu 1.13.0 released

Posted 21 Sep 2020 by Attila Bukor

The Apache Kudu team is happy to announce the release of Kudu 1.13.0!

The new release adds several new features and improvements, including the following:

Fine-Grained Authorization with Apache Kudu and Apache Ranger

Posted 11 Aug 2020 by Attila Bukor

When Apache Kudu was first released in September 2016, it didn’t support any kind of authorization. Anyone who could access the cluster could do anything they wanted. To remedy this, coarse-grained authorization was added along with authentication in Kudu 1.3.0. This meant that only certain users could access Kudu, but those who were allowed access could still do whatever they wanted. The only way to achieve finer-grained access control was to restrict access to Apache Impala, where access control could be enforced by fine-grained policies in Apache Sentry. This method limited how Kudu could be accessed, so we saw a need to implement fine-grained access control in a way that wouldn’t restrict access to Impala only.

Kudu 1.10.0 integrated with Apache Sentry to enable finer-grained authorization policies. This integration was rather short-lived as it was deprecated in Kudu 1.12.0 and will be completely removed in Kudu 1.13.0.

Most recently, since version 1.12.0, Kudu has supported fine-grained authorization by integrating with Apache Ranger 2.1 and later. In this post, we’ll cover how this works and how to set it up.

Apache Kudu 1.12.0 released

Posted 18 May 2020 by Hao Hao

The Apache Kudu team is happy to announce the release of Kudu 1.12.0!

The new release adds several new features and improvements, including the following: