The recently released Kudu version 0.8 ships with a host of new improvements to scan predicates. Performance and usability have been improved, especially for tables taking advantage of advanced partitioning options.
Scan Optimizations in the Server and C++ Client
The server and C++ client have gotten more sophisticated in how they handle and optimize scan constraints. Constraints specified in the predicates and lower and upper bound primary keys are better unified, resulting in more predicates being pushed into primary key bounds, which can turn full table scans with predicates into much more efficient bounded scans.
Additionally, the server and C++ client now recognize more opportunities to prune entire tablets during scans. For example, for the following schema and query Kudu will now be able to skip scanning 15 out of the 16 tablets in the table:
For a deeper look at the newly implemented scan and partition pruning optimizations, see the associated design document. These optimizations will eventually be incorporated into the Java client as well, but until that time they are still used on the server side for scans initiated by Java clients. If you would like to help with this effort, let us know on the JIRA issue.
Redesigned Predicate API in the Java Client
The Java client has a new way to express scan predicates: the
KuduPredicate
.
The API matches the corresponding C++ API more closely, and adds support for
specifying exclusive, as well as inclusive, range predicates. The existing
ColumnRangePredicate
API has been deprecated, and will be removed soon. Example of transitioning from
the old to new API:
Under the Covers Changes
The scan optimizations in the server and C++ client, and the new KuduPredicate
API in the Java client are made possible by an overhaul of how predicates are
handled internally. A new protobuf message type,
ColumnPredicatePB
has been introduced, and will allow more column predicate types to be introduced
in the future. If you are interested in contributing to Kudu but don’t know
where to start, consider adding a new predicate type; for example the IS NULL
,
IS NOT NULL
, IN
, and LIKE
predicates types are currently not implemented.