Flume 1.8+ requires Java 8 at runtime even though the Kudu Flume integration is Java 7 compatible. Flume 1.9 is the default dependency version as of Kudu 1.9.0.
Hadoop 3.0+ requires Java 8 at runtime even though the Kudu Hadoop integration is Java 7 compatible. Hadoop 3.2 is the default dependency version as of Kudu 1.9.0.
Support for Java 7 has been deprecated since Kudu 1.5.0 and may be removed in the next major release.
Kudu now supports location awareness. When configured, Kudu will make a best effort to avoid placing a majority of replicas for a given tablet at the same location. The kudu cluster rebalance tool has been updated to act in accordance with the placement policy of a location-aware Kudu. The administrative documentation has been updated to detail the usage of this feature.
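The placement rule described above can be illustrated with a small check. This is a minimal sketch and not Kudu's placement code: the function name has_majority_at_one_location and the location strings are hypothetical, and the function only tests whether any single location holds a strict majority of one tablet's replicas.

```python
from collections import Counter


def has_majority_at_one_location(replica_locations):
    """Return True if one location holds a strict majority of the replicas.

    replica_locations is a list with one (hypothetical) location string per
    replica of a single tablet, for example ["/rack1", "/rack2", "/rack3"].
    """
    if not replica_locations:
        return False
    counts = Counter(replica_locations)
    # A strict majority is more than half of the replicas.
    majority = len(replica_locations) // 2 + 1
    return max(counts.values()) >= majority
```

A placement such as ["/rack1", "/rack1", "/rack2"] is the kind of layout the policy tries to avoid, because losing /rack1 would take out two of the three replicas.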
Docker scripts have been introduced to build and run Kudu on various operating systems. See the /docker subdirectory of the source repository for more details. An official repository has been created for Apache Kudu Docker artifacts.
Developers integrating with Kudu can now write Java tests that start a Kudu mini cluster without having to first locally build and install Kudu. This is made possible by the Kudu team providing platform-specific binaries available to Gradle or Maven for download and install at test time. More information on this feature can be found here. This binary test artifact is currently considered to be experimental.
When creating a table, the master now enforces a restriction on the total number of replicas rather than the total number of partitions. If you have manually overridden --max_create_tablets_per_ts, the maximum size of a new table has effectively been cut by a factor of its replication factor. Note that partitions can still be added after table creation.
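The effect of counting replicas instead of partitions can be worked through with simple arithmetic. The sketch below assumes that the creation-time budget is the flag value multiplied by the number of live tablet servers; that formula, the function name, and the numbers used are assumptions for illustration, not taken from these notes.

```python
def max_partitions_at_creation(max_create_tablets_per_ts,
                               live_tablet_servers,
                               replication_factor):
    """Estimate how many partitions a new table may have at creation time.

    Assumption: the master allows at most
    max_create_tablets_per_ts * live_tablet_servers replicas in total,
    and every partition consumes replication_factor replicas.
    """
    replica_budget = max_create_tablets_per_ts * live_tablet_servers
    return replica_budget // replication_factor


# Hypothetical cluster: flag set to 60, 5 live tablet servers.
old_style_limit = max_partitions_at_creation(60, 5, 1)  # counting partitions
new_style_limit = max_partitions_at_creation(60, 5, 3)  # counting replicas
```

With a replication factor of 3, the same flag value admits one third as many partitions at creation time as it did when partitions were counted directly.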
The compaction policy has been updated to favor reducing the number of rowsets. This can lead to faster scans and lower bootup times, particularly in the face of a “trickling inserts” workload, where rows are inserted slowly in primary key order (see KUDU-1400).
A tablet-level metric average_diskrowset_height has been added to indicate how much a replica needs to be compacted, as indicated by the average number of rowsets per unit of keyspace.
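To make "average number of rowsets per unit of keyspace" concrete, the sketch below computes that quantity for rowsets modeled as numeric key intervals. It is an illustration under stated assumptions: real primary keys are not simple numbers, the function name is hypothetical, and the way Kudu actually derives the metric may differ.

```python
def average_rowset_height(rowsets, key_min, key_max):
    """Average number of rowsets covering each unit of [key_min, key_max].

    rowsets is a list of (low, high) numeric key bounds, one per rowset.
    The result is the total covered width divided by the keyspace width.
    """
    width = key_max - key_min
    if width <= 0:
        raise ValueError("key_max must be greater than key_min")
    covered = 0.0
    for low, high in rowsets:
        # Clip each rowset to the keyspace being measured.
        low = max(low, key_min)
        high = min(high, key_max)
        if high > low:
            covered += high - low
    return covered / width
```

Two non-overlapping rowsets that together span the keyspace give a height of 1.0, while heavily overlapping rowsets give a larger value, which is the situation compaction is meant to reduce.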
Scans which read multiple columns of tables undergoing a heavy UPDATE
workload are now more CPU efficient. In some cases, scan performance of such
tables may be several times faster upon upgrading to this release.
Kudu-Spark users can now provide the short "kudu" format alias to Spark. This enables using .format("kudu") in places where you previously had to provide the fully qualified name, such as .format("org.apache.kudu.spark.kudu"), or import org.apache.kudu.spark.kudu._ and use the implicit .kudu functions. The Spark integration documentation has been updated to reflect this improvement.
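A read using the short alias might look like the following PySpark sketch. It assumes that spark is an existing SparkSession with the kudu-spark package on its classpath and that the alias is usable from PySpark in the same way as from Scala; the function name and its arguments are placeholders, while kudu.master and kudu.table are the option keys used by the Kudu Spark integration.

```python
def read_kudu_table(spark, kudu_master, table_name):
    """Load a Kudu table as a DataFrame using the short "kudu" format alias.

    spark is expected to be a SparkSession; kudu_master and table_name are
    placeholders for a real master address and table name.
    """
    return (
        spark.read
        .format("kudu")  # previously "org.apache.kudu.spark.kudu"
        .option("kudu.master", kudu_master)
        .option("kudu.table", table_name)
        .load()
    )
```

Only the string passed to format changes; the options are the same as with the fully qualified name.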
The KuduSink class has been added to the Spark integration as a StreamSinkProvider, allowing structured streaming writes into Kudu (see KUDU-2640).
The amount of server-side logging has been greatly reduced for Kudu’s consensus implementation and background processes. The removed logging was judged to be unhelpful and unnecessarily verbose.
The web UI now more obviously depicts which columns are a part of the primary key (see KUDU-2477).
The kudu table describe tool has been added to support describing table attributes, including schema, partitioning, replication factor, column encodings, compressions, and default values.
The kudu table scan tool has been added to scan rows from a table, supporting comparison, in-list, and is-null predicates.
The kudu locate_row tool has been added to allow users to determine what tablet a given primary key belongs to, and whether a row exists for that primary key.
The kudu diagnose dump_mem_trackers tool has been added to allow users to output the contents of the /mem-trackers web UI page in CSV format.
To avoid glitches and undefined behavior, the Kudu Python client now detects and reports on conflicting/incorrect initialization of the OpenSSL library.
Fixed a crash caused by a race between altering tablet schemas and deleting tablet replicas (see KUDU-1678).
Fixed an issue that would prevent the kudu fs update_dirs tool from removing directories in the presence of tablet tombstones (see KUDU-2680).
The --cmeta_force_fsync flag may be used to fsync Kudu’s consensus metadata more aggressively. Setting this to true may decrease Kudu’s performance, but improve its durability in the face of power failures and forced shutdowns (see KUDU-2195).
Fixed an issue that would cause an excessive amount of RPC traffic from Kudu masters if the tablet servers were configured with duplicated master addresses (see KUDU-2684).
Fixed an issue that would cause the kudu cluster rebalance tool to run indefinitely in the case of tables with a replication factor of 2 (see KUDU-2688).
Fixed an issue that could lead to a failure to bootstrap tablet replicas that were a part of workloads with many alter table operations (see KUDU-2690).
Fixed an issue with the Java scanner’s keepAlive that could lead to a permanent hang in the scanner (see KUDU-2710).
Fixed an issue that would cause undefined behavior upon connecting to a secure cluster concurrently from multiple C++ clients (see KUDU-2706).
Kudu 1.9.0 is wire-compatible with previous versions of Kudu:
Kudu 1.9 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.
Rolling upgrade between Kudu 1.8 and Kudu 1.9 servers is believed to be possible but has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
Kudu 1.0 clients may connect to servers running Kudu 1.9 with the exception of the below-mentioned restrictions regarding secure clusters.
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.9 and versions earlier than 1.3:
If a Kudu 1.9 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.
If a Kudu 1.9 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
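The two rules above can be restated as a small predicate. This is only a sketch of the version rule as worded in these notes: the function name and the (major, minor) tuple representation are hypothetical, and the check says nothing about whether a client holds valid credentials.

```python
def version_allows_connection(client_version, authentication, encryption):
    """Apply the wire-compatibility rule for a Kudu 1.9 cluster.

    client_version is a (major, minor) tuple such as (1, 2).
    authentication and encryption are each one of
    "required", "optional", or "disabled".
    """
    if client_version >= (1, 3):
        # Clients from 1.3 onward understand the security features.
        return True
    # Older clients are locked out if either setting is "required".
    return "required" not in (authentication, encryption)
```

For example, a 1.2 client is rejected when authentication is "required" even if encryption is only "optional", because either setting alone is enough to exclude it.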
The Kudu 1.9 Java client library is API- and ABI-compatible with Kudu 1.8. Applications written against Kudu 1.8 will compile and run against the Kudu 1.9 client library and vice-versa.
The Kudu 1.9 C++ client is API- and ABI-forward-compatible with Kudu 1.8. Applications written and compiled against the Kudu 1.8 client library will run without modification against the Kudu 1.9 client library. Applications written and compiled against the Kudu 1.9 client library will run without modification against the Kudu 1.8 client library.
The Kudu 1.9 Python client is API-compatible with Kudu 1.8. Applications written against Kudu 1.8 will continue to run against the Kudu 1.9 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.9 includes contributions from 24 people, including 5 first-time contributors:
Bankim Bhavsar
Mike Parker
Mitch Barnett
Tim Armstrong
Yingchun Lai
Thank you for your help in making Kudu even better!
For full installation details, see Kudu Installation.