Apache Kudu 1.11.1 is a bug-fix release which fixes one critical licensing issue in Kudu 1.11.0.
When upgrading from earlier versions of Kudu, if support for Kudu’s NVM
(non-volatile memory) block cache is desired, install the memkind library
of version 1.8.0 or newer as documented in Kudu Installation for the
corresponding platform. This is a mandatory step for existing users of the
NVM block cache (i.e. those who set --block_cache_type=NVM for kudu-master
and kudu-tserver): they must install memkind, otherwise their Kudu processes
will crash at startup.
Fixed an issue with distributing the libnuma dynamic library with the
kudu-binary JAR artifact. Also, fixed the issue of statically compiling
libnuma.a into the kudu-master and kudu-tserver binaries when building Kudu
from source in release mode. The fix removes both the numactl and memkind
projects from Kudu’s thirdparty dependencies and makes the dependency on the
libmemkind library optional, opening the library using dlopen() and
resolving required symbols via dlsym() (see KUDU-2990).
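The optional-dependency pattern described above can be illustrated with a minimal Python sketch. This is not Kudu's C++ code: it relies on ctypes, which on Linux opens libraries through dlopen() and resolves attributes through dlsym(). The soname libmemkind.so.0 and the symbol names memkind_malloc and memkind_free are assumptions chosen for illustration.

```python
import ctypes


def load_optional_library(soname):
    """Try to open a shared library at runtime; return None if it is absent."""
    try:
        return ctypes.CDLL(soname)
    except OSError:
        # The library is not installed: treat the feature as unavailable
        # instead of failing at link time.
        return None


def resolve_symbols(lib, names):
    """Look up each required symbol; raise RuntimeError if any is missing."""
    resolved = {}
    for name in names:
        try:
            resolved[name] = getattr(lib, name)
        except AttributeError:
            raise RuntimeError("symbol not found: " + name) from None
    return resolved


def nvm_cache_available(soname="libmemkind.so.0",
                        symbols=("memkind_malloc", "memkind_free")):
    """Return True only if the library opens and all symbols resolve."""
    lib = load_optional_library(soname)
    if lib is None:
        return False
    try:
        resolve_symbols(lib, symbols)
    except RuntimeError:
        return False
    return True
```

With this shape, a process that never enables the NVM block cache does not need the library at all, while a process that does enable it can fail early with a clear message.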
Fixed an issue with the kudu cluster rebalancer CLI tool crashing when
running against a location-aware cluster if a tablet server in one location
contains no tablet replicas (see KUDU-2987).
Fixed an issue with connection negotiation using SASL mechanism when server FQDN is longer than 64 characters (see KUDU-2989).
Fixed an issue in the test harness of the kudu-binary JAR artifact. With this
fix, kudu-master and kudu-tserver processes of the mini-cluster’s test
harness no longer rely on the test NTP server to synchronize their built-in
NTP client. Instead, the test harness relies on the local machine clock
synchronized by the system NTP daemon (see KUDU-2994).
Since KUDU-2625 is addressed, tablet servers now reject individual write operations which violate schema constraints in a batch of write operations. In prior versions, the whole batch of write operations was rejected if a schema constraint violation was detected for even a single row. Applications that relied on the previous behavior should be revised upon upgrading to Kudu 1.11.0.
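The difference between the two behaviors can be sketched with a simplified Python model. It does not use the Kudu client API; the row dictionaries, the is_valid callback, and both function names are hypothetical and only illustrate what an application should now expect from a batch containing a bad row.

```python
def apply_batch_per_row(rows, is_valid):
    """Model of the new behavior: reject only the violating rows."""
    applied, errors = [], []
    for index, row in enumerate(rows):
        if is_valid(row):
            applied.append(row)
        else:
            errors.append((index, row))
    return applied, errors


def apply_batch_all_or_nothing(rows, is_valid):
    """Model of the prior behavior: one violation rejects the whole batch."""
    errors = [(i, r) for i, r in enumerate(rows) if not is_valid(r)]
    if errors:
        return [], errors
    return list(rows), []


# Hypothetical constraint: the "name" column is NOT NULL.
def name_is_set(row):
    return row["name"] is not None


batch = [
    {"key": 1, "name": "a"},
    {"key": 2, "name": None},  # violates the constraint
    {"key": 3, "name": "c"},
]
new_applied, new_errors = apply_batch_per_row(batch, name_is_set)
old_applied, old_errors = apply_batch_all_or_nothing(batch, name_is_set)
```

An application that treated any error as "nothing was written" would now need to inspect the per-row errors, because the valid rows in the same batch are applied.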
The Kudu Flume integration is deprecated and may be removed in the next minor release. The integration will be moved to the Apache Flume project going forward (see FLUME-3345).
Kudu now supports putting tablet servers into maintenance mode. While in this
mode, the tablet server’s replicas will not be re-replicated if it fails.
Only upon exiting maintenance will re-replication be triggered for any
remaining under-replicated tablets. The kudu tserver state enter_maintenance
and kudu tserver state exit_maintenance tools are added to orchestrate
tablet server maintenance, and the kudu tserver list tool is amended with
a "state" column option to display the current state of each tablet server
(see KUDU-2069).
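For scripted maintenance windows, the two tools named above can be wrapped in a small helper. The sketch below only builds the argument list and does not run anything. It assumes the tools take the master addresses followed by the tablet server UUID as positional arguments; check `kudu tserver state --help` on your installation. The helper name and the example addresses and UUID are hypothetical.

```python
def tserver_state_command(action, master_addresses, tserver_uuid):
    """Build an argv list for entering or exiting maintenance mode.

    Assumes positional arguments of the form
    <master_addresses> <tserver_uuid>.
    """
    if action not in ("enter_maintenance", "exit_maintenance"):
        raise ValueError("unsupported action: " + action)
    return [
        "kudu", "tserver", "state", action,
        ",".join(master_addresses),  # comma-separated list of masters
        tserver_uuid,
    ]


# Hypothetical cluster and tablet server UUID.
masters = ["master-1.example.com:7051", "master-2.example.com:7051"]
enter_cmd = tserver_state_command("enter_maintenance", masters, "abc123")
exit_cmd = tserver_state_command("exit_maintenance", masters, "abc123")
```

The resulting lists can be passed to subprocess.run() before and after the maintenance work on the node.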
Kudu now has a built-in NTP client which maintains the internal wallclock
time used for generation of HybridTime timestamps. When enabled, system clock
synchronization for nodes running Kudu is no longer necessary. This is useful
for containerized deployments and in other cases when it’s troublesome
to maintain a properly configured system NTP service at each node of a Kudu
cluster. The list of NTP servers to synchronize against is specified with the
--builtin_ntp_servers flag. By default, Kudu masters and tablet servers use
public servers hosted by the NTP Pool project. To use the built-in NTP
client, set --time_source=builtin and reconfigure --builtin_ntp_servers
if necessary (see KUDU-2935).
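As an illustration of the two flags above, the following sketch assembles the flag strings for a master or tablet server command line. It assumes that --builtin_ntp_servers accepts a comma-separated list of servers; the helper name and the example hostnames are hypothetical.

```python
def builtin_ntp_flags(ntp_servers=None):
    """Return the flags that enable the built-in NTP client.

    When ntp_servers is empty or None, the server list flag is omitted so
    that the default servers are used.
    """
    flags = ["--time_source=builtin"]
    if ntp_servers:
        # Assumption: the flag value is a comma-separated server list.
        flags.append("--builtin_ntp_servers=" + ",".join(ntp_servers))
    return flags


default_flags = builtin_ntp_flags()
custom_flags = builtin_ntp_flags(["ntp1.example.com", "ntp2.example.com"])
```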
Aggregated table statistics are now available to Kudu clients via the
KuduClient.getTableStatistics() and KuduTable.getTableStatistics()
methods in the Kudu Java client and KuduClient.GetTableStatistics()
in the Kudu C++ client. This allows for various query optimizations.
For example, Spark now uses it to perform join optimizations.
In addition, per-table statistics are available via the kudu table statistics
CLI tool. The statistics are also available via the master’s Web UI at the
master:8051/metrics and master:8051/table?id=<uuid> URIs
(see KUDU-2797 and KUDU-2921).
The kudu CLI tool now supports altering table columns. Use the newly
introduced sub-commands such as kudu table column_set_default,
kudu table column_remove_default, kudu table column_set_compression,
kudu table column_set_encoding, and kudu table column_set_block_size
to alter a column of the specified table.
The kudu CLI tool now supports dropping table columns. Use the newly
introduced kudu table delete_column sub-command to drop a column of the
specified table.
The kudu CLI tool now supports getting and setting extra
configuration properties for a table. Use the kudu table get_extra_configs
and kudu table set_extra_config sub-commands to perform the corresponding
operations (see KUDU-2514).
The kudu CLI tool now supports creating and dropping range partitions
for a table. Use the kudu table add_range_partition and
kudu table drop_range_partition sub-commands to perform the corresponding
operations (see KUDU-2881).
The kudu fs dump uuid CLI tool is now significantly faster and consumes
significantly less IO.
The memory consumed by CFileReaders and BloomFileReaders is factored out and accounted separately by the tablet server memory tracking. The stats are available via Web UI as "CFileReaders" and "BloomFileReaders" entries.
KuduScanBatch::const_iterator in the Kudu C++ client now supports
operator->() (see KUDU-1561).
Master server Web UI now supports sorting the list of tables by the columns of "Table Name", "Create Time", and "Last Alter Time".
Tablet servers now expand a tablet’s data directory group with available healthy directories when all directories of the group are full (see KUDU-2907).
For scan operations run with the CLOSEST_REPLICA selection mode, the Kudu
Java client now picks a random available replica when no replica is located
on the same node as the client that initiated the scan operation. This helps
to spread the load generated by multiple scan requests to the same tablet
among all available replicas. In prior releases, all such scan requests might
end up fetching data from the same tablet replica (see KUDU-2348).
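The selection rule described above can be sketched as follows. This is a simplified Python model, not the Java client's implementation: it only distinguishes "same host" from "any other host", and the function name and host strings are hypothetical.

```python
import random


def pick_replica(replica_hosts, client_host, rng=random):
    """Prefer a replica on the client's host; otherwise pick one at random."""
    if not replica_hosts:
        raise ValueError("no replicas available")
    for host in replica_hosts:
        if host == client_host:
            return host
    # No local replica: a random choice spreads scans across all replicas
    # instead of always hitting the same one.
    return rng.choice(replica_hosts)
```

Picking at random, rather than always taking the first non-local replica, is what prevents many clients scanning the same tablet from converging on a single server.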
The serialization of in-memory rows to Kudu’s wire format has been optimized to be more CPU efficient (see KUDU-2847).
Tablet servers and masters can now aggregate metrics by the same attribute.
For example, it’s now possible to fetch aggregated metrics from a tablet
server by retrieving data from URLs of the form
http://<host>:<port>/metrics?merge_rules=tablet|table|table_name
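A minimal sketch of building such a URL, together with a toy model of what merging by an attribute means. The entity layout and the use of summation are assumptions made for illustration; the server's actual merge logic may treat different metric types differently. The host name is hypothetical, and 8050 is used as the tablet server web UI port.

```python
from collections import defaultdict


def merged_metrics_url(host, port, merge_rules="tablet|table|table_name"):
    """Build the metrics URL with a merge_rules query parameter."""
    return "http://{}:{}/metrics?merge_rules={}".format(host, port, merge_rules)


def merge_by_attribute(entities, attribute):
    """Toy aggregation: sum each metric over entities sharing an attribute."""
    totals = defaultdict(lambda: defaultdict(int))
    for entity in entities:
        group = entity["attributes"][attribute]
        for name, value in entity["metrics"].items():
            totals[group][name] += value
    return {group: dict(metrics) for group, metrics in totals.items()}


url = merged_metrics_url("tserver-1.example.com", 8050)

# Hypothetical per-tablet metrics for two tables.
tablets = [
    {"attributes": {"table_name": "t1"}, "metrics": {"rows_inserted": 10}},
    {"attributes": {"table_name": "t1"}, "metrics": {"rows_inserted": 5}},
    {"attributes": {"table_name": "t2"}, "metrics": {"rows_inserted": 7}},
]
per_table = merge_by_attribute(tablets, "table_name")
```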
Introduced Docker image for Python Kudu client (see KUDU-2849).
Tablet servers now consider available disk space when choosing a set of data directories for a tablet’s data directory group, and when deciding in which data directory a new block should be written (see KUDU-2901).
Added a quick-start example of using Apache Spark to load, query, and modify a real data set stored in Kudu.
Added a quick-start example of using Apache Nifi to ingest data into Kudu.
Tablet servers now reject individual write operations which violate schema constraints in a batch of write operations received from a client. The previous behavior was to reject the whole batch of write operations if a violation of the schema constraints was detected for even a single row (see KUDU-2625).
Tablet replicas can now be optionally placed in accordance with a
dimension-based placement policy. To specify a dimension label for a table,
use the KuduTableCreator::dimension_label() and
CreateTableOptions.setDimensionLabel() methods of the C++ and Java Kudu
clients. To add a partition with a dimension label, use the
KuduTableAlterer::AddRangePartitionWithDimension() and
AlterTableOptions.addRangePartition() methods of the C++ and Java Kudu
clients (see KUDU-2823).
Kudu RPC now enables TCP keepalive for all outbound connections for faster detection of no-longer-reachable nodes (see KUDU-2192).
The kudu table scan and kudu table copy CLI tools now fail gracefully
rather than crashing upon hitting an error (see KUDU-2851).
Optimized decoding of deltas' timestamps (see KUDU-2867).
Optimized the initialization of DeltaMemStore for the case when no matching deltas are present (see KUDU-2381).
Improved the rehydration of scan tokens. Now a scan token created before renaming a column can be used even after the column has been renamed.
The memory reserved by tcmalloc is now released to OS periodically to avoid potential OOM issues in the case of read-only workloads (see KUDU-2836).
Optimized evaluation of predicates on columns of primitive types and
NULL/NOT NULL predicates to leverage SIMD instructions (see KUDU-2846).
Fixed an issue of fault-tolerant scan operations failing for a projection whose key columns are specified in an order different from the table schema’s (see KUDU-2980).
Fixed an issue that would cause frequent leader elections when persisting Raft transactions to the WAL took longer than the leader election timeout. The issue was contributing to election storms (see KUDU-2947).
Fixed a tablet server crash in cases where blocks were not removed due to IO error. This issue may have surfaced after recovering from a disk failure (see KUDU-2635).
Fixed a crash in master and tablet server by validating the size of default
values when de-serializing ColumnSchemaPB (see KUDU-2622).
Fixed RPC negotiation failure in the case when TLS v1.3 is supported at both the client and the server side. This is a temporary workaround before the connection negotiation code is properly updated to support 1.5-RTT handshake used in TLS v1.3. The issue affected Linux distributions shipped or updated with OpenSSL version 1.0.2 and newer (see KUDU-2871).
Fixed a race between GetTabletLocations() and tablet report processing.
The race could crash the Kudu master (see KUDU-2842).
Fixed a bug in AlterSchemaTransactionState::ToString() that led to a crash
of the tablet server when removing a tablet replica with a pending
AlterSchema transaction.
Kudu 1.11.0 is wire-compatible with previous versions of Kudu:
Kudu 1.11 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.
Rolling upgrade between Kudu 1.10 and Kudu 1.11 servers is believed to be possible, though it has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
Kudu 1.0 clients may connect to servers running Kudu 1.11 with the exception of the below-mentioned restrictions regarding secure clusters.
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.11 and versions earlier than 1.3:
If a Kudu 1.11 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.
If a Kudu 1.11 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
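The two rules above reduce to a simple check, sketched here in Python. The function name and the string values "required", "optional", and "disabled" mirror the wording of this section; this is an illustration of the rule, not code from Kudu.

```python
SETTINGS = ("required", "optional", "disabled")


def pre_1_3_client_can_connect(authentication, encryption):
    """Return True if a client older than Kudu 1.3 can connect.

    Such clients are locked out as soon as either setting is "required".
    """
    if authentication not in SETTINGS or encryption not in SETTINGS:
        raise ValueError("settings must be one of: " + ", ".join(SETTINGS))
    return authentication != "required" and encryption != "required"
```

For example, a cluster with authentication set to "optional" and encryption set to "required" is still unreachable for pre-1.3 clients, because one "required" setting is enough.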
The Kudu 1.11 Java client library is API- and ABI-compatible with Kudu 1.10. Applications written against Kudu 1.10 will compile and run against the Kudu 1.11 client library and vice-versa.
The Kudu 1.11 C++ client is API- and ABI-forward-compatible with Kudu 1.10. Applications written and compiled against the Kudu 1.10 client library will run without modification against the Kudu 1.11 client library. Applications written and compiled against the Kudu 1.11 client library will run without modification against the Kudu 1.10 client library.
The Kudu 1.11 Python client is API-compatible with Kudu 1.10. Applications written against Kudu 1.10 will continue to run against the Kudu 1.11 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.11 includes contributions from 24 people, including 8 first-time contributors:
Hannah Nguyen
lingbin
Ritwik Yadav
Scott Reynolds
Volodymyr Verovkin
Xiaokai Wang
Xin He
Yao Wang
Thank you for your help in making Kudu even better!
For full installation details, see Kudu Installation.