Kudu is primarily designed for analytic use cases. You are likely to encounter issues if a single row contains multiple kilobytes of data.
The columns which make up the primary key must be listed first in the schema.
Columns that are part of the primary key cannot be renamed. The primary key may not be changed after the table is created. You must drop and recreate a table to select a new primary key or rename key columns.
The primary key of a row may not be modified using the UPDATE
functionality.
To modify a row’s primary key, the row must be deleted and re-inserted with
the modified key. Such a modification is non-atomic.
Columns with DOUBLE
, FLOAT
, or BOOL
types are not allowed as part of a
primary key definition. Additionally, all columns that are part of a primary
key definition must be NOT NULL
.
Type and nullability of existing columns cannot be changed by altering the table.
Dropping a column does not immediately reclaim space. Compaction must run first. There is no way to run compaction manually, but dropping the table will reclaim the space immediately.
Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Range partitions may be added or dropped after a table has been created. See Schema Design for more information.
Data in existing tables cannot currently be automatically repartitioned. As a workaround, create a new table with the new partitioning and insert the contents of the old table.
Kudu does not currently include any built-in features for backup and restore. Users are encouraged to use tools such as Spark or Impala to export or import tables as necessary.
Updates, inserts, and deletes via Impala are non-transactional. If a query fails part of the way through, its partial effects will not be rolled back.
No timestamp and decimal type support.
The maximum parallelism of a single query is limited to the number of tablets in a table. For good analytic performance, aim for 10 or more tablets per host or use large tables.
Authorization is only available at a system-wide, coarse-grained level. Table-level, column-level, and row-level authorization features are not available.
Data encryption at rest is not built in. Kudu has been reported to run correctly
on systems using local block device encryption (e.g. dmcrypt
).
Kudu server Kerberos principals must follow the pattern kudu/<HOST>@DEFAULT.REALM
.
Configuring an alternate Kerberos principal is not supported.
Kudu’s integration with Apache Flume does not support writing to Kudu clusters that require Kerberos authentication.
Kudu client instances retrieve authentication tokens upon first contact with the cluster. These tokens expire after one week. Use of a single Kudu client instance for more than one week is not supported.
The following are known bugs and issues with the current release of Kudu. They will be addressed in later releases. Note that this list is not exhaustive, and is meant to communicate only the most important known issues.
If the Kudu master is configured with the -log_force_fsync_all
option, tablet servers
and clients will experience frequent timeouts, and the cluster may become unusable.
If a tablet server has a very large number of tablets, it may take several minutes to start up. It is recommended to limit the number of tablets per server to 100 or fewer. Consider this limitation when pre-splitting your tables. If you notice slow start-up times, you can monitor the number of tablets per server in the web UI.