This document applies to Apache Kudu version 1.16.0. Consult the documentation for the release that matches the version of your Kudu cluster.
To configure the behavior of each Kudu process, you can pass command-line flags when
you start it, or read those options from configuration files by passing them using
one or more --flagfile=<file> options. You can even include the --flagfile option
within your configuration file to include other files. Learn more about gflags
by reading its documentation.
You can place options for masters and tablet servers into the same configuration file, and each will ignore options that do not apply.
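As a rough illustration of the flag-file mechanism described above, the following sketch writes two flag files to a temporary directory, one of which includes the other through --flagfile. The file names, directory paths, and flag values are hypothetical placeholders; only the flag names come from this document.

```shell
#!/usr/bin/env bash
# Sketch only: file names and paths below are hypothetical placeholders.
conf_dir="$(mktemp -d)"

# Flags that apply to every Kudu server started from this configuration.
cat > "$conf_dir/common.gflagfile" <<'EOF'
# Lines starting with '#' are treated as comments by gflags.
--time_source=system
EOF

# Tablet server flags; the first line pulls in the shared file.
cat > "$conf_dir/tserver.gflagfile" <<EOF
--flagfile=$conf_dir/common.gflagfile
--fs_wal_dir=/data/0/kudu/wal
--fs_data_dirs=/data/1/kudu,/data/2/kudu
EOF

# The tablet server would then be started with a single option:
echo "kudu-tserver --flagfile=$conf_dir/tserver.gflagfile"
```

Keeping cluster-wide settings in one included file is only one possible layout; passing several --flagfile options on the command line works as well.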
Flags can be prefixed with either one or two - characters. This
documentation standardizes on two: --example_flag.
Only the most common configuration options are documented here. For a more exhaustive list of configuration options, see the Configuration Reference.
To see all configuration flags for a given executable, run it with the --help option.
Take care when configuring undocumented flags, as not every possible
configuration has been tested, and undocumented options are not guaranteed to be
maintained in future releases.
Kudu relies on timestamps generated by its clock implementation for MVCC
and for providing consistency guarantees when processing write and read
requests. Aside from the test-only mock clock, Kudu has two different clock
implementations: one is based on logical time and the other is based on
so-called hybrid time. The former is a plain Lamport clock; the latter
is a combination of the node’s system clock and a Lamport clock. Below,
the former is referred to as LogicalClock and the latter as HybridClock.
Using the HybridClock implementation is a must for any production-grade, POC,
and other regular Kudu deployment, which is why --use_hybrid_clock is set to
true by default. Setting the flag to false makes Kudu servers use the
LogicalClock implementation. Running with that clock implementation is
acceptable only when running specifically crafted test scenarios
in a Kudu development environment.
Setting --use_hybrid_clock=false is strongly discouraged in any
production-grade deployment because it can introduce uncontrolled latency
and unexpected behavior, especially when working with multiple tables
in a multi-node Kudu cluster.
To provide better accuracy for multi-node cluster deployments where each node
maintains its own system clock, the HybridClock implementation requires each
node’s system clock to be synchronized by NTP.
Setting --time_source=system_unsync removes the requirement for the
node’s system clock to be synchronized by NTP. This allows users to run test
clusters on a single node where there is only one clock used by all Kudu
servers. Setting --time_source=system_unsync is strongly discouraged in any
multi-node Kudu cluster, unless the system clocks of all Kudu nodes are guaranteed
to always be synchronized with each other.
For Kudu masters and tablet servers, there are two options to make the
HybridClock implementation use a clock synchronized by NTP:
1. Ensure that the system clock of the Kudu node is synchronized with reference
   servers using an NTP daemon running on the node. Usually, the NTP daemon is
   a part of the node’s OS distribution. In Kudu 1.12.0 and newer, both
   ntpd and chronyd are supported. Earlier Kudu versions were tested only
   with ntpd, but might work with chronyd as well if chronyd is
   configured as recommended by the chronyd configuration tips for Kudu.
2. Make Kudu servers maintain their own local clock, synchronizing it with
   reference NTP servers. For that, Kudu servers use their built-in NTP client.
   This option is available in Kudu 1.11.0 and newer versions.
The latter option is provided as a last resort for deployments where properly
configuring NTP daemons at every node of a Kudu cluster is not feasible for
some reason and to simplify Kudu deployments in public cloud environments such
as EC2 and GCP. For on-prem deployments, it’s still recommended to use the
former option since the current implementation of the Kudu built-in NTP client
might not be as robust as the battle-tested ntpd and chronyd system NTP daemons.
To switch between the two options above, use the --time_source flag:
- Setting --time_source=system makes the HybridClock rely on the node’s
  system clock.
- Setting --time_source=builtin turns on the built-in NTP client in
  Kudu masters and tablet servers. Use the --builtin_ntp_servers flag to
  customize the set of reference NTP servers for the built-in NTP client: the
  value is expected to be a comma-separated list.
The default setting for the --builtin_ntp_servers flag might require
access to the NTP servers hosted by the NTP Pool Project.
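For illustration, a flag file that enables the built-in NTP client with a custom server list might look like the sketch below. The NTP host names are hypothetical placeholders; substitute the reference servers reachable from your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: the NTP host names are hypothetical placeholders.
ntp_flagfile="$(mktemp)"

cat > "$ntp_flagfile" <<'EOF'
--time_source=builtin
--builtin_ntp_servers=ntp1.example.com,ntp2.example.com,ntp3.example.com
EOF

# Show the resulting flag file.
cat "$ntp_flagfile"
```

The same two lines would be used for every master and tablet server, so that all servers in the cluster share one time source and one list of reference servers.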
If deploying a Kudu cluster in AWS/EC2 or GCE/GCP public clouds, it might make
sense to set --time_source=auto for all Kudu masters and tablet servers in
the cluster. In this context, setting --time_source=auto leads to the
following:
- Upon every start, a Kudu server runs the auto-detection procedure to
  determine the type of cloud environment it runs in.
- If the cloud type auto-detection completes successfully, the Kudu server
  starts using its built-in NTP client to synchronize with the NTP server
  provided by the cloud environment (see the appropriate documentation for
  EC2 and GCP, respectively).
Running a Kudu server with --time_source=auto in cloud environments
other than EC2 and GCP, or when the cloud type auto-detection fails, makes
the Kudu server fall back to using the built-in NTP client with the list
of NTP servers specified by the --builtin_ntp_servers flag, unless the list is
empty or otherwise unparsable. When --builtin_ntp_servers is set to an empty
list and the cloud type auto-detection fails, the Kudu server runs as if it
were configured with the system time source, provided the OS/platform supports the
get_ntptime() API. Finally, the catch-all case is system_unsync for the
time source. As already mentioned, the system_unsync time source is intended
for development-only platforms or single-node proof-of-concept
Kudu clusters.
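The fallback order for --time_source=auto can be summarized as a small decision function. This is only a sketch of the behavior described above, not Kudu's actual implementation: the function name and its three inputs are hypothetical, and the "unparsable server list" case is simplified to an empty-string check.

```shell
#!/usr/bin/env bash
# Prints the time source a server would end up with under --time_source=auto,
# following the fallback order described in the text (simplified sketch).
#   $1: detected cloud type ("ec2", "gcp", or empty if detection failed)
#   $2: value of --builtin_ntp_servers (may be empty)
#   $3: "yes" if the platform supports get_ntptime(), anything else otherwise
effective_time_source() {
  local cloud="$1" ntp_servers="$2" has_ntptime="$3"
  if [ "$cloud" = "ec2" ] || [ "$cloud" = "gcp" ]; then
    echo "builtin"        # built-in client with the cloud-provided NTP server
  elif [ -n "$ntp_servers" ]; then
    echo "builtin"        # built-in client with --builtin_ntp_servers
  elif [ "$has_ntptime" = "yes" ]; then
    echo "system"         # rely on the NTP-synchronized system clock
  else
    echo "system_unsync"  # catch-all case
  fi
}

effective_time_source "ec2" "" "yes"
effective_time_source "" "ntp1.example.com" "yes"
effective_time_source "" "" "yes"
effective_time_source "" "" "no"
```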
The kudu cluster ksck CLI utility reports the configured and the effective
time source for every Kudu master and tablet server in a cluster. The list of
the NTP servers for the built-in client is reported as well when the effective
time source is builtin. The utility is also able to show the difference in
settings of the related time source flags and warn operators if a discrepancy
is detected. In addition, the information on the configured and effective time
source is reported by the embedded Web server in the Time Source panel at
the /config page.
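A minimal sketch of such a check follows. The master host names are hypothetical placeholders; the script only assembles and prints the command, so it is safe to run on a machine without the Kudu CLI.

```shell
#!/usr/bin/env bash
# Sketch only: the master addresses are hypothetical placeholders.
masters="master-1.example.com,master-2.example.com,master-3.example.com"

# ksck takes the comma-separated list of master addresses as its argument.
ksck_cmd=(kudu cluster ksck "$masters")

# Print the assembled command line.
echo "${ksck_cmd[*]}"

# On a host with the Kudu CLI installed, run it with: "${ksck_cmd[@]}"
```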
Changing the value of the --time_source flag requires restarting the Kudu
server. Keep the time source the same for all masters and tablet servers in
a Kudu cluster. If using the Kudu built-in NTP client, make sure to use
the same list of reference NTP servers for every Kudu server in a cluster.
Every Kudu node requires the specification of directory flags. The
--fs_wal_dir configuration indicates where Kudu will place its write-ahead
logs. The --fs_metadata_dir configuration indicates where Kudu will place
metadata for each tablet. It is recommended, although not necessary, that these
directories be placed on high-performance drives with high bandwidth and low
latency, e.g. solid-state drives. If --fs_metadata_dir is not specified,
metadata will be placed in the directory specified by --fs_wal_dir. Since
a Kudu node cannot tolerate the loss of its WAL or metadata directories, it
may be wise to mirror the drives containing these directories in order to
make recovering from a drive failure easier; however, mirroring may increase
the latency of Kudu writes.
The --fs_data_dirs configuration indicates where Kudu will write its data
blocks. This is a comma-separated list of directories; if multiple values are
specified, data will be striped across the directories. If not specified, data
blocks will be placed in the directory specified by --fs_wal_dir. Note that
while a single data directory backed by a RAID-0 array will outperform a single
data directory backed by a single storage device, it is better to let Kudu
manage its own striping over multiple devices rather than delegating the
striping to a RAID-0 array.
Additionally, --fs_wal_dir and --fs_metadata_dir may be the same as one
of the directories listed in --fs_data_dirs, but must not be sub-directories
of any of them.
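The sub-directory rule can be checked before starting a server. The helper functions below are a hypothetical sketch, not part of Kudu; they assume absolute, normalized paths (no trailing slash, no ".." components) and compare path strings only, without resolving symbolic links.

```shell
#!/usr/bin/env bash
# Returns success if path $1 lies strictly inside directory $2.
is_subdir_of() {
  case "$1/" in
    "$2"/?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Checks one WAL or metadata directory ($1) against a comma-separated
# list of data directories ($2), as passed to --fs_data_dirs.
check_against_data_dirs() {
  local dir="$1" data_dirs="$2" d
  local IFS=','
  for d in $data_dirs; do
    if is_subdir_of "$dir" "$d"; then
      echo "invalid: $dir is a sub-directory of $d"
      return 1
    fi
  done
  echo "ok: $dir"
}

# Same as one of the data directories: allowed.
check_against_data_dirs /data/1/kudu /data/1/kudu,/data/2/kudu
# Nested inside a data directory: rejected ("|| true" keeps the script going).
check_against_data_dirs /data/1/kudu/wal /data/1/kudu,/data/2/kudu || true
```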
Each directory specified by a configuration flag on a given machine should be used by at most one Kudu process. If multiple Kudu processes on the same machine are configured to use the same directory, Kudu may refuse to start up.
Once --fs_data_dirs is set, extra tooling is required to change it.
For more details, see the Kudu Administration docs.
The --fs_wal_dir and --fs_metadata_dir configurations can be changed,
provided the contents of the directories are also moved to match the flags.
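The sketch below walks through such a move using throwaway directories under a temporary root, so it can be run without a Kudu installation. All paths and the flag file name are hypothetical, and stopping the server before moving the directory is an assumption about a sensible procedure rather than a step stated in this document.

```shell
#!/usr/bin/env bash
# Sketch only: temporary directories stand in for real WAL locations.
root="$(mktemp -d)"
old_wal="$root/old/wal"
new_wal="$root/new/wal"
flagfile="$root/tserver.gflagfile"

mkdir -p "$old_wal" "$root/new"
touch "$old_wal/placeholder"            # stands in for real WAL contents
echo "--fs_wal_dir=$old_wal" > "$flagfile"

# 1. Stop the Kudu server that uses this directory (not shown).
# 2. Move the directory contents to the new location.
mv "$old_wal" "$new_wal"
# 3. Point the flag at the new location.
echo "--fs_wal_dir=$new_wal" > "$flagfile"
# 4. Start the server again with the updated flag file (not shown).

cat "$flagfile"
```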
To see all available configuration options for the kudu-master executable, run it
with the --help option:
$ kudu-master --help
| Flag | Valid Options | Default | Description |
|---|---|---|---|
| | string | | Comma-separated list of all the RPC addresses for Master consensus-configuration. If not specified, assumes a standalone Master. |
| | string | | List of directories where the Master will place its data blocks. |
| | string | | The directory where the Master will place its tablet metadata. |
| | string | | The directory where the Master will place its write-ahead logs. |
| | string | | The directory to store Master log files. |
For the full list of flags for masters, see the Kudu Master Configuration Reference.
To see all available configuration options for the kudu-tserver executable,
run it with the --help option:
$ kudu-tserver --help
| Flag | Valid Options | Default | Description |
|---|---|---|---|
| --fs_data_dirs | string | | List of directories where the Tablet Server will place its data blocks. |
| --fs_metadata_dir | string | | The directory where the Tablet Server will place its tablet metadata. |
| --fs_wal_dir | string | | The directory where the Tablet Server will place its write-ahead logs. |
| --log_dir | string | /tmp | The directory to store Tablet Server log files. |
| --tserver_master_addrs | string | | Comma-separated addresses of the masters which the tablet server should connect to. The masters do not read this flag. |
| --block_cache_capacity_mb | integer | 512 | Maximum amount of memory, in megabytes, allocated to the Kudu Tablet Server’s block cache. |
| --memory_limit_hard_bytes | integer | 4294967296 | Maximum amount of memory, in bytes, a Tablet Server can consume before it starts rejecting all incoming writes. |
For the full list of flags for tablet servers, see the Kudu Tablet Server Configuration Reference.
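Putting the tablet server flags from the table together, a flag file might look like the following sketch. The directory paths and master host names are hypothetical placeholders; the block cache and memory limit values simply repeat the defaults listed above.

```shell
#!/usr/bin/env bash
# Sketch only: paths and host names are hypothetical placeholders.
tserver_flagfile="$(mktemp)"

cat > "$tserver_flagfile" <<'EOF'
--fs_wal_dir=/data/0/kudu/wal
--fs_metadata_dir=/data/0/kudu/meta
--fs_data_dirs=/data/1/kudu,/data/2/kudu
--log_dir=/var/log/kudu
--tserver_master_addrs=master-1.example.com,master-2.example.com,master-3.example.com
--block_cache_capacity_mb=512
--memory_limit_hard_bytes=4294967296
EOF

# The tablet server would then be started with:
echo "kudu-tserver --flagfile=$tserver_flagfile"
```

Here the WAL and metadata directories sit on a separate device from the data directories, in line with the recommendation to place them on fast drives.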
Kudu allows certain configurations to be set per table. To configure the behavior of a Kudu table, you can set these configurations at table creation, or alter them via the Kudu API or Kudu command line tool.
| Configuration | Valid Options | Default | Description |
|---|---|---|---|
| kudu.table.history_max_age_sec | integer | | Number of seconds to retain history for tablets in this table. |
| kudu.table.maintenance_priority | integer | 0 | Priority level of a table for maintenance. |
| kudu.table.disable_compaction | false, true | false | Whether to disable data compaction maintenance tasks for all tablets of this table. |
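As a sketch of altering one of these settings from the command line, the example below assembles a kudu table set_extra_config invocation. It assumes that subcommand is available in your Kudu version; the master address, table name, and the value of 3600 seconds are hypothetical. The script only prints the command, so it can be run without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: master address, table name, and value are hypothetical.
masters="master-1.example.com"
table="example_table"

# Arguments: <master_addresses> <table_name> <config_name> <config_value>
set_cmd=(kudu table set_extra_config "$masters" "$table" \
         kudu.table.history_max_age_sec 3600)

# Print the assembled command line.
echo "${set_cmd[*]}"

# On a host with the Kudu CLI installed, run it with: "${set_cmd[@]}"
```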