This document applies to Apache Kudu version 1.17.1. Consult the documentation for the release that matches the version of your Kudu cluster.
Kudu includes security features which allow Kudu clusters to be hardened against access from unauthorized users. This guide describes the security features provided by Kudu. Configuring a Secure Kudu Cluster lists essential configuration options when deploying a secure Kudu cluster. Known Limitations contains a list of known deficiencies in Kudu’s security capabilities.
Kudu can be configured to enforce secure authentication among servers, and between clients and servers. Authentication prevents untrusted actors from gaining access to Kudu, and securely identifies the connecting user or services for authorization checks. Authentication in Kudu is designed to interoperate with other secure Hadoop components by utilizing Kerberos.
Authentication can be configured on Kudu servers using the --rpc_authentication flag, which can be set to required, optional, or disabled. By default, the flag is set to optional. When required, Kudu will reject connections from clients and servers that lack authentication credentials. When optional, Kudu will attempt to use strong authentication. When disabled, or when strong authentication fails under optional, Kudu by default allows unauthenticated connections only from trusted subnets, which are private networks (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16) and the local subnets of all local network interfaces. Unauthenticated connections from publicly routable IPs will be rejected.
The trusted subnets can be configured using the --trusted_subnets flag, which can be set to a comma-separated list of IP blocks in CIDR notation. Set it to '0.0.0.0/0' to allow unauthenticated connections from all remote IP addresses. However, if network access is not otherwise restricted by a firewall, malicious users may be able to gain unauthorized access. This can be mitigated by configuring authentication to be required.
When the --rpc_authentication flag is set to optional, the cluster does not prevent access from unauthenticated users. To secure a cluster, use --rpc_authentication=required.
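The trusted-subnet rule described above can be illustrated with a minimal Python sketch. This is not Kudu's implementation: the helper names parse_trusted_subnets and is_trusted are hypothetical, and the sketch ignores the local subnets of local network interfaces, which Kudu also trusts by default.

```python
import ipaddress

# Default trusted private networks, as listed in the text above.
DEFAULT_TRUSTED_SUBNETS = (
    "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,169.254.0.0/16"
)


def parse_trusted_subnets(flag_value):
    """Parse a comma-separated list of CIDR blocks (hypothetical helper)."""
    return [
        ipaddress.ip_network(block.strip())
        for block in flag_value.split(",")
        if block.strip()
    ]


def is_trusted(remote_ip, subnets):
    """Return True if remote_ip falls inside any of the given subnets."""
    address = ipaddress.ip_address(remote_ip)
    return any(address in subnet for subnet in subnets)
```

With the default list, an address such as 10.1.2.3 is trusted while a publicly routable address such as 8.8.8.8 is not. Parsing '0.0.0.0/0' instead makes every IPv4 address trusted, which is why that setting is only reasonable behind a firewall or together with required authentication.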
Kudu uses an internal PKI system to issue X.509 certificates to servers in the cluster. Connections between peers who have both obtained certificates will use TLS for authentication, which doesn’t require contacting the Kerberos KDC. These certificates are only used for internal communication among Kudu servers, and between Kudu clients and servers. The certificates are never presented in a public facing protocol.
By using internally-issued certificates, Kudu offers strong authentication which scales to huge clusters, and allows TLS encryption to be used without requiring you to manually deploy certificates on every node.
After authenticating to a secure cluster, the Kudu client will automatically request an authentication token from the Kudu master. An authentication token encapsulates the identity of the authenticated user and carries the master’s RSA signature so that its authenticity can be verified.
This token will be used to authenticate subsequent connections. By default, authentication tokens are only valid for seven days, so that even if a token were compromised, it could not be used indefinitely. For the most part, authentication tokens should be completely transparent to users. By using authentication tokens, Kudu takes advantage of strong authentication without paying the scalability cost of communicating with a central authority for every connection.
When used with distributed compute frameworks such as Spark, authentication tokens can simplify configuration and improve security. For example, the Kudu Spark connector will automatically retrieve an authentication token during the planning stage, and distribute the token to tasks. This allows Spark to work against a secured Kudu cluster where only the planner node has Kerberos credentials.
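The idea of a signed, time-limited token can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Kudu's tokens carry the master's RSA signature, whereas this sketch swaps in an HMAC from the standard library (so the verifier shares the signing key, which is not true of RSA). The function names and the JSON payload layout are hypothetical.

```python
import hashlib
import hmac
import json

# Default validity described in the text above: seven days.
SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def issue_token(signing_key, username, now):
    """Build a payload naming the user and an expiry time, then sign it.

    HMAC-SHA256 is a stand-in here; it is NOT the RSA scheme Kudu uses.
    """
    payload = json.dumps(
        {"user": username, "expires_at": now + SEVEN_DAYS_SECONDS},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(signing_key, payload, hashlib.sha256).digest()
    return payload, signature


def verify_token(signing_key, payload, signature, now):
    """Return the user name if the token is authentic and unexpired, else None."""
    expected = hmac.new(signing_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None  # payload was altered or signed with another key
    claims = json.loads(payload)
    if now >= claims["expires_at"]:
        return None  # token has expired
    return claims["user"]
```

The point of the design is that verification needs no round trip to a central authority: any holder of the verification material can check the signature and the expiry locally.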
Users running client Kudu applications must first run the kinit command to obtain a Kerberos ticket-granting ticket. For example:
$ kinit admin@EXAMPLE-REALM.COM
Once authenticated, you use the same client code to read from and write to Kudu servers with and without Kerberos configuration.
Kudu authentication is designed to scale to thousands of nodes, which requires avoiding unnecessary coordination with a central authentication authority (such as the Kerberos KDC). Instead, Kudu servers and clients will use Kerberos to establish initial trust with the Kudu master, and then use alternate credentials for subsequent connections. In particular, the master will issue internal X.509 certificates to servers, and temporary authentication tokens to clients.
Kudu supports coarse-grained authorization of client requests based on the authenticated client Kerberos principal (i.e. user or service). The two levels of access which can be configured are:
Superuser - principals authorized as a superuser are able to perform certain administrative functionality such as using the kudu command line tool to diagnose or repair cluster issues.
User - principals authorized as a user are able to access and modify all data in the Kudu cluster. This includes the ability to create, drop, and alter tables as well as read, insert, update, and delete data.
Internally, Kudu has a third access level for the daemons themselves. This ensures that users cannot connect to the cluster and pose as tablet servers.
Access levels are granted using whitelist-style Access Control Lists (ACLs), one for each of the two levels. Each access control list either specifies a comma-separated list of users, or may be set to * to indicate that all authenticated users are able to gain access at the specified level. See Configuring a Secure Kudu Cluster below for examples.
The default value for the User ACL is *, which allows all users access to the cluster. However, if authentication is enabled, this still restricts access to only those users who are able to successfully authenticate via Kerberos. Unauthenticated users on the same network as the Kudu servers will be unable to access the cluster.
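The whitelist semantics of the two ACLs can be sketched as follows. This is a simplified model rather than Kudu's code: acl_allows is a hypothetical helper, and it assumes the user has already been authenticated.

```python
def acl_allows(acl_value, authenticated_user):
    """Check an authenticated user against a comma-separated ACL value.

    A value of '*' admits every authenticated user; otherwise the user
    must appear in the list. (Hypothetical helper for illustration.)
    """
    entries = {entry.strip() for entry in acl_value.split(",") if entry.strip()}
    return "*" in entries or authenticated_user in entries
```

For example, with the ACL value impala,nightly_etl_service_account the user impala is admitted and any other user is refused, while the default value * admits every authenticated user.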
As of Kudu 1.12.0, Kudu can be configured to enforce fine-grained authorization across servers. This ensures that users can see only the data they are explicitly authorized to see. Kudu supports this by leveraging policies defined in Apache Ranger 2.1 and later.
Fine-grained authorization policies are not enforced when accessing the web UI. User data may appear on various pages of the web UI (e.g. in logs, metrics, scans, etc.). As such, it is recommended to either limit access to the web UI ports, or redact or disable the web UI entirely, as desired. See the instructions for securing the web UI for more details.
Apache Ranger models tabular objects stored in a Kudu cluster in the following hierarchy:
Ranger allows you to add separate service repositories to manage privileges for different Kudu clusters. Kudu determines which service repository to connect to from the value of the ranger.plugin.kudu.service.name configuration in the Ranger client. For more details about Ranger service repositories, see the Apache Ranger documentation.
Database - Kudu does not have the concept of a database. Therefore, a database is indicated as a prefix of table names with the format <database>.<table>. Since Kudu’s only restriction on table names is that they be valid UTF-8 encoded strings, Kudu considers special characters to be valid parts of database or table names. For example, if a managed Kudu table created from Impala (see Kudu Impala integration documentation) is named impala::bar.foo, its database will be impala::bar.
Table - a single Kudu table.
Column - a column within a Kudu table.
In Ranger, privileges are also associated with specific actions. Access to Kudu tables may rely on privileges on the following actions:
ALTER
CREATE
DELETE
DROP
INSERT
UPDATE
SELECT
There are two additional access types:
ALL
METADATA
If a user has the ALL privilege on a resource, they implicitly have privileges to perform any action on that resource (except those that require the user to be a delegate admin, see below). Also, if a user is granted any privilege, they are able to perform actions requiring METADATA (e.g. opening the table) without being explicitly granted the METADATA privilege.
Finally, Ranger supports a delegate admin flag which is independent of the action types (it’s not implied by ALL and doesn’t imply METADATA). This is similar to the GRANT OPTION part of ALL WITH GRANT OPTION in SQL as it is required to modify privileges in Ranger and change the owner of a Kudu table. A user with the delegate admin privilege on a resource can grant any privilege to themselves and others.
While the action types are hierarchical, in terms of privilege evaluation, Ranger doesn’t have the concept of hierarchy. For instance, if a user has SELECT privilege on a database, it does not imply that the user has SELECT privileges on every table belonging to that database. On the other hand, Ranger supports privilege wildcard matching. For example, db=a→table=* matches all the tables that belong to database a. Therefore, in Ranger users actually need the SELECT privilege granted on db=a→table=*→column=* to allow SELECT on every table and every column in database a.
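The difference between a database-level policy and a wildcard policy can be modelled with a short Python sketch. It is a deliberately simplified stand-in for Ranger's policy evaluation, not its actual algorithm: policy_matches is a hypothetical function, resources are plain dictionaries, and shell-style matching from the standard fnmatch module approximates Ranger's wildcard support. Names are lower-cased before comparison on the assumption that matching is case-insensitive.

```python
from fnmatch import fnmatchcase


def policy_matches(policy_resource, requested_resource):
    """Return True if a policy's resource covers the requested resource.

    Both arguments are dicts keyed by 'db', 'table' and 'column'. A policy
    that names only a database does not cover a table or column request,
    because there is no hierarchy; wildcards must be spelled out per level.
    """
    if policy_resource.keys() != requested_resource.keys():
        return False
    return all(
        fnmatchcase(requested_resource[level].lower(), policy_resource[level].lower())
        for level in requested_resource
    )
```

In this model a policy on {"db": "a"} does not cover a request for a column of a table in database a, whereas a policy on {"db": "a", "table": "*", "column": "*"} does.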
With Ranger integration, when a Kudu master receives a request, it consults Ranger to determine what privileges the user has. The required privileges documented in the policy section are then enforced to determine whether the user is authorized to perform the requested action.
Even though Kudu table names remain case sensitive with Ranger integration, policy authorization is considered case-insensitive.
In addition to granting privileges to a user by username, privileges can also be granted to table owners using the special {OWNER} username. These policies are evaluated only when a user tries to perform an action on a table that they own. For example, a policy can be defined for the {OWNER} user and db=*→table=* resource, and it will automatically be applied when any table is accessed by its owner. This way administrators don’t need to choose between creating policies one by one for each table, and granting access to a wide range of users.
If a user has ALL and delegate admin privileges on a table only via ownership and no privileges by username, they can effectively lock themselves out by giving away ownership.
Rather than having every tablet server communicate directly with the underlying authorization service (Ranger), privileges are propagated and checked via authorization tokens. These tokens encapsulate what privileges a user has on a given table. Tokens are generated by the master and returned to Kudu clients upon opening a Kudu table. Kudu clients automatically attach authorization tokens when sending requests to tablet servers.
Authorization tokens are a means of limiting the number of nodes directly accessing the authorization service to retrieve privileges. As such, since the expected number of tablet servers in a cluster is much higher than the number of Kudu masters, they are only used to authorize requests sent to tablet servers. Kudu masters fetch privileges directly from the authorization service or cache. See Ranger Client Caching for more details of Kudu’s privilege cache.
Similar to the validity interval for authentication tokens, to limit the window of potential unwanted access if a token becomes compromised, authorization tokens are valid for five minutes by default. The acquisition and renewal of a token is hidden from the user, as Kudu clients automatically retrieve new tokens when existing tokens expire.
When a tablet server that has been configured to enforce fine-grained access control receives a request, it checks the privileges in the attached token, rejecting it if the privileges are not sufficient to perform the requested operation, or if it is invalid (e.g. expired).
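The tablet server check described above can be sketched as follows. This is an illustrative model only: the AuthzToken fields and the check_request function are hypothetical, and verification of the token's signature is omitted.

```python
from dataclasses import dataclass

# Default validity described in the text above: five minutes.
FIVE_MINUTES_SECONDS = 5 * 60


@dataclass(frozen=True)
class AuthzToken:
    """Hypothetical token layout: one user's privileges on one table."""
    user: str
    table_id: str
    privileges: frozenset
    expires_at: float


def check_request(token, table_id, action, now):
    """Decide whether a request carrying this token may proceed."""
    if now >= token.expires_at:
        return "rejected: token expired"
    if token.table_id != table_id:
        return "rejected: token is for a different table"
    if action not in token.privileges:
        return "rejected: insufficient privileges"
    return "ok"
```

A rejection for expiry is normally invisible to applications, since the client fetches a fresh token from the master and retries.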
It may be desirable to allow certain users to view and modify any data stored in Kudu. Such users can be specified via the --trusted_user_acl master configuration. Trusted users can perform any operation that would otherwise require fine-grained privileges, without Kudu consulting the authorization service.
Additionally, some services that interact with Kudu may authorize requests on behalf of their end users. For example, Apache Impala authorizes queries on behalf of its users, and sends requests to Kudu as the Impala service user, commonly "impala". Since Impala authorizes requests on its own, to avoid extraneous communication between the authorization service and Kudu, the Impala service user should be listed as a trusted user.
When accessing Kudu through Impala, Impala enforces its own fine-grained authorization policy. This policy is similar to Kudu’s and can be found in Impala’s authorization documentation.
Ranger is often configured with Kerberos authentication. See Configuring a Secure Kudu Cluster for how to configure Kudu to authenticate via Kerberos.
After building Kudu from source, find the kudu-subprocess.jar under the build directory (e.g. build/release/bin). Note its path, as it is the path to the JAR file containing the Ranger subprocess, which houses the Ranger client that Kudu will use to communicate with the Ranger server.
Use the kudu table list tool to find any table names in the cluster that are not Ranger-compatible, which are names that begin or end with a period. Also check that there are no two table names that only differ by case, since authorization is case-insensitive. For those tables that don’t comply with the requirements, use the kudu table rename_table tool to rename the tables.
Create the Ranger client configuration file ranger-kudu-security.xml, and note the directory containing this file.
<property>
<name>ranger.plugin.kudu.policy.cache.dir</name>
<value>policycache</value>
<description>Directory where Ranger policies are cached after successful retrieval from the Ranger service</description>
</property>
<property>
<name>ranger.plugin.kudu.service.name</name>
<value>kudu</value>
<description>Name of the Ranger service repository storing policies for this Kudu cluster</description>
</property>
<property>
<name>ranger.plugin.kudu.policy.rest.url</name>
<value>http://host:port</value>
<description>Ranger Admin URL</description>
</property>
<property>
<name>ranger.plugin.kudu.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
<description>Ranger client implementation to retrieve policies from the Ranger service</description>
</property>
<property>
<name>ranger.plugin.kudu.policy.rest.ssl.config.file</name>
<value>ranger-kudu-policymgr-ssl.xml</value>
<description>Path to the file containing SSL details to connect Ranger Admin</description>
</property>
<property>
<name>ranger.plugin.kudu.policy.pollIntervalMs</name>
<value>30000</value>
<description>Ranger client policy polling interval</description>
</property>
When Secure Sockets Layer (SSL) is enabled for Ranger Admin, add the ranger-kudu-policymgr-ssl.xml file to the Ranger client configuration directory with the following configurations:
<property>
<name>xasecure.policymgr.clientssl.keystore</name>
<value>[/path/to/keystore].jks</value>
<description>Java keystore files</description>
</property>
<property>
<name>xasecure.policymgr.clientssl.keystore.credential.file</name>
<value>jceks://file/[path/to/credentials].jceks</value>
<description>Java keystore credential file</description>
</property>
<property>
<name>xasecure.policymgr.clientssl.truststore</name>
<value>[/path/to/truststore].jks</value>
<description>Java truststore file</description>
</property>
<property>
<name>xasecure.policymgr.clientssl.truststore.credential.file</name>
<value>jceks://file/[path/to/credentials].jceks</value>
<description>Java truststore credential file</description>
</property>
Set the following configurations on the Kudu master:
# The path to directory containing Ranger client configuration. This example
# assumes the path is '/kudu/ranger-config'.
--ranger_config_path=/kudu/ranger-config
# The path where the Java binary was installed. This example assumes
# '$JAVA_HOME=/usr/local'
--ranger_java_path=/usr/local/bin/java
# The path to the JAR file containing the Ranger subprocess. This example
# assumes '$KUDU_HOME=/kudu'
--ranger_jar_path=/kudu/build/release/bin/kudu-subprocess.jar
# This example ACL setup allows the 'impala' user to access all data stored in
# Kudu, assuming Impala will authorize requests on its own. The 'kudu' user is
# also granted access to all Kudu data, which may facilitate testing and
# debugging (such as running the 'kudu cluster ksck' tool).
--trusted_user_acl=impala,kudu
Set the following configurations on the tablet servers:
--tserver_enforce_access_control=true
Add a Kudu service repository with the following configurations via the Ranger Admin web UI:
# This example setup configures the Kudu service user as a privileged user to be
# able to retrieve authorization policies stored in Ranger.
<property>
<name>policy.download.auth.users</name>
<value>kudu</value>
</property>
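The table-name check from the steps above (names that begin or end with a period, and names that differ only by case) can be automated once the table names have been collected, for instance from the output of kudu table list. The sketch below is a hypothetical helper, not part of Kudu, and it assumes that simple lower-casing is an adequate model of case-insensitive comparison.

```python
from collections import defaultdict


def find_incompatible_tables(table_names):
    """Return (names with a leading/trailing period, groups differing only by case)."""
    bad_period = [
        name for name in table_names
        if name.startswith(".") or name.endswith(".")
    ]
    by_folded_name = defaultdict(list)
    for name in table_names:
        by_folded_name[name.lower()].append(name)
    case_conflicts = [
        sorted(group) for group in by_folded_name.values() if len(group) > 1
    ]
    return bad_period, case_conflicts
```

Any name reported by either list would need to be renamed with kudu table rename_table before enabling the integration.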
With Ranger integration, the privilege cache in the Kudu master is disabled, since the Ranger client provides its own client-side cache of privileges and periodically polls the privilege store for changes. When a change is detected, the cache is updated automatically.
Update the ranger.plugin.kudu.policy.pollIntervalMs property specified in ranger-kudu-security.xml to set how often the Ranger client cache refreshes the privileges from the Ranger service.
The following authorization policy is enforced by Kudu masters.
Operation | Required Privilege |
---|---|
The following authorization policy is enforced by Kudu tablet servers.
Operation | Required Privilege |
---|---|
|
User must be configured in |
|
User must be configured in |
Unlike Impala, Kudu only supports all-or-nothing access to a table’s schema, rather than showing only authorized columns.
Kudu allows all communications among servers and between clients and servers to be encrypted with TLS, and the data to be encrypted at rest with AES.
Encryption in transit can be configured on Kudu servers using the --rpc_encryption flag, which can be set to required, optional, or disabled. By default, the flag is set to optional. When required, Kudu will reject unencrypted connections. When optional, Kudu will attempt to use encryption. As with authentication, when disabled, or when encryption fails under optional, Kudu will only allow unencrypted connections from trusted subnets and reject any unencrypted connections from publicly routable IPs. To secure a cluster, use --rpc_encryption=required.
Kudu will automatically turn off encryption on local loopback connections, since traffic from these connections is never exposed externally. This allows locality-aware compute frameworks like Spark and Impala to avoid encryption overhead, while still ensuring data confidentiality.
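The three --rpc_encryption modes can be summarised as a small decision function. This is a sketch of the behaviour described above, not Kudu's negotiation code; accept_connection and its parameters are hypothetical, and the loopback special case is left out.

```python
def accept_connection(mode, encrypted, from_trusted_subnet):
    """Model whether a connection is accepted under a given mode.

    mode: 'required', 'optional' or 'disabled'.
    encrypted: whether encryption was successfully negotiated.
    from_trusted_subnet: whether the peer address is in a trusted subnet.
    """
    if mode not in ("required", "optional", "disabled"):
        raise ValueError(f"unknown mode: {mode}")
    if mode != "disabled" and encrypted:
        return True  # an encrypted connection is always acceptable
    if mode == "required":
        return False  # unencrypted connections are rejected outright
    # 'optional' with failed negotiation, or 'disabled': trusted subnets only.
    return from_trusted_subnet
```

The same shape applies to --rpc_authentication, with "encrypted" read as "strongly authenticated".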
It’s also possible to encrypt data at rest. Kudu supports AES-128-CTR, AES-192-CTR, and AES-256-CTR ciphers to encrypt data. Each physical file is encrypted with a unique key (File Key), which in turn is encrypted with the server’s own key (Server Key), which is encrypted by the Cluster Key stored in a third-party Key Management Service (KMS). Kudu supports Apache Ranger KMS and Apache Hadoop KMS (they are API-compatible).
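The three-level key hierarchy can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the standard library has no AES, so a SHA-256-derived keystream stands in for AES-CTR and must not be used to protect real data, and the wrap and unwrap helpers are hypothetical rather than Kudu or KMS APIs.

```python
import hashlib
import os


def toy_cipher(key, nonce, data):
    """XOR data with a SHA-256-derived keystream (toy stand-in for AES-CTR).

    Applying the function twice with the same key and nonce restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(wrapping_key, key_to_protect):
    """Encrypt one key under another, returning (nonce, ciphertext)."""
    nonce = os.urandom(16)
    return nonce, toy_cipher(wrapping_key, nonce, key_to_protect)


def unwrap(wrapping_key, wrapped):
    """Recover a key previously protected with wrap()."""
    nonce, ciphertext = wrapped
    return toy_cipher(wrapping_key, nonce, ciphertext)


# Writing: each level of the hierarchy protects the one below it.
cluster_key = os.urandom(16)   # held by the KMS in a real deployment
server_key = os.urandom(16)    # one per server
file_key = os.urandom(16)      # one per physical file
wrapped_server_key = wrap(cluster_key, server_key)
wrapped_file_key = wrap(server_key, file_key)
file_nonce = os.urandom(16)
ciphertext = toy_cipher(file_key, file_nonce, b"row data")

# Reading: unwrap from the top of the hierarchy down to the file.
recovered_server_key = unwrap(cluster_key, wrapped_server_key)
recovered_file_key = unwrap(recovered_server_key, wrapped_file_key)
plaintext = toy_cipher(recovered_file_key, file_nonce, ciphertext)
```

The layering means that only wrapped keys need to be stored next to the data, and that without the Cluster Key none of the lower keys, and therefore none of the files, can be recovered.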
Encryption at rest can be enabled with the --encrypt_data_at_rest=true flag. As the default key provider is NOT secure (it stores the Server Keys in cleartext and a Cluster Key is not used), the key provider should be set to ranger-kms using the --encryption_key_provider flag and its URL set with --ranger_kms_url. Before starting the server, a key must exist in the key provider with the same name as passed to Kudu with the --encryption_cluster_key_name flag.
When data is encrypted, CLI tools accessing the file system directly need to be provided with the same flags and the instance file from a data, WAL, or metadata directory must also be set with the --instance_file flag, for example:
$ kudu wal dump --encrypt_data_at_rest=true --encryption_key_provider=ranger-kms \
--ranger_kms_url=https://ranger-kms.example.com:9292/kms \
--instance_file=/path/to/wal/instance \
/path/to/wal/wals/ffffffffffffffffffffffffffffffff/wal-000000001
Enabling data at rest encryption is supported only on fresh installations. When encryption is enabled and there are pre-existing Kudu directories, Kudu will fail to start. Disabling it on an existing cluster is also unsupported. Existing Kudu clusters can be migrated in-place by re-adding the existing servers as encrypted one by one, and waiting for the data to be fully replicated after each step to make sure there is no data loss.
The Kudu web UI can be configured to use secure HTTPS encryption by providing each server with TLS certificates. See Configuring a Secure Kudu Cluster for more information on web UI HTTPS configuration.
To prevent sensitive data from being exposed in the web UI, all row data is redacted. Table metadata, such as table names, column names, and partitioning information is not redacted. The web UI can be completely disabled by setting the --webserver_enabled=false flag on Kudu servers.
Disabling the web UI will also disable REST endpoints such as /metrics. Monitoring systems rely on these endpoints to gather metrics data.
To prevent sensitive data from being included in Kudu server logs, all row data is redacted by default. Setting the --redact=log flag disables redaction in the web UI but retains it for server logs. Alternatively, --redact=none can be used to disable redaction completely.
The following configuration parameters should be set on all servers (master and tablet server) in order to ensure that a Kudu cluster is secure:
# Connection Security
# -------------------
--rpc_authentication=required
--rpc_encryption=required
--keytab_file=<path-to-kerberos-keytab>
# Web UI Security
# ---------------
--webserver_certificate_file=<path-to-cert-pem>
--webserver_private_key_file=<path-to-key-pem>
# optional
--webserver_private_key_password_cmd=<password-cmd>
# If you prefer to disable the web UI entirely:
--webserver_enabled=false
# Coarse-grained authorization
# ----------------------------
# This example ACL setup allows the 'impala' user as well as the
# 'nightly_etl_service_account' principal access to all data in the
# Kudu cluster. The 'hadoopadmin' user is allowed to use administrative
# tooling. Note that, by granting access to 'impala', other users
# may access data in Kudu via the Impala service subject to its own
# authorization rules.
--user_acl=impala,nightly_etl_service_account
--superuser_acl=hadoopadmin
# Data at rest encryption
# -----------------------
# This example data at rest encryption setup enables data at rest encryption for
# Kudu using Ranger KMS as the Cluster Key provider. The
# encryption_cluster_key_name is the default one, and if a key is created with
# this name in Ranger KMS, it can be omitted.
--encrypt_data_at_rest=true
--encryption_key_provider=ranger-kms
--encryption_cluster_key_name=kudu_cluster_key # optional
--ranger_kms_url=https://ranger-kms.example.com:9292/kms
See Configuring the Integration with Apache Ranger to see an example of how to enable fine-grained authorization via Apache Ranger.
Further information about these flags can be found in the configuration flag reference.
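A flag listing like the one above can be checked mechanically. The following Python sketch is a hypothetical audit helper, not a Kudu tool: it parses lines of the form --name=value, treats anything after a # as a comment (an assumption that matches the listing above), and reports the connection-security settings that are missing or weaker than recommended.

```python
def parse_flags(text):
    """Parse '--name=value' lines into a dict, ignoring comments and blanks."""
    flags = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("--"):
            continue
        name, _, value = line[2:].partition("=")
        flags[name] = value
    return flags


def audit_security_flags(flags):
    """Return a list of problems with the connection-security flags."""
    problems = []
    for name in ("rpc_authentication", "rpc_encryption"):
        if flags.get(name) != "required":
            problems.append(f"--{name} should be set to 'required'")
    if not flags.get("keytab_file"):
        problems.append("--keytab_file is not set")
    return problems
```

An empty result means only that these three settings match the recommendations above; it says nothing about ACLs, the web UI, or encryption at rest.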
Kudu has a few known security limitations:
Kudu does not support externally-issued certificates for internal wire encryption (server to server and client to server).
Kudu does not have built-in on-disk encryption. However, Kudu can be used with whole-disk encryption tools such as dm-crypt.