Apache Kudu - Developing Applications With Apache Kudu (incubating)

Kudu provides C++ and Java client APIs, as well as reference examples to illustrate their use. A Python API is included, but it is currently considered experimental, unstable, and subject to change at any time.

Use of server-side or private interfaces is not supported, and interfaces which are not part of public APIs have no stability guarantees.

Viewing the API Documentation

C++ API Documentation

The documentation for the C++ client APIs is included in the header files in /usr/include/kudu/ if you installed Kudu using packages, or in subdirectories of src/kudu/client/ if you built Kudu from source. If you installed Kudu using parcels, no headers are included in your installation, and you will need to build Kudu from source in order to have access to the headers and shared libraries.

The following command is a naive approach to finding relevant header files. Use of any APIs other than the client APIs is unsupported.

$ find /usr/include/kudu -type f -name '*.h'
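For a portable alternative to the find command, the same search can be written with java.nio.file. This is a minimal sketch, assuming Java 8 or later; the class name FindKuduHeaders is hypothetical. It defaults to the package install location and accepts another root, such as src/kudu/client in a source tree, as its first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper mirroring `find <root> -type f -name '*.h'`.
public class FindKuduHeaders {
    // Recursively collects regular files under root whose names end in ".h".
    public static List<Path> findHeaders(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".h"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the package install location; pass src/kudu/client for a source build.
        Path root = Paths.get(args.length > 0 ? args[0] : "/usr/include/kudu");
        for (Path header : findHeaders(root)) {
            System.out.println(header);
        }
    }
}
```

For example, after compiling, `java FindKuduHeaders src/kudu/client` lists the client headers in a source checkout. As with the find command, only the client API headers are supported for application use.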

Java API Documentation

You can view the Java API documentation online. Alternatively, after building the Java client, Java API documentation is available in java/kudu-client/target/apidocs/index.html.

Working Examples

Several example applications are provided in the kudu-examples GitHub repository. Each example includes a README that shows how to compile and run it. These examples illustrate correct usage of the Kudu APIs, as well as how to set up a virtual machine to run Kudu. The following list includes some of the examples that are available today. Check the repository itself in case this list goes out of date.

java-example: A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.
collectl: A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol. The commonly available collectl tool can be used to send example data to the server.
clients/python: An experimental Python client for Kudu.
demo-vm-setup: Scripts to download and run a VirtualBox virtual machine with Kudu already installed. See Quickstart for more information.

These examples should serve as helpful starting points for your own Kudu applications and integrations.

Maven Artifacts

The following Maven <dependency> element is valid for the Kudu public beta:

<dependency>
  <groupId>org.kududb</groupId>
  <artifactId>kudu-client</artifactId>
  <version>0.5.0</version>
</dependency>

Because the Maven artifacts are not in Maven Central, use the following <repository> element:

<repository>
  <id>cdh.repo</id>
  <name>Cloudera Repositories</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>

See subdirectories of https://github.com/cloudera/kudu-examples/tree/master/java for example Maven pom.xml files.

Example Impala Commands With Kudu

See Using Impala With Kudu for guidance on installing and using Impala with Kudu, including several impala-shell examples.

Kudu Integration with Spark

As of version 0.9, Kudu integrates with Spark through the Spark Data Source API. Include the kudu-spark JAR using the --jars option:

spark-shell --jars /kudu-spark-0.9.0.jar

Then import kudu-spark and create a DataFrame:

// Import the Kudu data source, plus CreateTableOptions for table creation
import org.kududb.spark.kudu._
import org.kududb.client.CreateTableOptions

// Read a Kudu table into a DataFrame
val kuduDataFrame = sqlContext.read.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).kudu

// Query using the DataFrame API...
kuduDataFrame.select("id").filter("id >= 5").show()

// ...or register the DataFrame as a temporary table and use Spark SQL
kuduDataFrame.registerTempTable("kudu_table")
sqlContext.sql("select id from kudu_table where id >= 5").show()

// Create a new Kudu table from the schema of an existing DataFrame df,
// using its "key" column as the primary key
val kuduContext = new KuduContext("your.kudu.master.here")
kuduContext.createTable("testcreatetable", df.schema, Seq("key"), new CreateTableOptions().setNumReplicas(1))

// Insert the rows of df into a Kudu table
df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).mode("append").kudu

// To update existing data instead, change the mode to "overwrite"
df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).mode("overwrite").kudu

// Check whether a Kudu table exists
kuduContext.tableExists("your.kudu.table.here")

// Delete a Kudu table
kuduContext.deleteTable("your.kudu.table.here")
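The difference between the append and overwrite modes can be pictured with a toy in-memory model. This is a hypothetical sketch, not kudu-spark code: ToyKuduTable and its methods are invented for illustration. It assumes that append behaves like a Kudu insert (a row whose primary key already exists is rejected) and that overwrite behaves like a Kudu update of existing data (a row whose key is absent is rejected); the connector's exact behavior may vary between kudu-spark versions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a Kudu table keyed by an integer "key" column.
public class ToyKuduTable {
    private final Map<Integer, String> rows = new LinkedHashMap<>();

    // Models mode("append"): each row is an insert; existing keys are rejected.
    public List<Integer> append(Map<Integer, String> batch) {
        List<Integer> rejected = new ArrayList<>();
        for (Map.Entry<Integer, String> row : batch.entrySet()) {
            if (rows.containsKey(row.getKey())) {
                rejected.add(row.getKey()); // duplicate primary key
            } else {
                rows.put(row.getKey(), row.getValue());
            }
        }
        return rejected;
    }

    // Models mode("overwrite"): each row is an update; missing keys are rejected.
    public List<Integer> overwrite(Map<Integer, String> batch) {
        List<Integer> rejected = new ArrayList<>();
        for (Map.Entry<Integer, String> row : batch.entrySet()) {
            if (rows.containsKey(row.getKey())) {
                rows.put(row.getKey(), row.getValue());
            } else {
                rejected.add(row.getKey()); // no existing row to update
            }
        }
        return rejected;
    }

    // Returns a copy of the current table contents.
    public Map<Integer, String> snapshot() {
        return new LinkedHashMap<>(rows);
    }
}
```

For example, appending keys 1 and 2 and then appending key 2 again leaves the original row 2 in place and reports key 2 as rejected; overwriting keys 2 and 3 afterwards changes row 2 and reports key 3 as rejected.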

Integration with MapReduce, YARN, and Other Frameworks

Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in the Hadoop ecosystem. See RowCounter.java and ImportCsv.java for examples on which you can model your own integrations. Stay tuned for more examples using YARN and Spark in the future.
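As a rough illustration of the per-line work in a CSV import job, the following sketch splits each line on a separator and pairs the fields with column names, counting lines that have the wrong number of fields. It is not taken from ImportCsv.java: the class CsvRowParser, its constructor arguments, and the choice to skip malformed lines rather than fail the job are assumptions. It also ignores quoted fields, which a real CSV importer would need to handle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical per-line parser for a CSV import job (not the ImportCsv.java code).
public class CsvRowParser {
    private final List<String> columns;
    private final Pattern separator;
    private long badLines = 0;

    public CsvRowParser(List<String> columns, String separator) {
        this.columns = columns;
        // Quote the separator so characters such as "|" are matched literally.
        this.separator = Pattern.compile(Pattern.quote(separator));
    }

    // Maps column name -> raw field value, or returns empty if the field count is wrong.
    public Optional<Map<String, String>> parse(String line) {
        // A limit of -1 keeps trailing empty fields, so "1,foo," yields three fields.
        String[] fields = separator.split(line, -1);
        if (fields.length != columns.size()) {
            badLines++;
            return Optional.empty();
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            row.put(columns.get(i), fields[i]);
        }
        return Optional.of(row);
    }

    // Number of lines rejected so far.
    public long badLines() {
        return badLines;
    }
}
```

In a MapReduce job, a method like parse would typically run in the mapper, with each returned row then converted into a Kudu write operation.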