git clone https://github.com/cloudera/kudu
The Kudu team uses Gerrit for code review, rather than Github pull requests. Typically, you pull from Github but push to Gerrit, and Gerrit is used to review code and merge it into Github.
See the Gerrit Tutorial for an overview of using Gerrit for code review.
Sign in to Gerrit using your Github username.
Go to Settings. Update your name and email address on the Contact Information page, and upload a SSH public key. If you do not update your name, it will show up as "Anonymous Coward" in Gerrit reviews.
If you have not done so, clone the main Kudu repository. By default, the main remote
is called origin
. When you fetch or pull, you will do so from origin
.
git clone https://github.com/cloudera/kudu
Change to the new kudu
directory.
Add a gerrit
remote. In the following command, substitute <username> with your
Github username.
git remote add gerrit ssh://<username>@gerrit.cloudera.org:29418/kudu
Run the following command to install the
Gerrit commit-msg
hook. Use the following command, replacing <username>
with your
Github username.
gitdir=$(git rev-parse --git-dir); scp -p -P 29418 <username>@gerrit.cloudera.org:hooks/commit-msg ${gitdir}/hooks/
Be sure you have set the Kudu repository to use pull --rebase
by default. You
can use the following two commands, assuming you have only ever checked out master
so far:
git config branch.autosetuprebase always git config branch.master.rebase true
If for some reason you had already checked out branches other than master
, substitute
master
for the other branch names in the second command above.
To submit a patch, first commit your change (using a descriptive multi-line
commit message if possible), then push the request to the gerrit
remote. For instance, to push a change
to the master
branch:
git push gerrit HEAD:refs/for/master --no-thin
or to push a change to the gh-pages
branch (to update the website):
git push gerrit HEAD:refs/for/gh-pages --no-thin
The --no-thin argument is a workaround to prevent an error in Gerrit. See
https://code.google.com/p/gerrit/issues/detail?id=1582.
|
Consider creating Git aliases for the above commands. Gerrit also includes a command-line tool called git-review, which you may find helpful. |
Gerrit will add a change ID to your commit message and will create a Gerrit review,
whose URL will be emitted as part of the push reply. If desired, you can send a message
to the kudu-dev
mailing list, explaining your patch and requesting review.
After getting feedback, you can update or amend your commit, (for instance, using
a command like git commit --amend
) while leaving the Change
ID intact. Push your change to Gerrit again, and this will create a new patch set
in Gerrit and notify all reviewers about the change.
When your code has been reviewed and is ready to be merged into the Kudu code base, a Kudu committer will merge it using Gerrit. You can discard your local branch.
If your patch is not accepted or you decide to pull it from consideration, you can use the Gerrit UI to Abandon the patch. It will still show in Gerrit’s history, but will not be listed as a pending review.
You can view a unified or side-by-side diff of changes in Gerrit using the web UI. To leave a comment, click the relevant line number or highlight the relevant part of the line, and type 'c' to bring up a comment box. To submit your comments and/or your review status, go up to the top level of the review and click Reply. You can add additional top-level comments here, and submit them.
To check out code from a Gerrit review, click Download and paste the relevant Git commands into your Git client. You can then update the commit and push to Gerrit to submit a patch to the review, even if you were not the original reviewer.
Gerrit allows you to vote on a review. A vote of +2
from at least one committer
(besides the submitter) is required before the patch can be merged.
Get familiar with these guidelines so that your contributions can be reviewed and integrated quickly and easily.
In general, Kudu follows the Google C++ Style Guide, with the following exceptions:
boost
Library Useboost
libraries can be used in cases where a suitable
replacement does not exist in the Kudu code base. However, try to avoid introducing
dependencies on new boost
libraries, and use Kudu code in preference
to boost
where available. For example, do not use `boost’s scoped pointer
implementations.
boost
Librariesboost::assign
(container literals)
boost::shared_mutex
(but prefer Kudu’s spin lock implementation for short
critical sections)
Check that any features from boost
you use are present in boost
1.46
or earlier, for compatibility with RHEL 6.
boost
Libraries and the Kudu C++ ClientDo not use boost
in any public headers for the Kudu C++ client, because
boost
commonly breaks backward compatibility, and passing data between two boost
versions (one by the user, one by Kudu) causes serious issues.
In addition, do not create dependencies from the Kudu C++ client to any boost
libraries. libboost_system
is particularly troublesome, as any boost
code
that throws exceptions will grow a dependency on it. Among other things, you
cannot use boost::{lock_guard,unique_lock,shared_lock}
in any code consumed
by the C++ client (such as common/ and util/).
The Kudu team allows line lengths of 100 characters per line, rather than Google’s standard of 80. Try to keep under 80 where possible, but you can spill over to 100 or so if necessary.
Generally, most objects should have clear "single-owner" semantics.
Most of the time, singly-owned objects can be wrapped in a gscoped_ptr<>
which ensures deletion on scope exit and prevents accidental copying.
gscoped_ptr
is similar to C++11’s unique_ptr
in that it has a release
method and also provides emulated move
semantics (see gscoped_ptr.h for
example usage).
If an object is singly owned, but referenced from multiple places, such as when the pointed-to object is known to be valid at least as long as the pointer itself, associate a comment with the constructor which takes and stores the raw pointer, as in the following example.
// 'blah' must remain valid for the lifetime of this class
MyClass(const Blah* blah) :
blah_(blah) {
}
If you use raw pointers within STL collections or inside of vectors and other containers,
associate a comment with the container, which explains the ownership
semantics (owned or un-owned). Use utility code from gutil/stl_util.h, such as
STLDeleteElements
or ElementDeleter
, to ease handling of deletion of the
contained elements.
Using std::auto_ptr is strictly disallowed because of its difficult and
bug-prone semantics.
|
Although single ownership is ideal, sometimes it is not possible, particularly
when multiple threads are in play and the lifetimes of the pointers are not
clearly defined. In these cases, you can use either std::tr1::shared_ptr
or
Kudu’s own scoped_refptr
from gutil/ref_counted.hpp. Each of these mechanisms
relies on reference counting to automatically delete the referent once no more
pointers remain. The key difference between these two types of pointers is that
scoped_refptr
requires that the object extend a RefCounted
base class, and
stores its reference count inside the object storage itself, while shared_ptr
maintains a separate reference count on the heap.
The pros and cons are:
shared_ptr
can be used with any type of object, without the object deriving from a special base class
part of the standard library and familiar to most C++ developers
creating a new object requires two allocations instead of one (one to create the ref count, and one to create the object)
the ref count may not be near the object on the heap, so extra cache misses may be incurred on access
the shared_ptr
instance itself requires 16 bytes
(pointer to the ref count and pointer to the object)
if you convert from the shared_ptr
to a raw pointer,
you can’t get back the shared_ptr
scoped_refptr
only requires a single allocation, and ref count is on the same cache line as the object
the pointer only requires 8 bytes (since the ref count is within the object)
you can manually increase or decrease reference counts when more control is required
you can convert from a raw pointer back
to a scoped_refptr
safely without worrying about double freeing
since we control the implementation, we can implement features, such as debug builds that capture the stack trace of every referent to help debug leaks.
the referred-to object must inherit
from RefCounted
does not support weak_ptr<>
use cases
Since scoped_refptr
is generally faster and smaller, try to use it
rather than shared_ptr
in new code. Existing code uses shared_ptr
in many places. When interfacing with that code, you can continue to use shared_ptr
.
Existing code uses boost::bind
and boost::function
for function binding and
callbacks. For new code, use the Callback
and Bind
classes in gutil
instead.
While less full-featured (Bind
doesn’t support argument
place holders, wrapped function pointers, or function objects), they provide
more options by the way of argument lifecycle management. For example, a
bound argument whose class extends RefCounted
will be incremented during Bind
and decremented when the Callback
goes out of scope.
See the large file comment in gutil/callback.h for more details, and util/callback_bind-test.cc for examples.
CMake
Style GuideCMake
allows commands in lower, upper, or mixed case. To keep
the CMake files consistent, please use the following guidelines:
built-in commands in lowercase
add_subdirectory(some/path)
built-in arguments in uppercase
message(STATUS "message goes here")
custom commands or macros in uppercase
ADD_KUDU_TEST(some-test)
Kudu uses gflags for both command-line and file-based configuration. Use these guidelines to add a new gflag. All new gflags must conform to these guidelines. Existing non-conformant ones will be made conformant in time.
The gflag’s name conveys a lot of information, so choose a good name. The name
will propagate into other systems, such as the Configuration
Reference.
- The different parts of a multi-word name should be separated by underscores.
For example, fs_data_dirs
.
- The name should be prefixed with the context that it affects. For example,
webserver_num_worker_threads
and cfile_default_block_size
. Context can be
difficult to define, so bear in mind that this prefix will be
used to group similar gflags together. If the gflag affects the entire
process, it should not be prefixed.
- If the gflag is for a quantity, the name should be suffixed with the units.
For example, remote_bootstrap_idle_timeout_ms
.
- Where possible, use short names. This will save time for those entering
command line options by hand.
- The name is part of Kudu’s compatibility contract, and should not change
without very good reason.
Choosing a default value is generally simple, but like the name, it propagates into other systems. - The default value is part of Kudu’s compatibility contract, and should not change without very good reason.
The gflag’s description should supplement the name and provide additional
context and information. Like the name, the description propagates into other
systems.
- The description may include multiple sentences. Each should begin with a
capital letter, end with a period, and begin one space after the previous.
- The description should NOT include the gflag’s type or default value; they are
provided out-of-band.
- The description should be in the third person. Do not use words like you
.
- A gflag description can be changed freely; it is not expected to remain the
same across Kudu releases.
Kudu’s gflag tagging mechanism adds machine-readable context to each gflag, for use in consuming systems such as documentation or management tools. See the large block comment in flag_tags.h for guidelines.
Avoid creating multiple gflags for the same logical parameter. For
example, many Kudu binaries need to configure a WAL directory. Rather than
creating foo_wal_dir
and bar_wal_dir
gflags, better to have a single
kudu_wal_dir
gflag for use universally.
Add new tests either in existing files, or create new test files as necessary.
It’s OK to fix a bug without adding a new test if it’s triggered by an existing test case. For example, if a race shows up when running a multi-threaded system test after 20 minutes or so, it’s worth trying to make a more targeted test case to trigger the bug. But if that’s hard to do, the existing system test should be enough.
If you want to write a time-intensive
test, make the runtime dependent on KuduTest#AllowSlowTests
, which is
enabled via the KUDU_ALLOW_SLOW_TESTS
environment variable and is
used by Jenkins test execution.
gflags
command-line argument for the number of iterations.This is handy for writing quick stress tests or performance tests.
perf-stat(1)
output.This will show performance improvement or non-regression. Performance-sensitive code should include some test case which can be used as a targeted benchmark.
See Documentation Style Guide for guidelines about contributing to the official Kudu documentation.