Class MurmurHash3

  • All Implemented Interfaces:
    Serializable

    public final class MurmurHash3
    extends Object
    implements Serializable
    The MurmurHash3 is a fast, non-cryptographic, 128-bit hash function that has excellent avalanche and 2-way bit independence properties.

    Austin Appleby's C++ MurmurHash3_x64_128(...), final revision 150, which is in the Public Domain, was the inspiration for this implementation in Java.

    This java implementation pays close attention to the C++ algorithms in order to maintain bit-wise compatibility, but the design is quite different. This implementation has also been extended to include processing of arrays of longs, char or ints, which was not part of the original C++ implementation. This implementation produces the same exact output hash bits as the above C++ method given the same input.

    In addition, with this implementation, the hash of byte[], char[], int[], or long[] will produce the same hash result if, and only if, all the arrays have the same exact length in bytes, and if the contents of the values in the arrays have the same byte endianness and overall order. There is a unit test for this class that demonstrates this.

    The structure of this implementation also reflects a separation of code that is dependent on the input structure (in this case byte[], int[] or long[]) from code that is independent of the input structure. This also makes the code more readable and suitable for future extensions.

    Note that even though this hash function produces 128 bits, the entropy of the resulting hash cannot be greater than the entropy of the input. For example, if the input is only a single long of 64 bits, the entropy of the resulting 128 bit hash is no greater than 64 bits.

    Author:
    Lee Rhodes
    See Also:
    Serialized Form
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static long[] hash​(byte[] key, int offsetBytes, int lengthBytes, long seed)
      Hash a portion of the given byte[] array.
      static long[] hash​(byte[] key, long seed)
      Hash the given byte[] array.
      static long[] hash​(char[] key, int offsetChars, int lengthChars, long seed)
      Hash a portion of the given char[] array.
      static long[] hash​(char[] key, long seed)
      Hash the given char[] array.
      static long[] hash​(int[] key, int offsetInts, int lengthInts, long seed)
      Hash a portion of the given int[] array.
      static long[] hash​(int[] key, long seed)
      Hash the given int[] array.
      static long[] hash​(long[] key, int offsetLongs, int lengthLongs, long seed)
      Hash a portion of the given long[] array.
      static long[] hash​(long[] key, long seed)
      Hash the given long[] array.
      static long[] hash​(long key, long seed)
      Hash the given long.
      static long[] hash​(ByteBuffer buf, long seed)
      Hash the remaining bytes of the given ByteBuffer starting at position().
      static long[] hash​(org.apache.datasketches.memory.Memory mem, long seed)
      Hash the given Memory.
    • Method Detail

      • hash

        public static long[] hash​(long key,
                                  long seed)
        Hash the given long.
        Parameters:
        key - The input long.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(long[] key,
                                  long seed)
        Hash the given long[] array.
        Parameters:
        key - The input long[] array. It must be non-null and non-empty.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(long[] key,
                                  int offsetLongs,
                                  int lengthLongs,
                                  long seed)
        Hash a portion of the given long[] array.
        Parameters:
        key - The input long[] array. It must be non-null and non-empty.
        offsetLongs - the starting offset in longs.
        lengthLongs - the length in longs of the portion of the array to be hashed.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2
      • hash

        public static long[] hash​(int[] key,
                                  long seed)
        Hash the given int[] array.
        Parameters:
        key - The input int[] array. It must be non-null and non-empty.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(int[] key,
                                  int offsetInts,
                                  int lengthInts,
                                  long seed)
        Hash a portion of the given int[] array.
        Parameters:
        key - The input int[] array. It must be non-null and non-empty.
        offsetInts - the starting offset in ints.
        lengthInts - the length in ints of the portion of the array to be hashed.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(char[] key,
                                  long seed)
        Hash the given char[] array.
        Parameters:
        key - The input char[] array. It must be non-null and non-empty.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2
      • hash

        public static long[] hash​(char[] key,
                                  int offsetChars,
                                  int lengthChars,
                                  long seed)
        Hash a portion of the given char[] array.
        Parameters:
        key - The input char[] array. It must be non-null and non-empty.
        offsetChars - the starting offset in chars.
        lengthChars - the length in chars of the portion of the array to be hashed.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2
      • hash

        public static long[] hash​(byte[] key,
                                  long seed)
        Hash the given byte[] array.
        Parameters:
        key - The input byte[] array. It must be non-null and non-empty.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(byte[] key,
                                  int offsetBytes,
                                  int lengthBytes,
                                  long seed)
        Hash a portion of the given byte[] array.
        Parameters:
        key - The input byte[] array. It must be non-null and non-empty.
        offsetBytes - the starting offset in bytes.
        lengthBytes - the length in bytes of the portion of the array to be hashed.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(ByteBuffer buf,
                                  long seed)
        Hash the remaining bytes of the given ByteBuffer starting at position().
        Parameters:
        buf - The input ByteBuffer. It must be non-null and non-empty.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.
      • hash

        public static long[] hash​(org.apache.datasketches.memory.Memory mem,
                                  long seed)
        Hash the given Memory.

        Note: if you want to hash only a portion of Memory, convert it to the appropriate Region first with ByteOrder = Little Endian. If it is not Little Endian a new region view will be created as Little Endian. This does not change the underlying data.

        Parameters:
        mem - The input Memory. It must be non-null and non-empty.
        seed - A long valued seed.
        Returns:
        a 128-bit hash of the input as a long array of size 2.