Class DDCArray<T>

  • All Implemented Interfaces:
    org.apache.hadoop.io.Writable

    public class DDCArray<T>
    extends ACompressedArray<T>
    A dense dictionary-compressed (DDC) version of a column array.
    • Method Detail

      • getDict

        public Array<T> getDict()
      • compressToDDC

        public static <T> Array<T> compressToDDC​(Array<T> arr)
      • compressToDDC

        public static <T> Array<T> compressToDDC​(Array<T> arr,
                                                 int estimateUnique)
        Try to compress array into DDC format.
        Type Parameters:
        T - The type of the Array
        Parameters:
        arr - The array to try to compress
        estimateUnique - The estimated number of unique values
        Returns:
        Either a compressed version or the original.
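        The idea behind dense dictionary coding can be sketched in plain Java, independent of SystemDS. The class DdcSketch, its Encoded holder, and the maxUnique cutoff are hypothetical names chosen for illustration; the actual heuristic compressToDDC uses to decide between the compressed and the original array is not documented here. Each distinct value is stored once in a dictionary, and every row keeps only an index into it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not the SystemDS implementation.
final class DdcSketch {
    /** Hypothetical holder: distinct values plus one dictionary index per row. */
    static final class Encoded {
        final List<String> dict;
        final int[] map;

        Encoded(List<String> dict, int[] map) {
            this.dict = dict;
            this.map = map;
        }

        /** Decode a single row by looking up its dictionary index. */
        String get(int row) {
            return dict.get(map[row]);
        }
    }

    /**
     * Try to dictionary-encode the values. Returns null once more than
     * maxUnique distinct values are seen (an assumed cutoff), in which
     * case a caller would keep the original array.
     */
    static Encoded tryEncode(String[] values, int maxUnique) {
        Map<String, Integer> ids = new HashMap<>();
        List<String> dict = new ArrayList<>();
        int[] map = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            Integer id = ids.get(values[i]);
            if (id == null) {
                if (dict.size() >= maxUnique)
                    return null; // too many distinct values to benefit
                id = dict.size();
                ids.put(values[i], id);
                dict.add(values[i]);
            }
            map[i] = id;
        }
        return new Encoded(dict, map);
    }

    public static void main(String[] args) {
        Encoded e = tryEncode(new String[] {"a", "b", "a", "a", "b"}, 4);
        System.out.println(e.dict + " " + Arrays.toString(e.map));
    }
}
```

        Returning null on failure is a simplification for this sketch; the documented method returns the original array instead.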
      • get

        public T get​(int index)
        Description copied from class: Array
        Get the value at a given index. This method returns objects that have a high overhead in allocation. Therefore it is not as efficient as using the vectorized operations specified in the object.
        Specified by:
        get in class Array<T>
        Parameters:
        index - The index to query
        Returns:
        The value returned as an object
      • extractDouble

        public double[] extractDouble​(double[] ret,
                                      int rl,
                                      int ru)
        Description copied from class: Array
        Extract the sub-array in the row range [rl, ru) into the ret array as doubles. The ret array is filled at an offset of -rl (row i is written to ret[i - rl]), so ret should have length ru - rl.
        Overrides:
        extractDouble in class Array<T>
        Parameters:
        ret - The array to return
        rl - The row to start at
        ru - The row to end at (not inclusive.)
        Returns:
        The ret array given as argument
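        The offset convention can be illustrated with a minimal sketch on a plain double[]. ExtractSketch and its source parameter are hypothetical stand-ins for the array's internal data, and the ret[i - rl] indexing is an assumed reading of the description above.

```java
import java.util.Arrays;

// Illustrative only: a plain double[] stands in for the column array.
final class ExtractSketch {
    /** Copy rows [rl, ru) of source into ret, shifted so row rl lands at ret[0]. */
    static double[] extract(double[] source, double[] ret, int rl, int ru) {
        for (int i = rl; i < ru; i++)
            ret[i - rl] = source[i];
        return ret; // the same instance that was passed in
    }

    public static void main(String[] args) {
        double[] ret = new double[3]; // length ru - rl
        extract(new double[] {10, 20, 30, 40, 50}, ret, 1, 4);
        System.out.println(Arrays.toString(ret));
    }
}
```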
      • getAsDouble

        public double getAsDouble​(int i)
        Description copied from class: Array
        Get the value at the given index as a double. Returns 0 if the value is null.
        Specified by:
        getAsDouble in class Array<T>
        Parameters:
        i - index to get value from
        Returns:
        the value
      • getAsNaNDouble

        public double getAsNaNDouble​(int i)
        Description copied from class: Array
        Get the value at the given index as a double. Returns Double.NaN if the value is null.
        Overrides:
        getAsNaNDouble in class Array<T>
        Parameters:
        i - index to get value from
        Returns:
        the value
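        The difference between getAsDouble and getAsNaNDouble lies only in how a null element is reported. A minimal sketch, using a boxed Double as a hypothetical stand-in for a nullable element (NullAsDoubleSketch is not part of the library):

```java
// Illustrative only: a boxed Double models a nullable array element.
final class NullAsDoubleSketch {
    /** Null maps to 0, so a missing value is indistinguishable from a stored zero. */
    static double asDouble(Double v) {
        return v == null ? 0.0 : v;
    }

    /** Null maps to NaN, so a missing value stays detectable via Double.isNaN. */
    static double asNaNDouble(Double v) {
        return v == null ? Double.NaN : v;
    }

    public static void main(String[] args) {
        System.out.println(asDouble(null));
        System.out.println(asNaNDouble(null));
    }
}
```

        The NaN variant is the one to use when a caller needs to tell missing values apart from zeros.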
      • append

        public Array<T> append​(Array<T> other)
        Description copied from class: Array
        Append the other array to this one. If the other array fits in the currently allocated size, that allocation is reused; otherwise a new array is allocated to hold both. Implementations should use the set-range function, and this method should be preferred over appending single values.
        Specified by:
        append in class Array<T>
        Parameters:
        other - The other array of same type to append to this.
        Returns:
        The combined arrays.
      • slice

        public Array<T> slice​(int rl,
                              int ru)
        Description copied from class: Array
        Slice out the sub-range and return a new array with the specified type. If the conversion fails, fall back to a normal slice.
        Specified by:
        slice in class Array<T>
        Parameters:
        rl - row start
        ru - row end (not included)
        Returns:
        A new array of sub range.
      • getAsByteArray

        public byte[] getAsByteArray()
        Description copied from class: Array
        Return the currently allocated array as a byte[]. This is used to serialize the allocated arrays out to the Python API.
        Specified by:
        getAsByteArray in class Array<T>
        Returns:
        The array as bytes
      • getValueType

        public Types.ValueType getValueType()
        Description copied from class: Array
        Get the current value type of this array.
        Specified by:
        getValueType in class Array<T>
        Returns:
        The current value type.
      • analyzeValueType

        public Pair<Types.ValueType,​Boolean> analyzeValueType​(int maxCells)
        Description copied from class: Array
        Analyze the column to determine whether its value type can be refined to a better type. The result has two parts: first, the type the column can be; second, whether the column contains nulls.
        Specified by:
        analyzeValueType in class Array<T>
        Parameters:
        maxCells - maximum number of cells to analyze
        Returns:
        A better or equivalent value type to represent the column, including null information.
      • set

        public void set​(int rl,
                        int ru,
                        Array<T> value)
        Description copied from class: Array
        Set the given range to the values of the given array.
        Specified by:
        set in class Array<T>
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from (same type)
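        Note that the upper bound is documented as inclusive for set but exclusive for slice. The sketch below contrasts the two conventions on a plain double[]. RangeSketch is a hypothetical name, and reading value from its index 0 is an assumption, since the description does not state the source offset.

```java
import java.util.Arrays;

// Illustrative only: plain arrays stand in for Array<T>.
final class RangeSketch {
    /** slice: ru is exclusive, so the result has ru - rl elements. */
    static double[] slice(double[] data, int rl, int ru) {
        return Arrays.copyOfRange(data, rl, ru);
    }

    /** set: ru is inclusive, so ru - rl + 1 elements are overwritten. */
    static void set(double[] data, int rl, int ru, double[] value) {
        // Assumption: values are taken from the start of the value array.
        System.arraycopy(value, 0, data, rl, ru - rl + 1);
    }

    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4};
        System.out.println(Arrays.toString(slice(data, 1, 3)));
        set(data, 1, 3, new double[] {9, 8, 7});
        System.out.println(Arrays.toString(data));
    }
}
```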
      • getFrameArrayType

        public ArrayFactory.FrameArrayType getFrameArrayType()
        Description copied from class: Array
        Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than there are ValueTypes.
        Specified by:
        getFrameArrayType in class Array<T>
        Returns:
        The FrameArrayType
      • getExactSerializedSize

        public long getExactSerializedSize()
        Description copied from class: Array
        Get the exact serialized size on disk of this array.
        Specified by:
        getExactSerializedSize in class Array<T>
        Returns:
        The exact size on disk
      • changeType

        public Array<?> changeType​(Types.ValueType t)
        Description copied from class: Array
        Change the allocated array to a different type. If the type is the same, a deep copy is returned for safety.
        Specified by:
        changeType in class ACompressedArray<T>
        Parameters:
        t - The type to change to
        Returns:
        A new column array.
      • isShallowSerialize

        public boolean isShallowSerialize()
        Description copied from class: Array
        Analyze whether this array can be shallow serialized, which allows caching without modification.
        Specified by:
        isShallowSerialize in class Array<T>
        Returns:
        true if shallow serialization is available
      • isEmpty

        public boolean isEmpty()
        Description copied from class: Array
        Check whether this array is empty, i.e., filled with empty values.
        Specified by:
        isEmpty in class Array<T>
        Returns:
        true if the array is empty
      • select

        public Array<T> select​(int[] indices)
        Description copied from class: Array
        Slice out the specified indices and return the sub array.
        Specified by:
        select in class Array<T>
        Parameters:
        indices - The indices to slice out
        Returns:
        the sliced out indices in an array format
      • select

        public Array<T> select​(boolean[] select,
                               int nTrue)
        Description copied from class: Array
        Slice out the true indices in the select input and return the sub array.
        Specified by:
        select in class Array<T>
        Parameters:
        select - a boolean vector specifying what to select
        nTrue - number of true values inside select
        Returns:
        the sliced out indices in an array format
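        Passing nTrue alongside the mask lets the result be allocated at its final size in one step. A minimal sketch on a plain String[] (SelectSketch is a hypothetical name, not part of the library):

```java
import java.util.Arrays;

// Illustrative only: a plain String[] stands in for Array<T>.
final class SelectSketch {
    /** Keep the elements whose mask entry is true; nTrue must equal the number of true entries. */
    static String[] select(String[] data, boolean[] select, int nTrue) {
        String[] ret = new String[nTrue]; // sized up front, no resizing needed
        int k = 0;
        for (int i = 0; i < select.length; i++)
            if (select[i])
                ret[k++] = data[i];
        return ret;
    }

    public static void main(String[] args) {
        String[] data = {"a", "b", "c", "d"};
        boolean[] mask = {true, false, false, true};
        System.out.println(Arrays.toString(select(data, mask, 2)));
    }
}
```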
      • isNotEmpty

        public boolean isNotEmpty​(int i)
        Specified by:
        isNotEmpty in class Array<T>
      • clone

        public Array<T> clone()
        Description copied from class: Array
        Override of Java's built-in clone function for arrays. Returns a clone of the underlying mutable data; immutable data is not necessarily copied. Which data is immutable depends on the individual array implementation.
        Specified by:
        clone in class Array<T>
        Returns:
        A clone
      • hashDouble

        public double hashDouble​(int idx)
        Description copied from class: Array
        Hash the given index of the array. It is allowed to return NaN on null elements.
        Specified by:
        hashDouble in class Array<T>
        Parameters:
        idx - The index to hash
        Returns:
        The hash value of that index.
      • getInMemorySize

        public long getInMemorySize()
        Description copied from class: Array
        Get in memory size, not counting reference to this object.
        Overrides:
        getInMemorySize in class Array<T>
        Returns:
        the size in memory of this object.
      • estimateInMemorySize

        public static long estimateInMemorySize​(int memSizeBitPerElement,
                                                int estDistinct,
                                                int nRow)
      • containsNull

        public boolean containsNull()
        Description copied from class: Array
        Analyze whether the array contains null values.
        Overrides:
        containsNull in class Array<T>
        Returns:
        If the array contains null.
      • equals

        public boolean equals​(Array<T> other)
        Description copied from class: Array
        Equals operation on arrays.
        Specified by:
        equals in class Array<T>
        Parameters:
        other - The other array to compare to.
        Returns:
        True if the arrays are equivalent.
      • minMax

        public double[] minMax()
        Description copied from class: Array
        Get the minimum and maximum double value of this array. Note that we ignore NaN Values.
        Overrides:
        minMax in class Array<T>
        Returns:
        The min and max in index 0 and 1 of the array.
      • minMax

        public double[] minMax​(int l,
                               int u)
        Description copied from class: Array
        Get the minimum and maximum double value of a specific sub part of this array. Note that we ignore NaN Values.
        Overrides:
        minMax in class Array<T>
        Parameters:
        l - The lower index to search from
        u - The upper index to end at (not inclusive)
        Returns:
        The min and max in index 0 and 1 of the array in the range.
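        The NaN-skipping range scan described above might look as follows on a plain double[]. MinMaxSketch is a hypothetical name, and the result for a range with no non-NaN values (positive and negative infinity here) is an assumption, since the documentation does not specify that case.

```java
import java.util.Arrays;

// Illustrative only: a plain double[] stands in for the column array.
final class MinMaxSketch {
    /** Scan rows [l, u) and return {min, max}, skipping NaN entries. */
    static double[] minMax(double[] data, int l, int u) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = l; i < u; i++) {
            double v = data[i];
            if (Double.isNaN(v))
                continue; // NaN values are ignored
            if (v < min)
                min = v;
            if (v > max)
                max = v;
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] data = {3, Double.NaN, -1, 7, 2};
        System.out.println(Arrays.toString(minMax(data, 0, data.length)));
    }
}
```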
      • statistics

        public ArrayCompressionStatistics statistics​(int nSamples)
        Description copied from class: Array
        Get the compression statistics of this array allocation.
        Specified by:
        statistics in class ACompressedArray<T>
        Parameters:
        nSamples - The number of sample elements suggested (not forced) to be used.
        Returns:
        The compression statistics of this array.