Class Array<T>

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      class  Array.ArrayIterator  
    • Method Summary

      Modifier and Type Method Description
      Pair<Types.ValueType,​Boolean> analyzeValueType()
      Analyze the column to figure out if the value type can be refined to a better type.
      abstract Pair<Types.ValueType,​Boolean> analyzeValueType​(int maxCells)
      Analyze the column to figure out if the value type can be refined to a better type.
      abstract void append​(String value)
      Append a string value to the current Array. This should in general be avoided; appending larger blocks at a time should be preferred.
      abstract Array<T> append​(Array<T> other)
      Append the other array. If the other array fits in the currently allocated size, that allocation is used; otherwise a new array is allocated to combine the other with this.
      abstract void append​(T value)
      Append a value of the same type of the Array.
      static long baseMemoryCost()
      Get the base memory cost of an Array allocation.
      Array<?> changeType​(Types.ValueType t)
      Change the allocated array to a different type.
      Array<?> changeType​(Types.ValueType t, boolean containsNull)
      Change type taking into consideration if the target type must be able to contain Null.
      Array<?> changeType​(Array<?> ret)
      Change type by moving this array's values into the given ret array.
      Array<?> changeType​(Array<?> ret, int rl, int ru)
      Put the changed value types into the given ret array inside the range specified.
      Array<?> changeTypeWithNulls​(Types.ValueType t)  
      Array<?> changeTypeWithNulls​(Array<?> ret)  
      Array<?> changeTypeWithNulls​(Array<?> ret, int l, int u)  
      abstract Array<T> clone()
      Override of Java's built-in clone method for arrays. Returns a clone of the underlying data that is mutable (not immutable data). Whether data is immutable depends on the individual allocated arrays.
      boolean containsNull()
      Analyze whether the array contains null values.
      boolean equals​(Object other)  
      abstract boolean equals​(Array<T> other)
      Equals operation on arrays.
      double[] extractDouble​(double[] ret, int rl, int ru)
      Extract the sub array into the ret array as doubles.
      abstract void fill​(String val)
      Fill the entire array with a specific value.
      abstract void fill(T val)
      Fill the entire array with a specific value.
      void findEmpty​(boolean[] select)
      Find the empty rows. The input is assumed to be modified only by setting entries to true.
      void findEmptyInverse(boolean[] select)
      Find the filled rows. The input is assumed to be modified only by setting entries to true.
      abstract Object get()
      Get the underlying array out of the column group. It is the responsibility of the caller to know what type it is.
      abstract T get​(int index)
      Get the value at a given index.
      abstract byte[] getAsByteArray()
      Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the PythonAPI.
      abstract double getAsDouble​(int i)
      Get the value at index i as a double.
      double getAsNaNDouble(int i)
      Get the value at index i as a double.
      SoftReference<Map<T,​Long>> getCache()
      Get the current cached recode map.
      abstract long getExactSerializedSize()
      Get the exact serialized size on disk of this array.
      abstract ArrayFactory.FrameArrayType getFrameArrayType()
      Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than there are ValueTypes.
      long getInMemorySize()
      Get in memory size, not counting reference to this object.
      T getInternal​(int index)
      Get the internal value at a given index.
      Array.ArrayIterator getIterator()  
      Pair<Integer,​Integer> getMinMaxLength()
      Get the minimum and maximum length of the contained values as string type.
      ABooleanArray getNulls()  
      Map<T,​Long> getRecodeMap()
      Get a recode map that maps each unique value in the array, to a long ID.
      abstract Types.ValueType getValueType()
      Get the current value type of this array.
      abstract double hashDouble​(int idx)
      Hash the given index of the array.
      abstract boolean isEmpty()
      Check whether this array is empty, i.e., filled with empty values.
      abstract boolean isNotEmpty​(int i)  
      abstract boolean isShallowSerialize()
      Analyze whether this array can be shallow serialized.
      double[] minMax()
      Get the minimum and maximum double value of this array.
      double[] minMax​(int l, int u)
      Get the minimum and maximum double value of a specific sub part of this array.
      abstract boolean possiblyContainsNaN()  
      abstract void reset​(int size)
      Reset the Array and set to a different size.
      abstract Array<T> select​(boolean[] select, int nTrue)
      Slice out the true indices in the select input and return the sub array.
      abstract Array<T> select​(int[] indices)
      Slice out the specified indices and return the sub array.
      abstract void set​(int index, double value)
      Set index to given double value (cast to the correct type of this array)
      abstract void set​(int rl, int ru, Array<T> value)
      Set the range to the given array's values.
      void set(int rl, int ru, Array<T> value, int rlSrc)
      Set the range to the given array's values, with an offset into the other array.
      abstract void set(int index, String value)
      Set index to the value parsed from the given string.
      abstract void set​(int index, T value)
      Set index to the given value of same type
      void setCache​(SoftReference<Map<T,​Long>> m)
      Set the cached recode map of this Array allocation, to be used in transformEncode.
      abstract void setFromOtherType​(int rl, int ru, Array<?> value)
      Set the range to the given array's values.
      abstract void setFromOtherTypeNz(int rl, int ru, Array<?> value)
      Set non-default values in the range from the given value array.
      void setFromOtherTypeNz(Array<?> value)
      Set non-default values from the given value array.
      abstract void setNz(int rl, int ru, Array<T> value)
      Set non-default values in the range from the given value array.
      void setNz(Array<T> value)
      Set non-default values from the given value array.
      int size()
      Get the number of elements in the array. This does not necessarily reflect the currently allocated size.
      abstract Array<T> slice​(int rl, int ru)
      Slice out the sub range and return new array with the specified type.
      ArrayCompressionStatistics statistics​(int nSamples)
      Get the compression statistics of this array allocation.
      String toString()  
      • Methods inherited from interface org.apache.hadoop.io.Writable

        readFields, write
    • Method Detail

      • getCache

        public final SoftReference<Map<T,​Long>> getCache()
        Get the current cached recode map.
        Returns:
        The cached recode map
      • setCache

        public final void setCache​(SoftReference<Map<T,​Long>> m)
        Set the cached recode map of this Array allocation, to be used in transformEncode.
        Parameters:
        m - The element to cache.
      • getRecodeMap

        public final Map<T,​Long> getRecodeMap()
        Get a recode map that maps each unique value in the array to a long ID. Null values are ignored and not included in the mapping. The resulting recode map is stored in a soft reference to speed up repeated calls on the same column.
        Returns:
        A recode map
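        The recode-map behavior described above can be illustrated with a minimal, standalone sketch. It does not use the SystemDS classes: the class name RecodeMapSketch, the String element type, and the choice to start IDs at 1 are assumptions made for illustration only.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an Array<String> column; not the SystemDS implementation.
final class RecodeMapSketch {
    private final String[] column;
    // Soft reference so the garbage collector may drop the map under memory pressure.
    private SoftReference<Map<String, Long>> cache;

    RecodeMapSketch(String[] column) {
        this.column = column;
    }

    // Map each distinct non-null value to a long ID, reusing the cached map if present.
    Map<String, Long> getRecodeMap() {
        Map<String, Long> cached = (cache == null) ? null : cache.get();
        if (cached != null)
            return cached;
        Map<String, Long> map = new HashMap<>();
        long nextId = 1; // assumption: IDs start at 1
        for (String v : column) {
            // Nulls are skipped; an ID is consumed only for values not seen before.
            if (v != null && map.putIfAbsent(v, nextId) == null)
                nextId++;
        }
        cache = new SoftReference<>(map);
        return map;
    }

    public static void main(String[] args) {
        RecodeMapSketch s = new RecodeMapSketch(new String[] {"a", null, "b", "a"});
        System.out.println(s.getRecodeMap());
    }
}
```

        Because the cache is only softly reachable, a later call may rebuild the map; callers should not rely on receiving the same Map instance twice.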
      • size

        public final int size()
        Get the number of elements in the array. This does not necessarily reflect the currently allocated size.
        Returns:
        the current number of elements
      • get

        public abstract T get​(int index)
        Get the value at a given index. This method returns objects that have a high overhead in allocation. Therefore it is not as efficient as using the vectorized operations specified in the object.
        Parameters:
        index - The index to query
        Returns:
        The value returned as an object
      • getInternal

        public T getInternal​(int index)
        Get the internal value at a given index. For instance, HashIntegerArray would return the underlying long, not a string.
        Parameters:
        index - the index to get
        Returns:
        The value to get
      • get

        public abstract Object get()
        Get the underlying array out of the column group. It is the responsibility of the caller to know what type it is. The underlying data structure is not guaranteed to hand out its own storage; it may allocate a new array to answer the call, which in practice means the entire array may be allocated again. The method should therefore be used only for debugging purposes, not for performance.
        Returns:
        The underlying array.
      • getAsDouble

        public abstract double getAsDouble​(int i)
        Get the value at index i as a double. Returns 0 if the value is null.
        Parameters:
        i - index to get value from
        Returns:
        the value
      • getAsNaNDouble

        public double getAsNaNDouble​(int i)
        Get the value at index i as a double. Returns Double.NaN if the value is null.
        Parameters:
        i - index to get value from
        Returns:
        the value
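        The difference between getAsDouble and getAsNaNDouble lies only in how a null cell is reported. A minimal sketch on a plain boxed Double[] (the class NullAsDoubleSketch and the use of Double[] as backing storage are assumptions for illustration, not the SystemDS implementation):

```java
// Hypothetical illustration of the two documented null conventions.
final class NullAsDoubleSketch {
    // Null cells are reported as 0.
    static double getAsDouble(Double[] data, int i) {
        Double v = data[i];
        return v == null ? 0.0 : v;
    }

    // Null cells are reported as NaN, so they stay distinguishable from a real 0.
    static double getAsNaNDouble(Double[] data, int i) {
        Double v = data[i];
        return v == null ? Double.NaN : v;
    }

    public static void main(String[] args) {
        Double[] col = {1.5, null, 0.0};
        for (int i = 0; i < col.length; i++)
            System.out.println(getAsDouble(col, i) + " / " + getAsNaNDouble(col, i));
    }
}
```

        The NaN variant is the one to use when a missing value must not be confused with a stored zero.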
      • set

        public abstract void set​(int index,
                                 T value)
        Set index to the given value of same type
        Parameters:
        index - The index to set
        value - The value to assign
      • set

        public abstract void set​(int index,
                                 double value)
        Set index to given double value (cast to the correct type of this array)
        Parameters:
        index - the index to set
        value - the value to set it to (before casting to correct value type)
      • set

        public abstract void set​(int index,
                                 String value)
        Set index to the value parsed from the given string.
        Parameters:
        index - The index to set
        value - The value to assign
      • setFromOtherType

        public abstract void setFromOtherType​(int rl,
                                              int ru,
                                              Array<?> value)
        Set the range to the given array's values.
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from (other type)
      • set

        public abstract void set​(int rl,
                                 int ru,
                                 Array<T> value)
        Set the range to the given array's values.
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from (same type)
      • set

        public void set​(int rl,
                        int ru,
                        Array<T> value,
                        int rlSrc)
        Set the range to the given array's values, with an offset into the other array.
        Parameters:
        rl - row lower
        ru - row upper (inclusive)
        value - value array to take values from
        rlSrc - the offset into the value array to take values from
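        The range semantics above (ru inclusive, rlSrc as the starting offset in the source) can be sketched on primitive arrays. The class RangeSetSketch and the double[] storage are assumptions for illustration only:

```java
import java.util.Arrays;

// Hypothetical illustration of a range set with an inclusive upper bound.
final class RangeSetSketch {
    // Copy value[rlSrc .. rlSrc + (ru - rl)] into dst[rl .. ru]; ru is inclusive,
    // so ru - rl + 1 elements are copied.
    static void set(double[] dst, int rl, int ru, double[] value, int rlSrc) {
        System.arraycopy(value, rlSrc, dst, rl, ru - rl + 1);
    }

    public static void main(String[] args) {
        double[] dst = new double[5];
        set(dst, 1, 3, new double[] {9, 8, 7, 6}, 1);
        System.out.println(Arrays.toString(dst));
    }
}
```

        Note the contrast with slice and extractDouble, whose upper bound is documented as exclusive.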
      • setNz

        public final void setNz​(Array<T> value)
        Set non-default values from the given value array.
        Parameters:
        value - array of same type and length
      • setNz

        public abstract void setNz​(int rl,
                                   int ru,
                                   Array<T> value)
        Set non-default values in the range from the given value array.
        Parameters:
        rl - row start
        ru - row upper inclusive
        value - value array of same type
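        A minimal sketch of the "set non-default values" idea, assuming that the default value of a numeric array is 0 and using plain double[] storage. The class SetNzSketch is hypothetical and not part of the API:

```java
import java.util.Arrays;

// Hypothetical illustration: only non-zero source cells overwrite the target.
final class SetNzSketch {
    // Rows rl..ru (inclusive) are visited; cells where value holds the assumed
    // default (0) leave dst untouched.
    static void setNz(double[] dst, int rl, int ru, double[] value) {
        for (int i = rl; i <= ru; i++)
            if (value[i] != 0.0)
                dst[i] = value[i];
    }

    public static void main(String[] args) {
        double[] dst = {1, 2, 3, 4};
        setNz(dst, 0, 2, new double[] {0, 5, 0, 6});
        System.out.println(Arrays.toString(dst));
    }
}
```

        Compared with a plain range set, existing values in the target survive wherever the source has nothing to contribute.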
      • setFromOtherTypeNz

        public final void setFromOtherTypeNz​(Array<?> value)
        Set non-default values from the given value array.
        Parameters:
        value - array of other type
      • setFromOtherTypeNz

        public abstract void setFromOtherTypeNz​(int rl,
                                                int ru,
                                                Array<?> value)
        Set non-default values in the range from the given value array.
        Parameters:
        rl - row start
        ru - row end inclusive
        value - value array of different type
      • append

        public abstract void append​(String value)
        Append a string value to the current Array. This should in general be avoided; appending larger blocks at a time should be preferred.
        Parameters:
        value - The value to append
      • append

        public abstract void append​(T value)
        Append a value of the same type as the Array. This should in general be avoided; appending larger blocks at a time should be preferred.
        Parameters:
        value - The value to append
      • append

        public abstract Array<T> append​(Array<T> other)
        Append the other array. If the other array fits in the currently allocated size, that allocation is used; otherwise a new array is allocated to combine the other with this. This method should use the set range function, and should be preferred over appending single values.
        Parameters:
        other - The other array of same type to append to this.
        Returns:
        The combined arrays.
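        The allocation rule above (reuse the current allocation when the other array fits, allocate otherwise) can be sketched as follows. AppendSketch, its fields, and the double[] storage are assumptions for illustration; the real method returns an Array<T> that may be a different object.

```java
import java.util.Arrays;

// Hypothetical growable column that separates logical size from allocated size.
final class AppendSketch {
    double[] data; // allocated storage, possibly longer than size
    int size;      // number of valid elements

    AppendSketch(int capacity) {
        data = new double[capacity];
    }

    // Append a whole block at once; reallocate only if it does not fit.
    AppendSketch append(double[] other) {
        int newSize = size + other.length;
        if (newSize > data.length)
            data = Arrays.copyOf(data, newSize);
        System.arraycopy(other, 0, data, size, other.length);
        size = newSize;
        return this;
    }

    public static void main(String[] args) {
        AppendSketch a = new AppendSketch(4);
        a.append(new double[] {1, 2}).append(new double[] {3, 4, 5});
        System.out.println(a.size + " " + Arrays.toString(a.data));
    }
}
```

        The sketch also shows why size() may differ from the allocated length: after the first append, two elements are valid while four are allocated.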
      • slice

        public abstract Array<T> slice​(int rl,
                                       int ru)
        Slice out the sub range and return a new array with the specified type. If the conversion fails, fall back to a normal slice.
        Parameters:
        rl - row start
        ru - row end (not included)
        Returns:
        A new array of sub range.
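        For the bounds convention (rl inclusive, ru exclusive), a minimal sketch on a primitive array, where SliceSketch is a hypothetical helper and not part of the API:

```java
import java.util.Arrays;

// Hypothetical illustration of an exclusive upper bound in slice.
final class SliceSketch {
    // Return a new array holding rows rl (inclusive) to ru (exclusive).
    static double[] slice(double[] data, int rl, int ru) {
        return Arrays.copyOfRange(data, rl, ru);
    }

    public static void main(String[] args) {
        double[] col = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(slice(col, 1, 4)));
    }
}
```

        The result has length ru - rl and does not share storage with the input in this sketch.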
      • reset

        public abstract void reset​(int size)
        Reset the Array and set to a different size. This method is used to reuse an already allocated Array, without extra allocation. It should only be done in cases where the Array is no longer in use in any FrameBlocks.
        Parameters:
        size - The size to reallocate into.
      • getAsByteArray

        public abstract byte[] getAsByteArray()
        Return the currently allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the PythonAPI.
        Returns:
        The array as bytes
      • getValueType

        public abstract Types.ValueType getValueType()
        Get the current value type of this array.
        Returns:
        The current value type.
      • analyzeValueType

        public final Pair<Types.ValueType,​Boolean> analyzeValueType()
        Analyze the column to figure out if the value type can be refined to a better type. The return is in two parts, first the type it can be, second if it contains nulls.
        Returns:
        A better or equivalent value type to represent the column, including null information.
      • analyzeValueType

        public abstract Pair<Types.ValueType,​Boolean> analyzeValueType​(int maxCells)
        Analyze the column to figure out if the value type can be refined to a better type. The return is in two parts, first the type it can be, second if it contains nulls.
        Parameters:
        maxCells - maximum number of cells to analyze
        Returns:
        A better or equivalent value type to represent the column, including null information.
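        To make the two-part result concrete (refined type plus null flag, over at most maxCells cells), here is a deliberately simplified sketch for a string column. The Kind enum is a hypothetical stand-in for Types.ValueType, Map.Entry stands in for Pair, and the parsing rules are assumptions; the actual analysis in the library may differ.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical, simplified type refinement for a column of strings.
final class AnalyzeTypeSketch {
    // Ordered from most specific to most general.
    enum Kind { INT64, FP64, STRING }

    // Inspect at most maxCells cells; return (refined kind, whether a null was seen).
    static Map.Entry<Kind, Boolean> analyze(String[] col, int maxCells) {
        Kind best = Kind.INT64;
        boolean hasNull = false;
        int n = Math.min(col.length, maxCells);
        for (int i = 0; i < n; i++) {
            if (col[i] == null) {
                hasNull = true;
                continue;
            }
            Kind k = kindOf(col[i]);
            if (k.ordinal() > best.ordinal())
                best = k; // widen to the more general kind
        }
        return new SimpleEntry<>(best, hasNull);
    }

    private static Kind kindOf(String s) {
        try { Long.parseLong(s); return Kind.INT64; } catch (NumberFormatException e) { /* not an integer */ }
        try { Double.parseDouble(s); return Kind.FP64; } catch (NumberFormatException e) { /* not a number */ }
        return Kind.STRING;
    }

    public static void main(String[] args) {
        System.out.println(analyze(new String[] {"1", "2.5", null}, 10));
    }
}
```

        Limiting the scan with maxCells trades accuracy for speed: a value beyond the inspected prefix can still violate the refined type.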
      • getFrameArrayType

        public abstract ArrayFactory.FrameArrayType getFrameArrayType()
        Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than there are ValueTypes.
        Returns:
        The FrameArrayType
      • getInMemorySize

        public long getInMemorySize()
        Get in memory size, not counting reference to this object.
        Returns:
        the size in memory of this object.
      • baseMemoryCost

        public static long baseMemoryCost()
        Get the base memory cost of an Array allocation.
        Returns:
        The base memory cost
      • getExactSerializedSize

        public abstract long getExactSerializedSize()
        Get the exact serialized size on disk of this array.
        Returns:
        The exact size on disk
      • containsNull

        public boolean containsNull()
        Analyze whether the array contains null values.
        Returns:
        If the array contains null.
      • possiblyContainsNaN

        public abstract boolean possiblyContainsNaN()
      • changeType

        public Array<?> changeType​(Types.ValueType t,
                                   boolean containsNull)
        Change type taking into consideration if the target type must be able to contain Null.
        Parameters:
        t - The target type
        containsNull - If the target should be able to contain null
        Returns:
        The changed type array.
      • changeTypeWithNulls

        public final Array<?> changeTypeWithNulls​(Array<?> ret)
      • changeTypeWithNulls

        public final Array<?> changeTypeWithNulls​(Array<?> ret,
                                                  int l,
                                                  int u)
      • changeType

        public Array<?> changeType​(Types.ValueType t)
        Change the allocated array to a different type. If the type is the same a deep copy is returned for safety.
        Parameters:
        t - The type to change to
        Returns:
        A new column array.
      • changeType

        public final Array<?> changeType​(Array<?> ret)
        Change type by moving this array's values into the given ret array.
        Parameters:
        ret - The Array to put this arrays values into
        Returns:
        The ret array given
      • changeType

        public final Array<?> changeType​(Array<?> ret,
                                         int rl,
                                         int ru)
        Put the changed value types into the given ret array inside the range specified.
        Parameters:
        ret - The Array to put this arrays values into
        rl - inclusive lower bound
        ru - exclusive upper bound
        Returns:
        The ret array given.
      • getMinMaxLength

        public Pair<Integer,​Integer> getMinMaxLength()
        Get the minimum and maximum length of the contained values as string type.
        Returns:
        A Pair of first the minimum length, second the maximum length
      • fill

        public abstract void fill​(String val)
        Fill the entire array with a specific value.
        Parameters:
        val - the value to fill with.
      • fill

        public abstract void fill​(T val)
        Fill the entire array with a specific value.
        Parameters:
        val - the value to fill with.
      • isShallowSerialize

        public abstract boolean isShallowSerialize()
        Analyze whether this array can be shallow serialized, to allow caching without modification.
        Returns:
        boolean saying true if shallow serialization is available
      • isEmpty

        public abstract boolean isEmpty()
        Check whether this array is empty, i.e., filled with empty values.
        Returns:
        boolean saying true if empty
      • select

        public abstract Array<T> select​(int[] indices)
        Slice out the specified indices and return the sub array.
        Parameters:
        indices - The indices to slice out
        Returns:
        the sliced out indices in an array format
      • select

        public abstract Array<T> select​(boolean[] select,
                                        int nTrue)
        Slice out the true indices in the select input and return the sub array.
        Parameters:
        select - a boolean vector specifying what to select
        nTrue - number of true values inside select
        Returns:
        the sliced out indices in an array format
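        Both select variants can be sketched on a primitive array. SelectSketch and the double[] storage are assumptions for illustration; in the boolean variant, nTrue lets the result be sized without first counting the true entries.

```java
import java.util.Arrays;

// Hypothetical illustration of index-based and mask-based selection.
final class SelectSketch {
    // Gather the cells at the given indices, in the given order.
    static double[] select(double[] data, int[] indices) {
        double[] ret = new double[indices.length];
        for (int i = 0; i < indices.length; i++)
            ret[i] = data[indices[i]];
        return ret;
    }

    // Gather the cells whose mask entry is true; nTrue must equal the number of true entries.
    static double[] select(double[] data, boolean[] select, int nTrue) {
        double[] ret = new double[nTrue];
        int k = 0;
        for (int i = 0; i < select.length; i++)
            if (select[i])
                ret[k++] = data[i];
        return ret;
    }

    public static void main(String[] args) {
        double[] col = {10, 20, 30, 40};
        System.out.println(Arrays.toString(select(col, new int[] {3, 0})));
        System.out.println(Arrays.toString(select(col, new boolean[] {false, true, true, false}, 2)));
    }
}
```

        If nTrue is smaller than the actual number of true entries, this sketch throws an ArrayIndexOutOfBoundsException, so the caller is responsible for passing a consistent count.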
      • findEmpty

        public final void findEmpty​(boolean[] select)
        Find the empty rows. The input is assumed to be modified only by setting entries to true.
        Parameters:
        select - Modify this to true in indexes that are not empty.
      • isNotEmpty

        public abstract boolean isNotEmpty​(int i)
      • findEmptyInverse

        public void findEmptyInverse​(boolean[] select)
        Find the filled rows. The input is assumed to be modified only by setting entries to true.
        Parameters:
        select - modify this to true in indexes that are empty.
      • clone

        public abstract Array<T> clone()
        Override of Java's built-in clone method for arrays. Returns a clone of the underlying data that is mutable (not immutable data). Whether data is immutable depends on the individual allocated arrays.
        Returns:
        A clone
      • hashDouble

        public abstract double hashDouble​(int idx)
        Hash the given index of the array. It is allowed to return NaN on null elements.
        Parameters:
        idx - The index to hash
        Returns:
        The hash value of that index.
      • extractDouble

        public double[] extractDouble​(double[] ret,
                                      int rl,
                                      int ru)
        Extract the sub array into the ret array as doubles. The ret array is filled with an offset of -rl (row i is written to ret[i - rl]), meaning that the ret array should be of length ru - rl.
        Parameters:
        ret - The array to return
        rl - The row to start at
        ru - The row to end at (not inclusive.)
        Returns:
        The ret array given as argument
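        The offset convention for ret can be sketched as follows. ExtractDoubleSketch and the int[] source storage are assumptions for illustration, not the library code:

```java
import java.util.Arrays;

// Hypothetical illustration: row i of the source lands at ret[i - rl].
final class ExtractDoubleSketch {
    // Fill ret with rows rl (inclusive) to ru (exclusive), converted to double.
    static double[] extractDouble(int[] data, double[] ret, int rl, int ru) {
        for (int i = rl; i < ru; i++)
            ret[i - rl] = data[i];
        return ret;
    }

    public static void main(String[] args) {
        int[] col = {10, 20, 30, 40};
        double[] out = extractDouble(col, new double[2], 1, 3);
        System.out.println(Arrays.toString(out));
    }
}
```

        Passing in ret lets the caller reuse one buffer across many extractions instead of allocating a new double[] each time.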
      • equals

        public abstract boolean equals​(Array<T> other)
        Equals operation on arrays.
        Parameters:
        other - The other array to compare to.
        Returns:
        True if the arrays are equivalent.
      • statistics

        public ArrayCompressionStatistics statistics​(int nSamples)
        Get the compression statistics of this array allocation.
        Parameters:
        nSamples - The number of sample elements suggested (not forced) to be used.
        Returns:
        The compression statistics of this array.
      • minMax

        public double[] minMax()
        Get the minimum and maximum double value of this array. Note that we ignore NaN Values.
        Returns:
        The min and max in index 0 and 1 of the array.
      • minMax

        public double[] minMax​(int l,
                               int u)
        Get the minimum and maximum double value of a specific sub part of this array. Note that we ignore NaN Values.
        Parameters:
        l - The lower index to search from
        u - The upper index to end at (not inclusive)
        Returns:
        The min and max in index 0 and 1 of the array in the range.
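        A minimal sketch of a NaN-ignoring min/max over the range l (inclusive) to u (exclusive). MinMaxSketch is hypothetical, and starting from positive and negative infinity is an assumption of this sketch; the library may initialize differently.

```java
import java.util.Arrays;

// Hypothetical illustration of min/max that skips NaN cells.
final class MinMaxSketch {
    // Returns {min, max} over data[l .. u-1], ignoring NaN values.
    static double[] minMax(double[] data, int l, int u) {
        double min = Double.POSITIVE_INFINITY; // assumption: neutral start values
        double max = Double.NEGATIVE_INFINITY;
        for (int i = l; i < u; i++) {
            double v = data[i];
            if (Double.isNaN(v))
                continue;
            if (v < min)
                min = v;
            if (v > max)
                max = v;
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] col = {3, Double.NaN, -1, 7};
        System.out.println(Arrays.toString(minMax(col, 0, col.length)));
    }
}
```

        The explicit isNaN check matters because any comparison with NaN is false, so a NaN in the first cell could otherwise never be replaced if it were used as the starting value.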