Interface IColIndex

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Interface Description
      static class  IColIndex.ColIndexType  
      static class  IColIndex.SliceResult
      A class for slice results, containing the indexes used to slice dictionaries and the resulting column index.
    • Method Summary

      Modifier and Type Method Description
      double avgOfIndex()
      Get the average of this index.
      IColIndex combine​(IColIndex other)
      Combine the indexes of this column index with another. Callers are expected to supply unique indexes, with no value duplicated.
      boolean contains​(int i)
      Check whether this column group contains the given column id.
      boolean contains​(IColIndex a, IColIndex b)
      This contains check is not strict: it only verifies that one element from each of a and b is contained.
      boolean containsAny​(IColIndex idx)
      Check whether this column group contains any of the given column ids.
      boolean containsStrict​(IColIndex a, IColIndex b)
      This contains both a and b ...
      void decompressToDenseFromSparse​(SparseBlock sb, int vr, int off, double[] c)
      Decompress this column index into the dense c array.
      void decompressVec​(int nCol, double[] c, int off, double[] values, int rowIdx)
      Decompress into c using the values provided.
      boolean equals​(Object other)  
      boolean equals​(IColIndex other)  
      long estimateInMemorySize()
      Get the in memory size of this object.
      int findIndex​(int i)
      Find the index of the given value; returns a negative number if the value does not exist.
      int get​(int i)
      Get the index at a specific location. Note that many of the underlying implementations do not throw exceptions on indexes that are completely wrong, so all implementations that use this index should always be well behaved.
      long getExactSizeOnDisk()
      Get the exact size on disk to enable preallocation of the disk output buffer sizes
      int[] getReorderingIndex()
      If the columns are not in sorted, incrementing order, this method can be called to get the sorting index for this set of column indexes.
      int hashCode()  
      static boolean inOrder​(IColIndex a, IColIndex b)
      Indicate whether the two given column indexes are in order, such that every index in the first is lower than every index in the second.
      boolean isContiguous()
      Check whether these columns are contiguous, meaning the indexes are integers at increments of 1.
      boolean isSorted()
      Check whether the index is sorted.
      IIterate iterator()
      An iterator over the indexes; see the iterator interface for details.
      static Pair<int[],​int[]> reorderingIndexes​(IColIndex a, IColIndex b)  
      IColIndex shift​(int i)
      Return a new column index where the values are shifted by the specified amount.
      int size()
      Get the size of the index, that is, how many columns it contains.
      IColIndex.SliceResult slice​(int l, int u)
      Slice the range given.
      IColIndex sort()
      Sort the index, returning a new object if anything changes and this instance otherwise.
      void write​(DataOutput out)
      Write out the IO representation of this column index
    • Method Detail

      • size

        int size()
        Get the size of the index, that is, how many columns it contains.
        Returns:
        The size of the array
      • get

        int get​(int i)
        Get the index at a specific location. Note that many of the underlying implementations do not throw exceptions on indexes that are completely wrong, so all implementations that use this index should always be well behaved.
        Parameters:
        i - The index to get
        Returns:
        The column index stored at position i.
      • shift

        IColIndex shift​(int i)
        Return a new column index where the values are shifted by the specified amount. The result is a new instance of the index.
        Parameters:
        i - The amount to shift
        Returns:
        the new instance of an index.
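        The shifting behavior described above can be sketched on a plain int[] of column ids. This is only an illustration: ShiftSketch and its array-based shift are hypothetical stand-ins, not the interface's actual implementations.

```java
final class ShiftSketch {
    // Hypothetical stand-in for a column index: a plain int[] of column ids.
    static int[] shift(int[] cols, int amount) {
        int[] shifted = new int[cols.length];
        for (int k = 0; k < cols.length; k++)
            shifted[k] = cols[k] + amount;
        return shifted; // a new array; the input is left untouched
    }
}
```

        For instance, shifting the ids 1, 3, 5 by 10 gives 11, 13, 15 in a fresh array, while the original array still holds 1, 3, 5.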
      • write

        void write​(DataOutput out)
            throws IOException
        Write out the IO representation of this column index
        Parameters:
        out - The Output to write into
        Throws:
        IOException - if an I/O error occurs, for instance when there is not enough disk space
      • getExactSizeOnDisk

        long getExactSizeOnDisk()
        Get the exact size on disk to enable preallocation of the disk output buffer sizes
        Returns:
        The exact disk representation size
      • estimateInMemorySize

        long estimateInMemorySize()
        Get the in memory size of this object.
        Returns:
        The memory size of this object
      • iterator

        IIterate iterator()
        An iterator over the indexes; see the iterator interface for details.
        Returns:
        An iterator over the contained indexes.
      • findIndex

        int findIndex​(int i)
        Find the index of the given value; returns a negative number if the value does not exist.
        Parameters:
        i - the value to find inside the allocation
        Returns:
        The index of the value, or a negative number if it is not present.
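        A minimal sketch of this lookup contract, using a linear scan over a plain int[]. FindIndexSketch is a hypothetical helper, and returning exactly -1 is an assumption of the sketch; the documentation only promises a negative value.

```java
final class FindIndexSketch {
    // Returns the position of value in cols, or a negative number if absent.
    static int findIndex(int[] cols, int value) {
        for (int k = 0; k < cols.length; k++)
            if (cols[k] == value)
                return k;
        return -1; // assumption: any negative value would satisfy the contract
    }
}
```

        Because only "negative" is promised, callers are safer testing the result with < 0 than comparing it to -1.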
      • slice

        IColIndex.SliceResult slice​(int l,
                                    int u)
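        Slice the given range. The slice result is an object containing the start and end positions in the original array to slice out, and a new index for the sliced columns offset by l. Example: ArrayIndex(1,3,5).slice(2,6) returns SliceResult(1,3,ArrayIndex(1,3)): the values 3 and 5 sit at positions 1 and 2 of the original array, and become 1 and 3 after subtracting l = 2.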
        Parameters:
        l - inclusive lower bound
        u - exclusive upper bound
        Returns:
        A slice result
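        The documented example can be reproduced with a small sketch over a sorted int[]. SliceSketch, its Result holder, and the field names idStart, idEnd and cols are hypothetical; the sketch also assumes the input is sorted and that an empty range yields an empty array, which the documentation does not state.

```java
final class SliceSketch {
    // Hypothetical holder mirroring the three values shown in the example.
    static final class Result {
        final int idStart; // first position in the original array (inclusive)
        final int idEnd;   // end position in the original array (exclusive)
        final int[] cols;  // sliced column ids, offset by l

        Result(int idStart, int idEnd, int[] cols) {
            this.idStart = idStart;
            this.idEnd = idEnd;
            this.cols = cols;
        }
    }

    // Assumes cols is sorted in increasing order.
    static Result slice(int[] cols, int l, int u) {
        int start = 0;
        while (start < cols.length && cols[start] < l)
            start++; // skip ids below the inclusive lower bound
        int end = start;
        while (end < cols.length && cols[end] < u)
            end++; // take ids below the exclusive upper bound
        int[] out = new int[end - start];
        for (int k = start; k < end; k++)
            out[k - start] = cols[k] - l; // offset each kept id by l
        return new Result(start, end, out);
    }
}
```

        With cols = {1, 3, 5}, l = 2 and u = 6, the sketch yields idStart = 1, idEnd = 3 and cols = {1, 3}, matching the example in the description.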
      • equals

        boolean equals​(IColIndex other)
      • contains

        boolean contains​(IColIndex a,
                         IColIndex b)
        This contains check is not strict: it only verifies that one element from each of a and b is contained.
        Parameters:
        a - one array to contain at least one value from
        b - another array to contain at least one value from
        Returns:
        Whether this array contains at least one value from each of a and b
      • containsStrict

        boolean containsStrict​(IColIndex a,
                               IColIndex b)
        This contains both a and b ... it is strict because it verifies all cells. Note that it returns false if this has more elements than a and b combined.
        Parameters:
        a - one other array to contain
        b - another array to contain
        Returns:
        if this array contains both a and b
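        The difference between contains(a, b) and containsStrict(a, b) can be sketched on plain int[] arrays. ContainsSketch is hypothetical. Probing the first element of a and b in the non-strict check is an assumption, since the documentation only says that one element from each is verified.

```java
final class ContainsSketch {
    private static boolean has(int[] cols, int v) {
        for (int c : cols)
            if (c == v)
                return true;
        return false;
    }

    // Non-strict: probes a single element of a and of b (assumed: the first).
    static boolean contains(int[] self, int[] a, int[] b) {
        return a.length > 0 && b.length > 0 && has(self, a[0]) && has(self, b[0]);
    }

    // Strict: every element of a and b must be present, and self may not
    // hold more elements than a and b combined.
    static boolean containsStrict(int[] self, int[] a, int[] b) {
        if (self.length > a.length + b.length)
            return false;
        for (int v : a)
            if (!has(self, v))
                return false;
        for (int v : b)
            if (!has(self, v))
                return false;
        return true;
    }
}
```

        The non-strict check is cheaper but can accept inputs the strict check rejects, for example when only the probed elements happen to be present.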
      • combine

        IColIndex combine​(IColIndex other)
        Combine the indexes of this column index with another. Callers are expected to supply unique indexes, with no value duplicated.
        Parameters:
        other - The other array
        Returns:
        The combined array
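        A rough sketch of combining two sets of column ids, under the stated expectation that no id appears twice. CombineSketch is hypothetical, and sorting the combined result is an assumption of this sketch rather than something the description guarantees.

```java
import java.util.Arrays;

final class CombineSketch {
    // Concatenates both id arrays, then sorts (sorting is an assumption here).
    // No de-duplication is done: inputs are expected to be unique already.
    static int[] combine(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        Arrays.sort(out);
        return out;
    }
}
```

        Since nothing is de-duplicated, an id shared by both inputs would simply appear twice in the output, which is why the uniqueness expectation matters.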
      • isContiguous

        boolean isContiguous()
        Check whether these columns are contiguous, meaning the indexes are integers at increments of 1. For example, 1,2,3,4,5,6 is contiguous; 1,3,4 is not.
        Returns:
        Whether the columns are contiguous.
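        The contiguity condition can be checked with a single pass over the ids. ContiguousSketch is a hypothetical helper, and treating an empty or single-element array as contiguous is an assumption of the sketch.

```java
final class ContiguousSketch {
    // True when every id is exactly one more than its predecessor.
    static boolean isContiguous(int[] cols) {
        for (int k = 1; k < cols.length; k++)
            if (cols[k] != cols[k - 1] + 1)
                return false;
        return true;
    }
}
```

        Applied to the two examples from the description, the sketch accepts 1,2,3,4,5,6 and rejects 1,3,4.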
      • getReorderingIndex

        int[] getReorderingIndex()
        If the columns are not in sorted, incrementing order, this method can be called to get the sorting index for this set of column indexes. The returned array should map each column to the position it should occupy after sorting.
        Returns:
        A reordering index.
      • isSorted

        boolean isSorted()
        Check whether the index is sorted.
        Returns:
        Whether the index is sorted
      • sort

        IColIndex sort()
        Sort the index, returning a new object if anything changes and this instance otherwise.
        Returns:
        The sorted instance of this column index.
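        The "return this when nothing changes" contract can be sketched as follows. SortSketch is hypothetical, and its isSorted accepts any non-decreasing sequence, which is an assumption about what "sorted" means here.

```java
import java.util.Arrays;

final class SortSketch {
    // Assumption: "sorted" means non-decreasing.
    static boolean isSorted(int[] cols) {
        for (int k = 1; k < cols.length; k++)
            if (cols[k - 1] > cols[k])
                return false;
        return true;
    }

    // Returns the same array when already sorted, otherwise a sorted copy.
    static int[] sort(int[] cols) {
        if (isSorted(cols))
            return cols; // no modification needed: same instance
        int[] copy = cols.clone();
        Arrays.sort(copy);
        return copy;
    }
}
```

        A caller can therefore use a reference comparison on the result to tell whether any reordering took place.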
      • contains

        boolean contains​(int i)
        Check whether this column group contains the given column id.
        Parameters:
        i - id to search for
        Returns:
        Whether it is contained
      • containsAny

        boolean containsAny​(IColIndex idx)
        Check whether this column group contains any of the given column ids.
        Parameters:
        idx - The column indexes to look for
        Returns:
        Whether any of them is contained
      • avgOfIndex

        double avgOfIndex()
        Get the average of this index. This is used to sort the priority queue when combining groups of equivalent cost.
        Returns:
        The average of the indexes.
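        As a small illustration, the average of the column ids can be computed as below. AvgOfIndexSketch is hypothetical, and the arithmetic mean is assumed to be what "average" refers to.

```java
final class AvgOfIndexSketch {
    static double avgOfIndex(int[] cols) {
        double sum = 0; // accumulate as double so the division is not truncated
        for (int v : cols)
            sum += v;
        return sum / cols.length;
    }
}
```

        Accumulating in a double matters for cases such as the ids 1 and 2, whose average is 1.5 rather than the 1 that integer division would give.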
      • decompressToDenseFromSparse

        void decompressToDenseFromSparse​(SparseBlock sb,
                                         int vr,
                                         int off,
                                         double[] c)
        Decompress this column index into the dense c array.
        Parameters:
        sb - A sparse block to extract values out of and insert into c
        vr - The row to extract from the sparse block
        off - The offset that the row starts at in c.
        c - The dense output to decompress into
      • decompressVec

        void decompressVec​(int nCol,
                           double[] c,
                           int off,
                           double[] values,
                           int rowIdx)
        Decompress into c using the values provided. The offset at which to start in c is off; similarly, rowIdx is the offset at which to start in values. nCol specifies the number of values to add.
        Parameters:
        nCol - The number of columns to copy.
        c - The output to add into
        off - The offset to start in c
        values - the values to copy from
        rowIdx - The offset to start in values
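        One plausible reading of these parameters is sketched below on plain arrays. DecompressVecSketch and its explicit cols argument are hypothetical. The sketch assumes that the j-th value is added at c[off + cols[j]], which the description implies but does not spell out.

```java
final class DecompressVecSketch {
    // Adds nCol values, starting at values[rowIdx], into the row of c that
    // begins at off. Assumption: the j-th value lands at column id cols[j].
    static void decompressVec(int[] cols, int nCol, double[] c, int off,
            double[] values, int rowIdx) {
        for (int j = 0; j < nCol; j++)
            c[off + cols[j]] += values[rowIdx + j];
    }
}
```

        For example, with cols = {0, 2}, a six-element c holding two rows of three columns, off = 3, values = {10, 20, 30, 40} and rowIdx = 2, the sketch adds 30 to c[3] and 40 to c[5].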
      • inOrder

        static boolean inOrder​(IColIndex a,
                               IColIndex b)
        Indicate whether the two given column indexes are in order, such that every index in the first is lower than every index in the second.
        Parameters:
        a - the first column index
        b - the second column index
        Returns:
        Whether all indexes in the first are lower than those in the second.
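        The ordering condition amounts to comparing the largest id of the first index with the smallest id of the second. InOrderSketch is a hypothetical helper over plain int[] arrays. It does not assume the inputs are sorted, and it treats an empty array as trivially in order.

```java
final class InOrderSketch {
    // True when every id in a is strictly lower than every id in b.
    static boolean inOrder(int[] a, int[] b) {
        int maxA = Integer.MIN_VALUE;
        for (int v : a)
            maxA = Math.max(maxA, v);
        int minB = Integer.MAX_VALUE;
        for (int v : b)
            minB = Math.min(minB, v);
        return maxA < minB;
    }
}
```

        When both inputs are known to be sorted, the same check reduces to comparing the last id of a with the first id of b.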