Class TensorBlock

  • All Implemented Interfaces:
    Externalizable, Serializable, org.apache.hadoop.io.Writable, CacheBlock<TensorBlock>

    public class TensorBlock
    extends Object
    implements CacheBlock<TensorBlock>, Externalizable
    A TensorBlock is the top-level representation of a tensor. Two data representations can be used: Basic/Homogeneous and Data/Heterogeneous. Basic supports only one ValueType, while Data supports multiple ValueTypes along the column axis. The format determines whether the TensorBlock uses a BasicTensorBlock or a DataTensorBlock to store the data.
    See Also:
    Serialized Form
    • Field Detail

      • DEFAULT_DIMS

        public static final int[] DEFAULT_DIMS
    • Constructor Detail

      • TensorBlock

        public TensorBlock()
        Create a TensorBlock with [0,0] dimensions and a homogeneous representation (a.k.a. basic).
      • TensorBlock

        public TensorBlock(int[] dims,
                           boolean basic)
        Create a TensorBlock with the given dimensions and the given data representation (basic/data).
        Parameters:
        dims - dimensions
        basic - if true, create a basic (homogeneous) TensorBlock; otherwise, create a data (heterogeneous) TensorBlock.
      • TensorBlock

        public TensorBlock(Types.ValueType vt,
                           int[] dims)
        Create a basic TensorBlock with the given ValueType and the given dimensions.
        Parameters:
        vt - value type
        dims - dimensions
      • TensorBlock

        public TensorBlock(Types.ValueType[] schema,
                           int[] dims)
        Create a data TensorBlock with the given schema and the given dimensions.
        Parameters:
        schema - schema of the columns
        dims - dimensions
      • TensorBlock

        public TensorBlock(double value)
        Create a [1,1] basic FP64 TensorBlock containing the given value.
        Parameters:
        value - value to put inside
      • TensorBlock

        public TensorBlock(BasicTensorBlock basicTensor)
        Wrap the given BasicTensorBlock inside a TensorBlock.
        Parameters:
        basicTensor - basic tensor block
      • TensorBlock

        public TensorBlock(DataTensorBlock dataTensor)
        Wrap the given DataTensorBlock inside a TensorBlock.
        Parameters:
        dataTensor - data tensor block
      • TensorBlock

        public TensorBlock(TensorBlock that)
        Copy constructor.
        Parameters:
        that - TensorBlock to copy
    • Method Detail

      • reset

        public void reset()
        Reset all cells to 0.
      • reset

        public void reset(int[] dims)
        Reset data with new dimensions.
        Parameters:
        dims - new dimensions
      • isBasic

        public boolean isBasic()
      • isAllocated

        public boolean isAllocated()
      • allocateBlock

        public TensorBlock allocateBlock()
        Allocate the data if it is not yet allocated.
        Returns:
        this TensorBlock
      • getValueType

        public Types.ValueType getValueType()
        Get the ValueType if this TensorBlock is homogeneous.
        Returns:
        ValueType if homogeneous, null otherwise
      • getSchema

        public Types.ValueType[] getSchema()
        Get the schema if this TensorBlock is heterogeneous.
        Returns:
        schema if heterogeneous, null otherwise
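        The contract of getValueType() and getSchema() can be illustrated with a minimal stand-alone sketch: exactly one of the two accessors returns a non-null value, depending on the representation. It does not use SystemDS; ValueTypeSketch and RepresentationSketch are hypothetical names, and the sketch assumes nothing beyond the two descriptions above.

```java
// Hypothetical stand-in for Types.ValueType (only a few constants).
enum ValueTypeSketch { FP64, INT64, STRING, BOOLEAN }

// Hypothetical model of the two representations: a basic block has a single
// value type, a data block has one value type per column (its schema).
final class RepresentationSketch {
    private final ValueTypeSketch valueType; // set only for basic blocks
    private final ValueTypeSketch[] schema;  // set only for data blocks

    private RepresentationSketch(ValueTypeSketch valueType, ValueTypeSketch[] schema) {
        this.valueType = valueType;
        this.schema = schema;
    }

    static RepresentationSketch basic(ValueTypeSketch vt) {
        return new RepresentationSketch(vt, null);
    }

    static RepresentationSketch data(ValueTypeSketch... schema) {
        return new RepresentationSketch(null, schema.clone());
    }

    boolean isBasic() {
        return schema == null;
    }

    // Value type if homogeneous, null otherwise.
    ValueTypeSketch getValueType() {
        return isBasic() ? valueType : null;
    }

    // Schema if heterogeneous, null otherwise (returns a defensive copy).
    ValueTypeSketch[] getSchema() {
        return isBasic() ? null : schema.clone();
    }
}
```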
      • getNumDims

        public int getNumDims()
      • getInMemorySize

        public long getInMemorySize()
        Description copied from interface: CacheBlock
        Get the in-memory size in bytes of the cache block.
        Specified by:
        getInMemorySize in interface CacheBlock<TensorBlock>
        Returns:
        in-memory size in bytes of cache block
      • isShallowSerialize

        public boolean isShallowSerialize()
        Description copied from interface: CacheBlock
        Indicates whether the cache block is subject to shallow serialization, which is generally true if the in-memory size and the serialized size are almost identical, allowing unnecessary deep serialization to be avoided.
        Specified by:
        isShallowSerialize in interface CacheBlock<TensorBlock>
        Returns:
        true if shallow serialized
      • isShallowSerialize

        public boolean isShallowSerialize(boolean inclConvert)
        Description copied from interface: CacheBlock
        Indicates whether the cache block is subject to shallow serialization, which is generally true if the in-memory size and the serialized size are almost identical, allowing unnecessary deep serialization to be avoided.
        Specified by:
        isShallowSerialize in interface CacheBlock<TensorBlock>
        Parameters:
        inclConvert - if true, also report blocks as shallow-serializable that are currently not amenable but can be brought into an amenable form via toShallowSerializeBlock.
        Returns:
        true if shallow serialized
      • toShallowSerializeBlock

        public void toShallowSerializeBlock()
        Description copied from interface: CacheBlock
        Converts a cache block that is not shallow serializable into a form that is shallow serializable. This method has no effect if the given cache block is not amenable.
        Specified by:
        toShallowSerializeBlock in interface CacheBlock<TensorBlock>
      • slice

        public final TensorBlock slice(IndexRange ixrange,
                                       TensorBlock ret)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        ixrange - index range inclusive
        ret - output block
        Returns:
        sub-block of cache block
      • slice

        public final TensorBlock slice(int rl,
                                       int ru)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        Returns:
        sub-block of cache block
      • slice

        public final TensorBlock slice(int rl,
                                       int ru,
                                       boolean deep)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        deep - enforce deep-copy
        Returns:
        sub-block of cache block
      • slice

        public final TensorBlock slice(int rl,
                                       int ru,
                                       int cl,
                                       int cu)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        Returns:
        sub-block of cache block
      • slice

        public final TensorBlock slice(int rl,
                                       int ru,
                                       int cl,
                                       int cu,
                                       TensorBlock ret)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        ret - cache block
        Returns:
        sub-block of cache block
      • slice

        public final TensorBlock slice(int rl,
                                       int ru,
                                       int cl,
                                       int cu,
                                       boolean deep)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        deep - enforce deep-copy
        Returns:
        sub-block of cache block
      • slice

        public TensorBlock slice(int rl,
                                 int ru,
                                 int cl,
                                 int cu,
                                 boolean deep,
                                 TensorBlock block)
        Description copied from interface: CacheBlock
        Slice a sub-block out of the current block and write it into the given output block. This method returns the passed instance if it is not null.
        Specified by:
        slice in interface CacheBlock<TensorBlock>
        Parameters:
        rl - row lower
        ru - row upper inclusive
        cl - column lower
        cu - column upper inclusive
        deep - enforce deep-copy
        block - cache block
        Returns:
        sub-block of cache block
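        The slice(rl, ru, cl, cu, ...) variants treat both upper bounds as inclusive, so the result has ru - rl + 1 rows and cu - cl + 1 columns. A minimal sketch of that index arithmetic on a dense row-major double[], using a hypothetical helper class InclusiveSliceSketch that is not part of SystemDS:

```java
// Hypothetical helper: slices rows rl..ru and columns cl..cu (all inclusive)
// out of a dense row-major matrix that has `ncol` columns.
final class InclusiveSliceSketch {
    static double[] slice(double[] data, int ncol, int rl, int ru, int cl, int cu) {
        if (rl < 0 || cl < 0 || ru < rl || cu < cl || cu >= ncol
                || (long) (ru + 1) * ncol > data.length)
            throw new IndexOutOfBoundsException("invalid slice bounds");
        int outRows = ru - rl + 1; // +1 because ru is inclusive
        int outCols = cu - cl + 1; // +1 because cu is inclusive
        double[] out = new double[outRows * outCols];
        // each output row is one contiguous run of the source row
        for (int r = 0; r < outRows; r++)
            System.arraycopy(data, (rl + r) * ncol + cl, out, r * outCols, outCols);
        return out;
    }
}
```

For a 3x3 matrix holding 1..9, slicing rows 1..2 and columns 0..1 yields {4, 5, 7, 8}.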
      • merge

        public TensorBlock merge(TensorBlock that,
                                 boolean appendOnly)
        Description copied from interface: CacheBlock
        Merge disjoint: merges all non-zero values of the given input into the current block. Note that this method does NOT check for overlapping entries; it is the caller's responsibility to ensure disjoint blocks. The appendOnly parameter is only relevant for sparse target blocks; if true, values are only appended and sparse rows are not sorted on each call, which is useful whenever iterators of matrix blocks are merged into one target block.
        Specified by:
        merge in interface CacheBlock<TensorBlock>
        Parameters:
        that - cache block
        appendOnly - indicates whether the merge may only append to sparse rows (without sorting them).
        Returns:
        the merged block; in most implementations, 'this' is modified in place.
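        A minimal dense sketch of the disjoint-merge contract, using a hypothetical DisjointMergeSketch class over row-major double[] arrays. It ignores appendOnly, which the description says only matters for sparse targets, and, like the documented method, performs no overlap check.

```java
// Hypothetical dense sketch of a disjoint merge: every non-zero of `that`
// is copied into `target`, which is modified in place and returned.
final class DisjointMergeSketch {
    static double[] merge(double[] target, double[] that) {
        if (target.length != that.length)
            throw new IllegalArgumentException("dimension mismatch");
        for (int i = 0; i < that.length; i++)
            if (that[i] != 0)
                target[i] = that[i]; // an overlapping non-zero would be silently overwritten
        return target;
    }
}
```

Merging {0, 2, 3, 0} into {1, 0, 0, 4} produces {1, 2, 3, 4}; if both inputs had a non-zero in the same cell, the value from the second argument would win without any error, which is why disjointness is the caller's responsibility.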
      • getDouble

        public double getDouble(int r,
                                int c)
        Description copied from interface: CacheBlock
        Returns the double value at the passed row and column. If the value is missing, 0 is returned.
        Specified by:
        getDouble in interface CacheBlock<TensorBlock>
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        double value at the passed row and column
      • getDoubleNaN

        public double getDoubleNaN(int r,
                                   int c)
        Description copied from interface: CacheBlock
        Returns the double value at the passed row and column. If the value is missing, NaN is returned.
        Specified by:
        getDoubleNaN in interface CacheBlock<TensorBlock>
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        double value at the passed row and column
      • getString

        public String getString(int r,
                                int c)
        Description copied from interface: CacheBlock
        Returns the string of the value at the passed row and column. If the value is missing or NaN, null is returned.
        Specified by:
        getString in interface CacheBlock<TensorBlock>
        Parameters:
        r - row of the value
        c - column of the value
        Returns:
        string of the value at the passed row and column
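        getDouble, getDoubleNaN and getString differ only in how they report a missing value (0, NaN, or null). The sketch below assumes a hypothetical cell store, MissingValueSketch, in which null marks a missing value; it illustrates the documented contract, not how TensorBlock actually stores its data.

```java
// Hypothetical model of the missing-value contract: cells are boxed objects
// and `null` marks a missing value.
final class MissingValueSketch {
    private final Object[][] cells;

    MissingValueSketch(Object[][] cells) {
        this.cells = cells;
    }

    // Numeric view of a cell, or null if missing; non-numeric strings would
    // throw NumberFormatException in this sketch.
    private Double numeric(int r, int c) {
        Object v = cells[r][c];
        if (v == null)
            return null;
        if (v instanceof Number)
            return ((Number) v).doubleValue();
        return Double.parseDouble(v.toString());
    }

    // Missing -> 0
    double getDouble(int r, int c) {
        Double d = numeric(r, c);
        return d == null ? 0.0 : d.doubleValue();
    }

    // Missing -> NaN
    double getDoubleNaN(int r, int c) {
        Double d = numeric(r, c);
        return d == null ? Double.NaN : d.doubleValue();
    }

    // Missing or NaN -> null
    String getString(int r, int c) {
        Object v = cells[r][c];
        if (v == null || (v instanceof Double && ((Double) v).isNaN()))
            return null;
        return v.toString();
    }
}
```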
      • getDim

        public int getDim(int i)
      • getDims

        public int[] getDims()
      • getLongDims

        public long[] getLongDims()
      • getNextIndexes

        public static void getNextIndexes(int[] dims,
                                          int[] ix)
        Calculates the next index array. Note that if the given index array was the last element, the next index will be the first one.
        Parameters:
        dims - the dimensions for which the next index is computed
        ix - the index array which will be incremented to the next index array
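        Because the index wraps to the first element after the last one, repeated calls starting from all zeros visit every cell exactly once before returning to the start. A sketch of such an odometer-style increment follows; it assumes row-major order (last dimension varies fastest), which the description does not state, and NextIndexSketch is a hypothetical name.

```java
// Hypothetical odometer-style increment of a multi-dimensional index.
final class NextIndexSketch {
    // Advances ix in place to the next cell of a tensor with shape dims.
    // ASSUMPTION: row-major order, i.e. the last dimension varies fastest.
    static void getNextIndexes(int[] dims, int[] ix) {
        int i = ix.length - 1;
        ix[i]++;
        // carry an overflow into the preceding dimension
        while (i > 0 && ix[i] == dims[i]) {
            ix[i] = 0;
            ix[--i]++;
        }
        // past the last cell: wrap around to the first one (all zeros)
        if (ix[0] == dims[0])
            ix[0] = 0;
    }
}
```

With dims {2, 3}, the index {0, 2} advances to {1, 0}, {1, 2} wraps to {0, 0}, and six calls starting from {0, 0} bring the index back to {0, 0}.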
      • getNextIndexes

        public void getNextIndexes(int[] ix)
        Calculates the next index array. Note that if the given index array was the last element, the next index will be the first one.
        Parameters:
        ix - the index array which will be incremented to the next index array
      • isVector

        public boolean isVector()
      • isMatrix

        public boolean isMatrix()
      • getLength

        public long getLength()
      • isEmpty

        public boolean isEmpty()
      • isEmpty

        public boolean isEmpty(boolean safe)
      • getNonZeros

        public long getNonZeros()
      • get

        public Object get(int[] ix)
      • get

        public double get(int r,
                          int c)
      • set

        public void set(Object v)
      • set

        public void set(int[] ix,
                        Object v)
        Set a cell to the value given as an `Object`.
        Parameters:
        ix - indexes in each dimension, starting with 0
        v - value to set
      • set

        public void set(int r,
                        int c,
                        double v)
        Set a cell in a 2-dimensional tensor.
        Parameters:
        r - row of the cell
        c - column of the cell
        v - value to set
      • slice

        public TensorBlock slice(int[] offsets,
                                 TensorBlock outBlock)
        Slice the current block and write the result into outBlock. The offsets determine where the slice starts; the extent of the slice is given by the outBlock dimensions.
        Parameters:
        offsets - offsets where the slice starts
        outBlock - sliced result block
        Returns:
        the sliced result block
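        The offset-based slice can be sketched on dense row-major arrays of any rank: the output extent comes from the output dimensions, and each output index is shifted by the offsets to locate its source cell. OffsetSliceSketch and its signature are hypothetical, and the bounds check reflects an assumption about valid input rather than documented behavior.

```java
// Hypothetical N-dimensional slice over dense row-major arrays: the slice
// starts at `offsets` in the source, its extent is given by `outDims`.
final class OffsetSliceSketch {
    // Row-major linear position of (ix + offsets) in an array of shape dims.
    static int linear(int[] dims, int[] ix, int[] offsets) {
        int pos = 0;
        for (int d = 0; d < dims.length; d++)
            pos = pos * dims[d] + ix[d] + offsets[d];
        return pos;
    }

    static double[] slice(double[] src, int[] srcDims, int[] offsets, int[] outDims) {
        int n = 1;
        for (int d = 0; d < outDims.length; d++) {
            if (offsets[d] < 0 || offsets[d] + outDims[d] > srcDims[d])
                throw new IndexOutOfBoundsException("slice exceeds source in dim " + d);
            n *= outDims[d];
        }
        double[] out = new double[n];
        int[] ix = new int[outDims.length]; // current index within the output
        for (int k = 0; k < n; k++) {
            out[k] = src[linear(srcDims, ix, offsets)];
            // advance ix in row-major order (last dimension fastest)
            for (int d = outDims.length - 1; d >= 0; d--) {
                if (++ix[d] < outDims[d])
                    break;
                ix[d] = 0;
            }
        }
        return out;
    }
}
```

For a 3x4 source holding 0..11, offsets {1, 1} with output dimensions {2, 2} select {5, 6, 9, 10}.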
      • copy

        public TensorBlock copy(int[] lower,
                                int[] upper,
                                TensorBlock src)
        Copy a part of another TensorBlock.
        Parameters:
        lower - lower index of elements to copy (inclusive)
        upper - upper index of elements to copy (exclusive)
        src - source TensorBlock
        Returns:
        the shallow copy of the src TensorBlock
      • copyExact

        public TensorBlock copyExact(int[] lower,
                                     int[] upper,
                                     TensorBlock src)
        Copy a part of another TensorBlock. Unlike copy(), this method copies an exact sub-block instead of all consecutive data elements from lower to upper.
        Parameters:
        lower - lower index of elements to copy (inclusive)
        upper - upper index of elements to copy (exclusive)
        src - source TensorBlock
        Returns:
        the deep copy of the src TensorBlock
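        To make the difference between copy() and copyExact() concrete, the sketch below lists which row-major cell positions each variant would touch. The interpretation is an assumption: upper is treated as exclusive in every dimension, and the consecutive (copy-style) range is taken to end at the last cell of the box, i.e. upper minus one in every dimension. CopySemanticsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of consecutive vs. exact copy ranges (row-major).
final class CopySemanticsSketch {
    // Row-major linear position of ix within shape dims.
    static int linear(int[] dims, int[] ix) {
        int pos = 0;
        for (int d = 0; d < dims.length; d++)
            pos = pos * dims[d] + ix[d];
        return pos;
    }

    // copy()-style (ASSUMED): every position from lower to the box's last cell.
    static List<Integer> consecutive(int[] dims, int[] lower, int[] upper) {
        int[] last = new int[upper.length];
        for (int d = 0; d < upper.length; d++)
            last[d] = upper[d] - 1;
        List<Integer> cells = new ArrayList<>();
        for (int p = linear(dims, lower); p <= linear(dims, last); p++)
            cells.add(p);
        return cells;
    }

    // copyExact()-style: only the positions inside the hyper-rectangle.
    static List<Integer> exact(int[] dims, int[] lower, int[] upper) {
        List<Integer> cells = new ArrayList<>();
        for (int d = 0; d < lower.length; d++)
            if (lower[d] >= upper[d])
                return cells; // empty box
        int[] ix = lower.clone();
        while (true) {
            cells.add(linear(dims, ix));
            int d = ix.length - 1;
            // advance within the box, last dimension fastest, carrying on overflow
            while (d >= 0 && ++ix[d] == upper[d]) {
                ix[d] = lower[d];
                d--;
            }
            if (d < 0)
                return cells; // every dimension wrapped: box fully visited
        }
    }
}
```

For a 3x3 block with lower {0, 1} and upper {2, 3}, the exact variant yields positions 1, 2, 4, 5, whereas the consecutive variant also includes position 3, i.e. cell (1, 0), which lies outside the requested columns.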
      • getExactSerializedSize

        public long getExactSerializedSize()
        Description copied from interface: CacheBlock
        Get the exact serialized size in bytes of the cache block.
        Specified by:
        getExactSerializedSize in interface CacheBlock<TensorBlock>
        Returns:
        exact serialized size in bytes of cache block
      • getExactBlockDataSerializedSize

        public long getExactBlockDataSerializedSize(BasicTensorBlock bt)
        Get the exact serialized size of a BasicTensorBlock if written by TensorBlock.writeBlockData(DataOutput,BasicTensorBlock).
        Parameters:
        bt - BasicTensorBlock
        Returns:
        the size of the block data in serialized form
      • writeBlockData

        public void writeBlockData(DataOutput out,
                                   BasicTensorBlock bt)
                            throws IOException
        Write a BasicTensorBlock.
        Parameters:
        out - output stream
        bt - source BasicTensorBlock
        Throws:
        IOException - if writing with the output stream fails
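        Pairing getExactBlockDataSerializedSize with writeBlockData implies a contract: the predicted size must equal the number of bytes actually written. The sketch below checks that contract with a hypothetical wire format (number of dims, the dims, then the values as doubles) and reads the bytes back. BlockDataSketch is a hypothetical name and this is not the format SystemDS uses; only java.io's DataOutput/DataInput are real APIs here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wire format (NOT the SystemDS format): the number of dims,
// each dim as an int, then every cell as a double in row-major order.
final class BlockDataSketch {
    // Predicts the exact number of bytes write() produces.
    static long exactSerializedSize(int[] dims) {
        long cells = 1;
        for (int d : dims)
            cells *= d;
        return Integer.BYTES                     // number of dims
            + (long) Integer.BYTES * dims.length // the dims themselves
            + (long) Double.BYTES * cells;       // the cell values
    }

    // Assumes values.length equals the product of dims.
    static void write(DataOutput out, int[] dims, double[] values) throws IOException {
        out.writeInt(dims.length);
        for (int d : dims)
            out.writeInt(d);
        for (double v : values)
            out.writeDouble(v);
    }

    static double[] read(DataInput in) throws IOException {
        int[] dims = new int[in.readInt()];
        int cells = 1;
        for (int i = 0; i < dims.length; i++) {
            dims[i] = in.readInt();
            cells *= dims[i];
        }
        double[] values = new double[cells];
        for (int i = 0; i < cells; i++)
            values[i] = in.readDouble();
        return values;
    }

    // Serializes to memory and verifies the size prediction against the bytes written.
    static byte[] serialize(int[] dims, double[] values) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            write(out, dims, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] bytes = bos.toByteArray();
        if (bytes.length != exactSerializedSize(dims))
            throw new IllegalStateException("size prediction does not match bytes written");
        return bytes;
    }

    static double[] deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A 2x3 block serializes to 4 + 2 * 4 + 6 * 8 = 60 bytes under this format.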
      • readFields

        public void readFields(DataInput in)
                        throws IOException
        Specified by:
        readFields in interface org.apache.hadoop.io.Writable
        Throws:
        IOException