Class ColumnEncoder
- java.lang.Object
-
- org.apache.sysds.runtime.transform.encode.ColumnEncoder
-
- All Implemented Interfaces:
Externalizable, Serializable, Comparable<ColumnEncoder>, Encoder
- Direct Known Subclasses:
ColumnEncoderBin, ColumnEncoderComposite, ColumnEncoderDummycode, ColumnEncoderFeatureHash, ColumnEncoderPassThrough, ColumnEncoderRecode, ColumnEncoderUDF, ColumnEncoderWordEmbedding
public abstract class ColumnEncoder extends Object implements Encoder, Comparable<ColumnEncoder>
Base class for all transform encoders, providing both a row and a block interface for encoding frames to matrices.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type   Class   Description
static class
ColumnEncoder.EncoderType
-
Field Summary
Fields
Modifier and Type   Field   Description
static int
APPLY_ROW_BLOCKS_PER_COLUMN
static int
BUILD_ROW_BLOCKS_PER_COLUMN
-
Method Summary
Modifier and Type   Method   Description
MatrixBlock
apply(CacheBlock<?> in, MatrixBlock out, int outputCol)
Apply functions are only used in single-threaded or multi-threaded dense contexts.
MatrixBlock
apply(CacheBlock<?> in, MatrixBlock out, int outputCol, int rowStart, int blk)
void
build(CacheBlock<?> in, double[] equiHeightMaxs)
void
build(CacheBlock<?> in, Map<Integer,double[]> equiHeightMaxs)
void
buildPartial(FrameBlock in)
Partial build of internal data structures (e.g., in distributed Spark operations).
int
compareTo(ColumnEncoder o)
List<DependencyTask<?>>
getApplyTasks(CacheBlock<?> in, MatrixBlock out, int outputCol)
Callable<Object>
getBuildTask(CacheBlock<?> in)
List<DependencyTask<?>>
getBuildTasks(CacheBlock<?> in)
int
getColID()
MatrixBlock
getColMapping(FrameBlock meta)
Obtain the column mapping of encoded frames based on the passed metadata frame.
int
getDomainSize()
long
getEstMetaSize()
int
getEstNumDistincts()
Callable<Object>
getPartialBuildTask(CacheBlock<?> in, int startRow, int blockSize, HashMap<Integer,Object> ret)
Callable<Object>
getPartialMergeBuildTask(HashMap<Integer,?> ret)
Set<Integer>
getSparseRowsWZeros()
void
initEmbeddings(MatrixBlock embeddings)
boolean
isApplicable()
Indicates if this encoder is applicable, i.e., if there is a column to encode.
boolean
isApplicable(int colID)
Indicates if this encoder is applicable for the given column ID, i.e., if it is subject to this transformation.
void
mergeAt(ColumnEncoder other)
Merges another encoder, of a compatible type, in after a certain position.
void
prepareBuildPartial()
Allocates internal data structures for partial build.
void
readExternal(ObjectInput in)
Redirects the default Java serialization via Externalizable to our default Hadoop Writable serialization for efficient broadcast/RDD deserialization.
void
setColID(int colID)
void
setEstMetaSize(long estSize)
void
setEstNumDistincts(int numDistincts)
void
shiftCol(int columnOffset)
void
updateIndexRanges(long[] beginDims, long[] endDims, int colOffset)
Updates index ranges to their positions after encoding.
void
writeExternal(ObjectOutput os)
Redirects the default Java serialization via Externalizable to our default Hadoop Writable serialization for efficient broadcast/RDD serialization.
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.sysds.runtime.transform.encode.Encoder
allocateMetaData, build, getMetaData, initMetaData
-
Method Detail
-
initEmbeddings
public void initEmbeddings(MatrixBlock embeddings)
-
apply
public MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol)
Apply functions are only used in single-threaded or multi-threaded dense contexts, which is why they make no provision for multi-threaded sparse output.
-
apply
public MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol, int rowStart, int blk)
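The two apply overloads differ only in the rowStart and blk parameters, which this page does not describe. The sketch below assumes they select a block of rows, [rowStart, rowStart + blk), so that several blocks of a dense output can be filled independently. It uses plain arrays and a hypothetical recode map instead of CacheBlock and MatrixBlock; RowBlockApplySketch, the codes map, and the NaN handling of unknown tokens are all illustrative assumptions, not SystemDS code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a recode-style column encoder; not SystemDS code.
class RowBlockApplySketch {

    // Encodes column `col` of `in` into column `outputCol` of `out`,
    // touching only rows [rowStart, min(rowStart + blk, in.length)).
    static double[][] apply(String[][] in, double[][] out, int col, int outputCol,
            Map<String, Integer> codes, int rowStart, int blk) {
        int rowEnd = Math.min(in.length, rowStart + blk);
        for (int r = rowStart; r < rowEnd; r++) {
            Integer code = codes.get(in[r][col]);
            // Unknown tokens become NaN in this sketch (an assumption).
            out[r][outputCol] = (code == null) ? Double.NaN : code;
        }
        return out;
    }

    // Whole-column variant: a single block that covers every row.
    static double[][] apply(String[][] in, double[][] out, int col, int outputCol,
            Map<String, Integer> codes) {
        return apply(in, out, col, outputCol, codes, 0, in.length);
    }

    public static void main(String[] args) {
        String[][] in = { {"a"}, {"b"}, {"a"}, {"c"} };
        Map<String, Integer> codes = new HashMap<>();
        codes.put("a", 1);
        codes.put("b", 2);
        codes.put("c", 3);

        double[][] out = new double[in.length][1];
        apply(in, out, 0, 0, codes, 0, 2); // first block: rows 0 and 1
        apply(in, out, 0, 0, codes, 2, 2); // second block: rows 2 and 3
        System.out.println(Arrays.deepToString(out));
    }
}
```

Because each call writes only to its own rows of a preallocated dense array, the blocks do not interfere with one another, which is one plausible reason the apply functions are restricted to dense multi-threaded contexts.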
-
isApplicable
public boolean isApplicable()
Indicates if this encoder is applicable, i.e., if there is a column to encode.
- Returns:
true if a colID is set
-
isApplicable
public boolean isApplicable(int colID)
Indicates if this encoder is applicable for the given column ID, i.e., if it is subject to this transformation.
- Parameters:
colID - column ID
- Returns:
true if encoder is applicable for given column
-
prepareBuildPartial
public void prepareBuildPartial()
Allocates internal data structures for partial build.
- Specified by:
prepareBuildPartial in interface Encoder
-
getDomainSize
public int getDomainSize()
-
buildPartial
public void buildPartial(FrameBlock in)
Partial build of internal data structures (e.g., in distributed Spark operations).
- Specified by:
buildPartial in interface Encoder
- Parameters:
in - input frame block
-
build
public void build(CacheBlock<?> in, double[] equiHeightMaxs)
-
build
public void build(CacheBlock<?> in, Map<Integer,double[]> equiHeightMaxs)
-
mergeAt
public void mergeAt(ColumnEncoder other)
Merges another encoder, of a compatible type, in after a certain position. Resizes as necessary. ColumnEncoders are compatible with themselves, and EncoderComposite is compatible with every other ColumnEncoder. MultiColumnEncoders are compatible with every encoder.
- Parameters:
other - the encoder that should be merged in
-
updateIndexRanges
public void updateIndexRanges(long[] beginDims, long[] endDims, int colOffset)
Updates index ranges to their positions after encoding. Note that only Dummycoding changes the ranges.
- Specified by:
updateIndexRanges in interface Encoder
- Parameters:
beginDims - begin dimensions of range
endDims - end dimensions of range
colOffset - offset applied to beginDims and endDims
-
getColMapping
public MatrixBlock getColMapping(FrameBlock meta)
Obtain the column mapping of encoded frames based on the passed metadata frame.
- Parameters:
meta - metadata frame block
- Returns:
matrix with column mapping (one row per attribute)
-
writeExternal
public void writeExternal(ObjectOutput os) throws IOException
Redirects the default Java serialization via Externalizable to our default Hadoop Writable serialization for efficient broadcast/RDD serialization.
- Specified by:
writeExternal in interface Externalizable
- Parameters:
os - object output
- Throws:
IOException - if IOException occurs
-
readExternal
public void readExternal(ObjectInput in) throws IOException
Redirects the default Java serialization via Externalizable to our default Hadoop Writable serialization for efficient broadcast/RDD deserialization.
- Specified by:
readExternal in interface Externalizable
- Parameters:
in - object input
- Throws:
IOException - if IOException occurs
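The redirect described for writeExternal and readExternal can be illustrated with JDK classes alone. The sketch below is not the SystemDS implementation: ExternalizableRedirectSketch, its two fields, and the write/readFields helper names are assumptions that mimic the shape of a Hadoop-Writable-style class without depending on Hadoop. The point it shows is that ObjectOutput extends DataOutput and ObjectInput extends DataInput, so the Externalizable hooks can simply forward to a compact hand-written binary format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical encoder-like class; the fields are illustrative only.
class ExternalizableRedirectSketch implements Externalizable {
    private int colID = -1;
    private long estMetaSize = 0;

    // Externalizable requires a public no-argument constructor.
    public ExternalizableRedirectSketch() {}

    ExternalizableRedirectSketch(int colID, long estMetaSize) {
        this.colID = colID;
        this.estMetaSize = estMetaSize;
    }

    int getColID() { return colID; }
    long getEstMetaSize() { return estMetaSize; }

    // Writable-style methods: a compact, hand-written binary layout.
    void write(DataOutput out) throws IOException {
        out.writeInt(colID);
        out.writeLong(estMetaSize);
    }

    void readFields(DataInput in) throws IOException {
        colID = in.readInt();
        estMetaSize = in.readLong();
    }

    // Java serialization entry points forward to the Writable-style methods.
    @Override
    public void writeExternal(ObjectOutput os) throws IOException {
        write(os); // ObjectOutput extends DataOutput
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        readFields(in); // ObjectInput extends DataInput
    }

    // Serializes and deserializes through standard Java object streams.
    static ExternalizableRedirectSketch roundTrip(ExternalizableRedirectSketch e) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(e);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (ExternalizableRedirectSketch) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        ExternalizableRedirectSketch copy = roundTrip(new ExternalizableRedirectSketch(3, 128L));
        System.out.println(copy.getColID() + " " + copy.getEstMetaSize());
    }
}
```

With this arrangement, any framework that relies on Java serialization (for example, to ship objects in a broadcast) ends up using the same explicit field layout as the Writable-style path, rather than reflective default serialization.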
-
getColID
public int getColID()
-
setColID
public void setColID(int colID)
-
shiftCol
public void shiftCol(int columnOffset)
-
setEstMetaSize
public void setEstMetaSize(long estSize)
-
getEstMetaSize
public long getEstMetaSize()
-
setEstNumDistincts
public void setEstNumDistincts(int numDistincts)
-
getEstNumDistincts
public int getEstNumDistincts()
-
compareTo
public int compareTo(ColumnEncoder o)
- Specified by:
compareTo in interface Comparable<ColumnEncoder>
-
getBuildTasks
public List<DependencyTask<?>> getBuildTasks(CacheBlock<?> in)
-
getBuildTask
public Callable<Object> getBuildTask(CacheBlock<?> in)
-
getPartialBuildTask
public Callable<Object> getPartialBuildTask(CacheBlock<?> in, int startRow, int blockSize, HashMap<Integer,Object> ret)
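This page gives only the signatures of getPartialBuildTask and getPartialMergeBuildTask. One plausible reading, sketched below under stated assumptions, is a two-phase build: each partial task scans the rows [startRow, startRow + blockSize) and stores its result in ret, and a merge task then combines the partial results. The class PartialBuildSketch, the choice of startRow as the map key, the use of distinct-value sets as partial results, and the extra merged parameter are all hypothetical and not taken from SystemDS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical two-phase build over a single string column; not SystemDS code.
class PartialBuildSketch {

    // Collects the distinct values of rows [startRow, startRow + blockSize)
    // and stores them in `ret` under the key startRow (an assumed convention).
    static Callable<Object> getPartialBuildTask(String[] column, int startRow,
            int blockSize, HashMap<Integer, Object> ret) {
        return () -> {
            int end = Math.min(column.length, startRow + blockSize);
            Set<String> distinct = new HashSet<>();
            for (int r = startRow; r < end; r++)
                distinct.add(column[r]);
            synchronized (ret) { // HashMap is not thread-safe
                ret.put(startRow, distinct);
            }
            return null;
        };
    }

    // Unions all partial results from `ret` into `merged`.
    static Callable<Object> getPartialMergeBuildTask(HashMap<Integer, ?> ret,
            Set<String> merged) {
        return () -> {
            for (Object partial : ret.values()) {
                @SuppressWarnings("unchecked")
                Set<String> block = (Set<String>) partial;
                merged.addAll(block);
            }
            return null;
        };
    }

    // Runs the partial tasks in parallel, then the merge task.
    // blockSize must be positive.
    static Set<String> buildDistinct(String[] column, int blockSize, int threads) {
        HashMap<Integer, Object> ret = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Object>> tasks = new ArrayList<>();
            for (int start = 0; start < column.length; start += blockSize)
                tasks.add(getPartialBuildTask(column, start, blockSize, ret));
            for (Future<Object> f : pool.invokeAll(tasks))
                f.get(); // wait for completion and propagate failures
            Set<String> merged = new TreeSet<>();
            getPartialMergeBuildTask(ret, merged).call();
            return merged;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] column = {"b", "a", "b", "c", "a"};
        System.out.println(buildDistinct(column, 2, 2));
    }
}
```

The merge runs only after every partial task has finished, which mirrors the kind of ordering that a list of dependency tasks (as returned by getBuildTasks) would have to enforce.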
-
getApplyTasks
public List<DependencyTask<?>> getApplyTasks(CacheBlock<?> in, MatrixBlock out, int outputCol)
-
-