Class ColumnEncoderComposite

java.lang.Object
  org.apache.sysds.runtime.transform.encode.ColumnEncoder
    org.apache.sysds.runtime.transform.encode.ColumnEncoderComposite

All Implemented Interfaces:
Externalizable, Serializable, Comparable<ColumnEncoder>, Encoder
public class ColumnEncoderComposite extends ColumnEncoder
Simple composite encoder that applies a list of encoders in a specified order. Because it implements the default encoder API, it can be used as a drop-in replacement for any other encoder.
See Also:
Serialized Form

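The composite idea can be illustrated with a small, self-contained sketch that does not depend on SystemDS. The interface `SimpleColumnEncoder`, its `build`/`apply` signatures, and the classes `ToyRecoder`, `ToyDummycoder`, and `ToyComposite` are hypothetical stand-ins: they only show how a composite forwards each call to its children in list order (here recoding followed by dummy coding) and, because it implements the same interface, can be used wherever a single encoder is expected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal encoder API, not the SystemDS API.
interface SimpleColumnEncoder {
    // Collect encoder state (e.g., distinct tokens) from one raw input column.
    void build(String[] column);

    // Encode the column; 'current' is the previous encoder's output (null for the first).
    double[][] apply(String[] column, double[][] current);
}

// Toy recoder: maps each distinct token to a 1-based code in first-seen order.
class ToyRecoder implements SimpleColumnEncoder {
    private final Map<String, Integer> codes = new LinkedHashMap<>();

    @Override
    public void build(String[] column) {
        for (String token : column)
            codes.putIfAbsent(token, codes.size() + 1);
    }

    @Override
    public double[][] apply(String[] column, double[][] current) {
        double[][] out = new double[column.length][1];
        for (int i = 0; i < column.length; i++)
            out[i][0] = codes.getOrDefault(column[i], 0); // 0 marks tokens unseen during build
        return out;
    }
}

// Toy dummy coder: expands a column of 1-based codes into one-hot columns.
class ToyDummycoder implements SimpleColumnEncoder {
    private int domainSize;

    @Override
    public void build(String[] column) {
        // Domain size = number of distinct tokens, i.e., the recoder's code range.
        domainSize = new HashSet<>(Arrays.asList(column)).size();
    }

    @Override
    public double[][] apply(String[] column, double[][] current) {
        double[][] out = new double[current.length][domainSize];
        for (int i = 0; i < current.length; i++) {
            int code = (int) current[i][0];
            if (code >= 1 && code <= domainSize)
                out[i][code - 1] = 1.0; // code 0 (unseen token) stays an all-zero row
        }
        return out;
    }
}

// Composite: forwards build and apply to its children in list order and itself
// implements SimpleColumnEncoder, so it can replace any single encoder.
class ToyComposite implements SimpleColumnEncoder {
    private final List<SimpleColumnEncoder> encoders = new ArrayList<>();

    ToyComposite(List<SimpleColumnEncoder> encoders) {
        this.encoders.addAll(encoders);
    }

    void addEncoder(SimpleColumnEncoder other) {
        encoders.add(other);
    }

    @Override
    public void build(String[] column) {
        for (SimpleColumnEncoder e : encoders)
            e.build(column);
    }

    @Override
    public double[][] apply(String[] column, double[][] current) {
        double[][] result = current;
        for (SimpleColumnEncoder e : encoders)
            result = e.apply(column, result); // each child consumes the previous child's output
        return result;
    }
}
```

With the input column {"a", "b", "a", "c"}, building and then applying the composite yields one one-hot row per input value (for example, "a" becomes [1, 0, 0]). Chaining each child's output into the next child is one plausible reading of "applies a list of encoders in a specified order"; the real class works on CacheBlock and MatrixBlock inputs instead.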
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.sysds.runtime.transform.encode.ColumnEncoder
ColumnEncoder.EncoderType

Field Summary
Fields inherited from class org.apache.sysds.runtime.transform.encode.ColumnEncoder
APPLY_ROW_BLOCKS_PER_COLUMN, BUILD_ROW_BLOCKS_PER_COLUMN

Constructor Summary
ColumnEncoderComposite()
ColumnEncoderComposite(List<ColumnEncoder> columnEncoders)
ColumnEncoderComposite(List<ColumnEncoder> columnEncoders, FrameBlock meta)
ColumnEncoderComposite(ColumnEncoder columnEncoder)

Method Summary
void addEncoder(ColumnEncoder other)
void allocateMetaData(FrameBlock meta) - Pre-allocate a FrameBlock for metadata collection.
MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol, int rowStart, int blk)
void build(CacheBlock<?> in) - Build the transform meta data for the given block input.
void build(CacheBlock<?> in, Map<Integer,double[]> equiHeightMaxs)
void buildPartial(FrameBlock in) - Partial build of internal data structures (e.g., in distributed Spark operations).
void computeRCDMapSizeEstimate(CacheBlock<?> in, int[] sampleIndices)
boolean equals(Object o)
List<DependencyTask<?>> getApplyTasks(CacheBlock<?> in, MatrixBlock out, int outputCol)
List<DependencyTask<?>> getBuildTasks(CacheBlock<?> in)
int getDomainSize()
<T extends ColumnEncoder> T getEncoder(Class<T> type)
List<ColumnEncoder> getEncoders()
FrameBlock getMetaData(FrameBlock out) - Construct a frame block out of the transform meta data.
Set<Integer> getSparseRowsWZeros()
<T extends ColumnEncoder> boolean hasBuild()
<T extends ColumnEncoder> boolean hasEncoder(Class<T> type)
int hashCode()
void initEmbeddings(MatrixBlock embeddings)
void initMetaData(FrameBlock out) - Sets up the required meta data for a subsequent call to apply.
boolean isBin()
boolean isBinToDummy()
boolean isEncoder(int colID, Class<?> type)
boolean isHash()
boolean isHashToDummy()
boolean isPassThrough()
boolean isRecode()
boolean isRecodeToDummy()
void mergeAt(ColumnEncoder other) - Merges another encoder, of a compatible type, into this one after a certain position.
void prepareBuildPartial() - Allocates internal data structures for partial build.
void readExternal(ObjectInput in) - Redirects the default Java serialization (via Externalizable) to our default Hadoop Writable serialization for efficient broadcast/RDD deserialization.
void setNumPartitions(int nBuild, int nApply)
void shiftCol(int columnOffset)
String toString()
void updateAllDCEncoders()
void updateIndexRanges(long[] beginDims, long[] endDims, int colOffset) - Updates the index ranges to reflect the output after encoding.
void writeExternal(ObjectOutput out) - Redirects the default Java serialization (via Externalizable) to our default Hadoop Writable serialization for efficient broadcast/RDD serialization.

Methods inherited from class org.apache.sysds.runtime.transform.encode.ColumnEncoder
apply, build, compareTo, getBuildTask, getColID, getColMapping, getEstMetaSize, getEstNumDistincts, getPartialBuildTask, getPartialMergeBuildTask, isApplicable, isApplicable, setColID, setEstMetaSize, setEstNumDistincts

Constructor Detail

ColumnEncoderComposite
public ColumnEncoderComposite()

ColumnEncoderComposite
public ColumnEncoderComposite(List<ColumnEncoder> columnEncoders, FrameBlock meta)

ColumnEncoderComposite
public ColumnEncoderComposite(List<ColumnEncoder> columnEncoders)

ColumnEncoderComposite
public ColumnEncoderComposite(ColumnEncoder columnEncoder)

Method Detail

getEncoders
public List<ColumnEncoder> getEncoders()

getEncoder
public <T extends ColumnEncoder> T getEncoder(Class<T> type)

isEncoder
public boolean isEncoder(int colID, Class<?> type)

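A lookup such as `getEncoder(Class<T> type)` or `hasEncoder(Class<T> type)` can be written as a scan over the child list. The sketch below assumes that the lookup matches on the exact runtime class and returns `null` when no child matches; `BaseEncoder`, `RecodeEnc`, `DummycodeEnc`, and `TypedComposite` are hypothetical names, and this is not a copy of the SystemDS source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for an encoder class hierarchy.
abstract class BaseEncoder {}
class RecodeEnc extends BaseEncoder {}
class DummycodeEnc extends BaseEncoder {}

class TypedComposite {
    private final List<BaseEncoder> encoders = new ArrayList<>();

    void addEncoder(BaseEncoder e) {
        encoders.add(e);
    }

    // Returns the first child whose runtime class equals 'type', or null if none matches.
    <T extends BaseEncoder> T getEncoder(Class<T> type) {
        for (BaseEncoder e : encoders)
            if (e.getClass().equals(type))
                return type.cast(e); // checked cast to the requested type
        return null;
    }

    // True if any child has exactly the runtime class 'type'.
    <T extends BaseEncoder> boolean hasEncoder(Class<T> type) {
        return encoders.stream().anyMatch(e -> e.getClass().equals(type));
    }
}
```

Matching with `getClass().equals(type)` ignores subclasses; `type.isInstance(e)` would be the alternative if subclasses should also match.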
build
public void build(CacheBlock<?> in)
Description copied from interface: Encoder
Build the transform meta data for the given block input. This call modifies and keeps meta data as encoder state.
Parameters:
in - input frame block

build
public void build(CacheBlock<?> in, Map<Integer,double[]> equiHeightMaxs)
Overrides:
build in class ColumnEncoder

getApplyTasks
public List<DependencyTask<?>> getApplyTasks(CacheBlock<?> in, MatrixBlock out, int outputCol)
Overrides:
getApplyTasks in class ColumnEncoder

getBuildTasks
public List<DependencyTask<?>> getBuildTasks(CacheBlock<?> in)
Overrides:
getBuildTasks in class ColumnEncoder

prepareBuildPartial
public void prepareBuildPartial()
Description copied from class: ColumnEncoder
Allocates internal data structures for partial build.
Specified by:
prepareBuildPartial in interface Encoder
Overrides:
prepareBuildPartial in class ColumnEncoder

buildPartial
public void buildPartial(FrameBlock in)
Description copied from class: ColumnEncoder
Partial build of internal data structures (e.g., in distributed Spark operations).
Specified by:
buildPartial in interface Encoder
Overrides:
buildPartial in class ColumnEncoder
Parameters:
in - input frame block

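The split between `prepareBuildPartial` and `buildPartial` suits data-parallel builds: each partition allocates and fills its own partial state, and the partial results are combined afterwards. The sketch below shows that pattern for collecting distinct tokens; `PartialDistinctBuilder` and its static `merge` step are hypothetical, and how SystemDS actually combines partial builds is not described on this page.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class PartialDistinctBuilder {
    private Set<String> partial;

    // Analogous to prepareBuildPartial(): allocate the partial state.
    void prepareBuildPartial() {
        partial = new HashSet<>();
    }

    // Analogous to buildPartial(in): accumulate state from one partition only.
    void buildPartial(String[] partition) {
        for (String token : partition)
            partial.add(token);
    }

    Set<String> getPartial() {
        return partial;
    }

    // Hypothetical combine step: union the partial sets into one sorted set.
    static Set<String> merge(Iterable<Set<String>> partials) {
        Set<String> all = new TreeSet<>();
        for (Set<String> p : partials)
            all.addAll(p);
        return all;
    }
}
```

Each builder only ever sees its own partition, so the builders can run independently; only the final `merge` needs all partial results.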
apply
public MatrixBlock apply(CacheBlock<?> in, MatrixBlock out, int outputCol, int rowStart, int blk)
Overrides:
apply in class ColumnEncoder

mergeAt
public void mergeAt(ColumnEncoder other)
Description copied from class: ColumnEncoder
Merges another encoder, of a compatible type, into this one after a certain position. Resizes as necessary. ColumnEncoders are compatible with themselves, and EncoderComposite is compatible with every other ColumnEncoder. MultiColumnEncoders are compatible with every encoder.
Overrides:
mergeAt in class ColumnEncoder
Parameters:
other - the encoder that should be merged in

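The compatibility rules above suggest a merge by encoder type: an incoming encoder is merged into the child of the same type, or appended when the composite has no such child, and merging a whole composite merges each of its children. The sketch below assumes exactly that behavior; `MergeableEncoder`, `MergingComposite`, the string `kind` standing in for the runtime class, and the union-of-tokens merge are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical encoder whose state is a set of tokens; merging unions the state.
class MergeableEncoder {
    final String kind; // e.g. "recode" or "bin"; stands in for the runtime class
    final Set<String> tokens = new TreeSet<>();

    MergeableEncoder(String kind, String... initialTokens) {
        this.kind = kind;
        for (String t : initialTokens)
            tokens.add(t);
    }

    // A plain encoder is only compatible with encoders of its own kind.
    void mergeAt(MergeableEncoder other) {
        if (!kind.equals(other.kind))
            throw new IllegalArgumentException("incompatible encoder: " + other.kind);
        tokens.addAll(other.tokens);
    }
}

class MergingComposite {
    final List<MergeableEncoder> encoders = new ArrayList<>();

    // Merge into the child of the same kind, or append 'other' if no such child exists.
    void mergeAt(MergeableEncoder other) {
        for (MergeableEncoder e : encoders) {
            if (e.kind.equals(other.kind)) {
                e.mergeAt(other);
                return;
            }
        }
        encoders.add(other); // note: appends the instance itself, not a copy
    }

    // Merging another composite merges each of its children in turn.
    void mergeAt(MergingComposite other) {
        for (MergeableEncoder e : other.encoders)
            mergeAt(e);
    }
}
```

Because the composite dispatches by kind, it accepts any encoder, which mirrors the rule that the composite is compatible with every other encoder while plain encoders are compatible only with themselves.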
updateAllDCEncoders
public void updateAllDCEncoders()

addEncoder
public void addEncoder(ColumnEncoder other)

updateIndexRanges
public void updateIndexRanges(long[] beginDims, long[] endDims, int colOffset)
Description copied from class: ColumnEncoder
Updates the index ranges to reflect the output after encoding. Note that only dummy coding changes the ranges.
Specified by:
updateIndexRanges in interface Encoder
Overrides:
updateIndexRanges in class ColumnEncoder
Parameters:
beginDims - begin dimensions of the range
endDims - end dimensions of the range
colOffset - offset applied to beginDims and endDims

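Dummy coding expands one input column with domain size d into d output columns, so every column to its right moves d - 1 positions, whereas a column that is not dummy coded keeps a single output column. The arithmetic is sketched below; the helper `RangeSketch.outputColumnRange` and its `outputWidths` argument are hypothetical and illustrate why the ranges change, not the exact contract of `updateIndexRanges`.

```java
// Hypothetical helper: compute the 1-based, inclusive output column range that a
// block of input columns [begin, end] maps to after encoding.
// outputWidths[j] is the number of output columns produced by input column j + 1:
// 1 for columns that are not dummy coded, the domain size d for dummy-coded ones.
class RangeSketch {
    static long[] outputColumnRange(int[] outputWidths, int begin, int end) {
        long outBegin = 1;
        for (int j = 1; j < begin; j++)
            outBegin += outputWidths[j - 1]; // wider columns on the left shift the start
        long outEnd = outBegin - 1;
        for (int j = begin; j <= end; j++)
            outEnd += outputWidths[j - 1];   // widths inside the block extend the end
        return new long[] {outBegin, outEnd};
    }
}
```

For output widths {1, 3, 1} (only column 2 dummy coded, with d = 3), input column 3 maps to output column 5, and the block of input columns 1 to 2 maps to output columns 1 to 4.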
allocateMetaData
public void allocateMetaData(FrameBlock meta)
Description copied from interface: Encoder
Pre-allocate a FrameBlock for metadata collection.
Parameters:
meta - frame block

getMetaData
public FrameBlock getMetaData(FrameBlock out)
Description copied from interface: Encoder
Construct a frame block out of the transform meta data.
Parameters:
out - output frame block
Returns:
output frame block

initMetaData
public void initMetaData(FrameBlock out)
Description copied from interface: Encoder
Sets up the required meta data for a subsequent call to apply.
Parameters:
out - frame block

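Together, `getMetaData` and `initMetaData` let the state learned by `build` be exported and later restored into a fresh encoder before `apply`, for example to encode new data with the mapping learned on training data. The round trip is sketched below with a plain `String[]` of `token=code` entries standing in for the metadata `FrameBlock`; the class `MetaRecoder` and the `token=code` format are hypothetical, not the SystemDS metadata layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetaRecoder {
    private final Map<String, Integer> codes = new LinkedHashMap<>();

    // build: learn 1-based codes in first-seen order.
    void build(String[] column) {
        for (String t : column)
            codes.putIfAbsent(t, codes.size() + 1);
    }

    // getMetaData: export the learned state (hypothetical "token=code" format).
    String[] getMetaData() {
        String[] meta = new String[codes.size()];
        int i = 0;
        for (Map.Entry<String, Integer> e : codes.entrySet())
            meta[i++] = e.getKey() + "=" + e.getValue();
        return meta;
    }

    // initMetaData: restore the state so apply can run without a new build.
    void initMetaData(String[] meta) {
        codes.clear();
        for (String entry : meta) {
            int sep = entry.lastIndexOf('='); // tokens may contain '=', codes cannot
            codes.put(entry.substring(0, sep), Integer.parseInt(entry.substring(sep + 1)));
        }
    }

    double[] apply(String[] column) {
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++)
            out[i] = codes.getOrDefault(column[i], 0); // 0 for tokens unseen during build
        return out;
    }
}
```

Building on {"x", "y", "x"} and restoring the exported metadata into a fresh encoder encodes {"y", "z", "x"} as [2, 0, 1], with 0 marking the token that was not seen during build.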
initEmbeddings
public void initEmbeddings(MatrixBlock embeddings)
Overrides:
initEmbeddings in class ColumnEncoder

writeExternal
public void writeExternal(ObjectOutput out) throws IOException
Description copied from class: ColumnEncoder
Redirects the default Java serialization (via Externalizable) to our default Hadoop Writable serialization for efficient broadcast/RDD serialization.
Specified by:
writeExternal in interface Externalizable
Overrides:
writeExternal in class ColumnEncoder
Parameters:
out - object output
Throws:
IOException - if an I/O error occurs

readExternal
public void readExternal(ObjectInput in) throws IOException
Description copied from class: ColumnEncoder
Redirects the default Java serialization (via Externalizable) to our default Hadoop Writable serialization for efficient broadcast/RDD deserialization.
Specified by:
readExternal in interface Externalizable
Overrides:
readExternal in class ColumnEncoder
Parameters:
in - object input
Throws:
IOException - if an I/O error occurs

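`writeExternal` and `readExternal` route Java serialization through the encoder's own compact binary format. This works because `ObjectOutput` extends `DataOutput` and `ObjectInput` extends `DataInput`, so a Hadoop-Writable-style `write(DataOutput)`/`readFields(DataInput)` pair can be reused directly. The class below is a hypothetical, self-contained illustration of that delegation for a composite whose only state is a list of child type ids; it does not reproduce the SystemDS wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class SerializableComposite implements Externalizable {
    private static final long serialVersionUID = 1L;

    // Hypothetical state: one integer type id per child encoder.
    final List<Integer> childTypeIds = new ArrayList<>();

    public SerializableComposite() {} // Externalizable requires a public no-arg constructor

    // Writable-style serialization: number of children, then one id per child.
    void write(DataOutput out) throws IOException {
        out.writeInt(childTypeIds.size());
        for (int id : childTypeIds)
            out.writeInt(id);
    }

    void readFields(DataInput in) throws IOException {
        childTypeIds.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++)
            childTypeIds.add(in.readInt());
    }

    // ObjectOutput extends DataOutput, so Java serialization can delegate directly.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        write(out);
    }

    // ObjectInput extends DataInput, so deserialization can delegate directly.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        readFields(in);
    }

    // Round trip through standard Java serialization.
    static SerializableComposite roundTrip(SerializableComposite c)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(c); // calls writeExternal
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerializableComposite) ois.readObject(); // calls the no-arg constructor, then readExternal
        }
    }
}
```

Delegating both directions to one `write`/`readFields` pair keeps a single serialization format for Java serialization and Writable-based code paths.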
hasEncoder
public <T extends ColumnEncoder> boolean hasEncoder(Class<T> type)

hasBuild
public <T extends ColumnEncoder> boolean hasBuild()

computeRCDMapSizeEstimate
public void computeRCDMapSizeEstimate(CacheBlock<?> in, int[] sampleIndices)

setNumPartitions
public void setNumPartitions(int nBuild, int nApply)

shiftCol
public void shiftCol(int columnOffset)
Overrides:
shiftCol in class ColumnEncoder

getSparseRowsWZeros
public Set<Integer> getSparseRowsWZeros()
Overrides:
getSparseRowsWZeros in class ColumnEncoder

getDomainSize
public int getDomainSize()
Overrides:
getDomainSize in class ColumnEncoder

isRecodeToDummy
public boolean isRecodeToDummy()

isRecode
public boolean isRecode()

isPassThrough
public boolean isPassThrough()

isBin
public boolean isBin()

isBinToDummy
public boolean isBinToDummy()

isHash
public boolean isHash()

isHashToDummy
public boolean isHashToDummy()