Class Array<T>
java.lang.Object
    org.apache.sysds.runtime.frame.data.columns.Array<T>
- All Implemented Interfaces:
org.apache.hadoop.io.Writable
- Direct Known Subclasses:
ABooleanArray, ACompressedArray, CharArray, DoubleArray, FloatArray, HashIntegerArray, HashLongArray, IntegerArray, LongArray, OptionalArray, RaggedArray, StringArray
public abstract class Array<T> extends Object implements org.apache.hadoop.io.Writable
Generic, resizable native arrays for the internal representation of the columns in the FrameBlock. We use this custom class hierarchy instead of Trove or other libraries in order to avoid unnecessary dependencies.
-
-
Nested Class Summary
Modifier and Type | Class | Description
class | Array.ArrayIterator |
-
Method Summary
Modifier and Type | Method | Description
Pair<Types.ValueType,Boolean> | analyzeValueType() | Analyze the column to figure out if the value type can be refined to a better type.
abstract Pair<Types.ValueType,Boolean> | analyzeValueType(int maxCells) | Analyze the column to figure out if the value type can be refined to a better type.
abstract void | append(String value) | Append a string value to the current Array, this should in general be avoided, and appending larger blocks at a time should be preferred.
abstract Array<T> | append(Array<T> other) | Append other array, if the other array is fitting in current allocated size use that allocated size, otherwise allocate new array to combine the other with this.
abstract void | append(T value) | Append a value of the same type of the Array.
static long | baseMemoryCost() | Get the base memory cost of the Arrays allocation.
Array<?> | changeType(Types.ValueType t) | Change the allocated array to a different type.
Array<?> | changeType(Types.ValueType t, boolean containsNull) | Change type taking into consideration if the target type must be able to contain Null.
Array<?> | changeType(Array<?> ret) | Change type by moving this arrays value into the given ret array.
Array<?> | changeType(Array<?> ret, int rl, int ru) | Put the changed value types into the given ret array inside the range specified.
Array<?> | changeTypeWithNulls(Types.ValueType t) |
Array<?> | changeTypeWithNulls(Array<?> ret) |
Array<?> | changeTypeWithNulls(Array<?> ret, int l, int u) |
abstract Array<T> | clone() | Overwrite of the java internal clone function for arrays, return a clone of underlying data that is mutable, (not immutable data.) Immutable data is dependent on the individual allocated arrays.
boolean | containsNull() | Analyze if the array contains null values.
boolean | equals(Object other) |
abstract boolean | equals(Array<T> other) | Equals operation on arrays.
double[] | extractDouble(double[] ret, int rl, int ru) | Extract the sub array into the ret array as doubles.
abstract void | fill(String val) | Fill the entire array with specific value.
abstract void | fill(T val) | Fill the entire array with specific value.
void | findEmpty(boolean[] select) | Find the empty rows, it is assumed that the input is to be only modified to set variables to true.
void | findEmptyInverse(boolean[] select) | Find the filled rows, it is assumed that the input is to be only modified to set variables to true.
abstract Object | get() | Get the underlying array out of the column Group, it is the responsibility of the caller to know what type it is.
abstract T | get(int index) | Get the value at a given index.
abstract byte[] | getAsByteArray() | Return the current allocated Array as a byte[], this is used to serialize the allocated Arrays out to the PythonAPI.
abstract double | getAsDouble(int i) | Get the index's value.
double | getAsNaNDouble(int i) | Get the index's value.
SoftReference<Map<T,Long>> | getCache() | Get the current cached recode map.
abstract long | getExactSerializedSize() | Get the exact serialized size on disk of this array.
abstract ArrayFactory.FrameArrayType | getFrameArrayType() | Get the internal FrameArrayType, to specify the encoding of the Types, note there are more Frame Array Types than there is ValueTypes.
long | getInMemorySize() | Get in memory size, not counting reference to this object.
T | getInternal(int index) | Get the internal value at a given index.
Array.ArrayIterator | getIterator() |
Pair<Integer,Integer> | getMinMaxLength() | Get the minimum and maximum length of the contained values as string type.
ABooleanArray | getNulls() |
Map<T,Long> | getRecodeMap() | Get a recode map that maps each unique value in the array, to a long ID.
abstract Types.ValueType | getValueType() | Get the current value type of this array.
abstract double | hashDouble(int idx) | Hash the given index of the array.
abstract boolean | isEmpty() | Get if this array is empty, aka filled with empty values.
abstract boolean | isNotEmpty(int i) |
abstract boolean | isShallowSerialize() | Analyze if this array can be shallow serialized.
double[] | minMax() | Get the minimum and maximum double value of this array.
double[] | minMax(int l, int u) | Get the minimum and maximum double value of a specific sub part of this array.
abstract boolean | possiblyContainsNaN() |
abstract void | reset(int size) | Reset the Array and set to a different size.
abstract Array<T> | select(boolean[] select, int nTrue) | Slice out the true indices in the select input and return the sub array.
abstract Array<T> | select(int[] indices) | Slice out the specified indices and return the sub array.
abstract void | set(int index, double value) | Set index to given double value (cast to the correct type of this array)
abstract void | set(int rl, int ru, Array<T> value) | Set range to given arrays value
void | set(int rl, int ru, Array<T> value, int rlSrc) | Set range to given arrays value with an offset into other array
abstract void | set(int index, String value) | Set index to the given value of the string parsed.
abstract void | set(int index, T value) | Set index to the given value of same type
void | setCache(SoftReference<Map<T,Long>> m) | Set the cached hashmap cache of this Array allocation, to be used in transformEncode.
abstract void | setFromOtherType(int rl, int ru, Array<?> value) | Set range to given arrays value
abstract void | setFromOtherTypeNz(int rl, int ru, Array<?> value) | Set non default values in the range from the value array given
void | setFromOtherTypeNz(Array<?> value) | Set non default values from the value array given
abstract void | setNz(int rl, int ru, Array<T> value) | Set non default values in the range from the value array given
void | setNz(Array<T> value) | Set non default values from the value array given
int | size() | Get the number of elements in the array, this does not necessarily reflect the current allocated size.
abstract Array<T> | slice(int rl, int ru) | Slice out the sub range and return new array with the specified type.
ArrayCompressionStatistics | statistics(int nSamples) | Get the compression statistics of this array allocation.
String | toString() |
-
-
-
Method Detail
-
getCache
public final SoftReference<Map<T,Long>> getCache()
Get the current cached recode map.
- Returns:
The cached recode map
-
setCache
public final void setCache(SoftReference<Map<T,Long>> m)
Set the cached hash map of this Array allocation, to be used in transformEncode.
- Parameters:
m - The element to cache.
-
getRecodeMap
public final Map<T,Long> getRecodeMap()
Get a recode map that maps each unique value in the array to a long ID. Null values are ignored and not included in the mapping. The resulting recode map is stored in a soft reference to speed up repeated calls on the same column.
- Returns:
A recode map
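The caching behavior described for getRecodeMap can be illustrated with a minimal, self-contained sketch. RecodeMapSketch is a hypothetical stand-in for a string column and is not the SystemDS implementation; in particular, IDs starting at 1 and being assigned in first-seen order are assumptions made for this example.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a column; not the SystemDS implementation.
class RecodeMapSketch {
    private final String[] data;
    // A soft reference lets the JVM reclaim the map under memory pressure.
    private SoftReference<Map<String, Long>> cache = null;

    RecodeMapSketch(String[] data) {
        this.data = data;
    }

    Map<String, Long> getRecodeMap() {
        Map<String, Long> m = (cache == null) ? null : cache.get();
        if (m != null)
            return m; // reuse the cached map on repeated calls
        m = new HashMap<>();
        long nextId = 1; // assumption: IDs start at 1, in first-seen order
        for (String v : data) {
            if (v != null && !m.containsKey(v)) // null values are skipped
                m.put(v, nextId++);
        }
        cache = new SoftReference<>(m);
        return m;
    }

    public static void main(String[] args) {
        RecodeMapSketch col = new RecodeMapSketch(new String[] {"a", null, "b", "a"});
        System.out.println(col.getRecodeMap());
    }
}
```

Because the map is only softly referenced, a caller cannot rely on the same instance being returned across calls unless it keeps its own reference to it.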
-
size
public final int size()
Get the number of elements in the array. This does not necessarily reflect the current allocated size.
- Returns:
the current number of elements
-
get
public abstract T get(int index)
Get the value at a given index. This method returns objects that have a high overhead in allocation. Therefore it is not as efficient as using the vectorized operations specified in the object.
- Parameters:
index - The index to query
- Returns:
The value returned as an object
-
getInternal
public T getInternal(int index)
Get the internal value at a given index. For instance, HashIntegerArray would return the underlying long, not a string.
- Parameters:
index - the index to get
- Returns:
The value to get
-
get
public abstract Object get()
Get the underlying array out of the column group. It is the responsibility of the caller to know what type it is. It is also not guaranteed that the returned array is the internal storage: the underlying data structure may allocate a new array to answer the call. In practice this means that calling the method can cause the entire array to be allocated again, so it should only be used for debugging purposes, not for performance.
- Returns:
The underlying array.
-
getAsDouble
public abstract double getAsDouble(int i)
Get the value at the given index as a double. Returns 0 in case of null.
- Parameters:
i - index to get value from
- Returns:
the value
-
getAsNaNDouble
public double getAsNaNDouble(int i)
Get the value at the given index as a double. Returns Double.NaN in case of null.
- Parameters:
i - index to get value from
- Returns:
the value
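The difference between the two accessors only shows up on null cells. The following is a minimal sketch of the documented contracts on a hypothetical nullable column of boxed doubles (NullAsDoubleSketch is not a SystemDS class).

```java
// Hypothetical nullable column of boxed doubles; not the SystemDS class.
class NullAsDoubleSketch {
    private final Double[] data;

    NullAsDoubleSketch(Double[] data) {
        this.data = data;
    }

    // Mirrors the documented getAsDouble contract: null reads as 0.
    double getAsDouble(int i) {
        return data[i] == null ? 0.0 : data[i];
    }

    // Mirrors the documented getAsNaNDouble contract: null reads as NaN.
    double getAsNaNDouble(int i) {
        return data[i] == null ? Double.NaN : data[i];
    }

    public static void main(String[] args) {
        NullAsDoubleSketch c = new NullAsDoubleSketch(new Double[] {1.5, null});
        System.out.println(c.getAsDouble(1));    // 0.0
        System.out.println(c.getAsNaNDouble(1)); // NaN
    }
}
```

The NaN variant is the one to use when a missing value must stay distinguishable from a genuine 0.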
-
set
public abstract void set(int index, T value)
Set index to the given value of the same type.
- Parameters:
index - The index to set
value - The value to assign
-
set
public abstract void set(int index, double value)
Set index to the given double value (cast to the correct type of this array).
- Parameters:
index - the index to set
value - the value to set it to (before casting to correct value type)
-
set
public abstract void set(int index, String value)
Set index to the value parsed from the given string.
- Parameters:
index - The index to set
value - The value to assign
-
setFromOtherType
public abstract void setFromOtherType(int rl, int ru, Array<?> value)
Set range to the given array's values.
- Parameters:
rl - row lower
ru - row upper (inclusive)
value - value array to take values from (other type)
-
set
public abstract void set(int rl, int ru, Array<T> value)
Set range to the given array's values.
- Parameters:
rl - row lower
ru - row upper (inclusive)
value - value array to take values from (same type)
-
set
public void set(int rl, int ru, Array<T> value, int rlSrc)
Set range to the given array's values, with an offset into the other array.
- Parameters:
rl - row lower
ru - row upper (inclusive)
value - value array to take values from
rlSrc - the offset into the value array to take values from
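Since ru is inclusive in the range setters, the number of copied elements is ru - rl + 1. A minimal sketch of that index arithmetic on plain double arrays follows; RangeSetSketch is hypothetical and only illustrates the documented parameter meaning, not the library's implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of the range-set index arithmetic.
class RangeSetSketch {
    // Copies src[rlSrc .. rlSrc + (ru - rl)] into dst[rl .. ru], with ru inclusive.
    static void set(double[] dst, int rl, int ru, double[] src, int rlSrc) {
        int len = ru - rl + 1; // inclusive upper bound
        System.arraycopy(src, rlSrc, dst, rl, len);
    }

    public static void main(String[] args) {
        double[] dst = new double[5];
        double[] src = {10, 20, 30, 40};
        set(dst, 1, 2, src, 2);
        System.out.println(Arrays.toString(dst)); // [0.0, 30.0, 40.0, 0.0, 0.0]
    }
}
```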
-
setNz
public final void setNz(Array<T> value)
Set non-default values from the given value array.
- Parameters:
value - array of same type and length
-
setNz
public abstract void setNz(int rl, int ru, Array<T> value)
Set non-default values in the range from the given value array.
- Parameters:
rl - row start
ru - row upper (inclusive)
value - value array of same type
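As a rough illustration of the "set non-default values" idea, the sketch below overwrites a target only where the source holds a non-default value. It assumes that the default of a numeric column is 0 and that source and target are indexed by the same row positions; both are assumptions for this example, and SetNzSketch is a hypothetical helper rather than library code.

```java
import java.util.Arrays;

// Hypothetical illustration of copying only non-default (here: non-zero) cells.
class SetNzSketch {
    // Assumption: the default value of a numeric column is 0.
    static void setNz(double[] dst, int rl, int ru, double[] src) {
        for (int i = rl; i <= ru; i++) { // ru inclusive, as documented
            if (src[i] != 0)
                dst[i] = src[i]; // cells where src is 0 keep their old value
        }
    }

    public static void main(String[] args) {
        double[] dst = {1, 2, 3, 4};
        setNz(dst, 0, 3, new double[] {0, 9, 0, 8});
        System.out.println(Arrays.toString(dst)); // [1.0, 9.0, 3.0, 8.0]
    }
}
```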
-
setFromOtherTypeNz
public final void setFromOtherTypeNz(Array<?> value)
Set non-default values from the given value array.
- Parameters:
value - array of other type
-
setFromOtherTypeNz
public abstract void setFromOtherTypeNz(int rl, int ru, Array<?> value)
Set non-default values in the range from the given value array.
- Parameters:
rl - row start
ru - row end (inclusive)
value - value array of different type
-
append
public abstract void append(String value)
Append a string value to the current Array. This should in general be avoided; appending larger blocks at a time is preferred.
- Parameters:
value - The value to append
-
append
public abstract void append(T value)
Append a value of the same type as the Array. This should in general be avoided; appending larger blocks at a time is preferred.
- Parameters:
value - The value to append
-
append
public abstract Array<T> append(Array<T> other)
Append the other array. If the other array fits in the currently allocated size, that allocation is used; otherwise a new array is allocated to combine the other with this. This method should use the set range function, and should be preferred over appending single values.
- Parameters:
other - The other array of same type to append to this.
- Returns:
The combined arrays.
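The relationship between the logical size and the allocated size when appending a block can be sketched as follows. AppendSketch is a hypothetical growable double column; returning this and growing to exactly the required length are simplifications chosen for the example, not statements about the library.

```java
import java.util.Arrays;

// Hypothetical growable column; shows size versus allocated size on block append.
class AppendSketch {
    double[] data; // allocated storage, may be longer than size
    int size;      // number of valid elements

    AppendSketch(int capacity) {
        data = new double[capacity];
        size = 0;
    }

    AppendSketch append(double[] other) {
        int newSize = size + other.length;
        if (newSize > data.length) // does not fit: allocate a combined array
            data = Arrays.copyOf(data, newSize);
        // One bulk copy instead of many single-value appends.
        System.arraycopy(other, 0, data, size, other.length);
        size = newSize;
        return this;
    }

    public static void main(String[] args) {
        AppendSketch a = new AppendSketch(4);
        a.append(new double[] {1, 2});    // fits in the existing allocation
        a.append(new double[] {3, 4, 5}); // forces a reallocation
        System.out.println(a.size + " of " + a.data.length);
    }
}
```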
-
slice
public abstract Array<T> slice(int rl, int ru)
Slice out the sub range and return a new array with the specified type. If the conversion fails, fall back to normal slice.
- Parameters:
rl - row start
ru - row end (not included)
- Returns:
A new array of sub range.
-
reset
public abstract void reset(int size)
Reset the Array and set it to a different size. This method is used to reuse an already allocated Array without extra allocation. It should only be done in cases where the Array is no longer in use in any FrameBlocks.
- Parameters:
size - The size to reallocate into.
-
getAsByteArray
public abstract byte[] getAsByteArray()
Return the current allocated Array as a byte[]. This is used to serialize the allocated Arrays out to the PythonAPI.
- Returns:
The array as bytes
-
getValueType
public abstract Types.ValueType getValueType()
Get the current value type of this array.
- Returns:
The current value type.
-
analyzeValueType
public final Pair<Types.ValueType,Boolean> analyzeValueType()
Analyze the column to figure out if the value type can be refined to a better type. The return value has two parts: first the type it can be, second whether it contains nulls.
- Returns:
A better or equivalent value type to represent the column, including null information.
-
analyzeValueType
public abstract Pair<Types.ValueType,Boolean> analyzeValueType(int maxCells)
Analyze the column to figure out if the value type can be refined to a better type. The return value has two parts: first the type it can be, second whether it contains nulls.
- Parameters:
maxCells - maximum number of cells to analyze
- Returns:
A better or equivalent value type to represent the column, including null information.
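One way to picture the two-part result is a scan over a string column that starts at the narrowest candidate type and widens when a cell does not parse, while tracking nulls separately. The sketch below is an assumption-laden illustration: the VT enum is a hypothetical three-member reduction of Types.ValueType, Map.Entry stands in for Pair, and the widening order is chosen for the example only.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical type refinement over a string column; not the SystemDS logic.
class AnalyzeTypeSketch {
    // Reduced, hypothetical type set; the real Types.ValueType has more members.
    enum VT { INT64, FP64, STRING }

    static boolean isLong(String s) {
        try { Long.parseLong(s); return true; }
        catch (NumberFormatException e) { return false; }
    }

    static boolean isDouble(String s) {
        try { Double.parseDouble(s); return true; }
        catch (NumberFormatException e) { return false; }
    }

    // Returns (refined type, containsNull) after inspecting at most maxCells cells.
    static Map.Entry<VT, Boolean> analyze(String[] col, int maxCells) {
        VT best = VT.INT64; // start narrow, widen on the first cell that does not fit
        boolean containsNull = false;
        int n = Math.min(maxCells, col.length);
        for (int i = 0; i < n; i++) {
            String s = col[i];
            if (s == null) {
                containsNull = true;
                continue;
            }
            if (best == VT.INT64 && !isLong(s))
                best = VT.FP64;
            if (best == VT.FP64 && !isDouble(s))
                best = VT.STRING;
        }
        return new SimpleImmutableEntry<>(best, containsNull);
    }

    public static void main(String[] args) {
        System.out.println(analyze(new String[] {"1", "2.5", null}, 10));
    }
}
```

Limiting the scan with maxCells trades accuracy for speed: a cell beyond the limit can still violate the refined type.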
-
getFrameArrayType
public abstract ArrayFactory.FrameArrayType getFrameArrayType()
Get the internal FrameArrayType, which specifies the encoding of the types. Note that there are more FrameArrayTypes than there are ValueTypes.
- Returns:
The FrameArrayType
-
getInMemorySize
public long getInMemorySize()
Get the in-memory size, not counting the reference to this object.
- Returns:
the size in memory of this object.
-
baseMemoryCost
public static long baseMemoryCost()
Get the base memory cost of the Arrays allocation.
- Returns:
The base memory cost
-
getExactSerializedSize
public abstract long getExactSerializedSize()
Get the exact serialized size on disk of this array.
- Returns:
The exact size on disk
-
getNulls
public ABooleanArray getNulls()
-
containsNull
public boolean containsNull()
Analyze whether the array contains null values.
- Returns:
If the array contains null.
-
possiblyContainsNaN
public abstract boolean possiblyContainsNaN()
-
changeType
public Array<?> changeType(Types.ValueType t, boolean containsNull)
Change type, taking into consideration whether the target type must be able to contain null.
- Parameters:
t - The target type
containsNull - If the target should be able to contain null
- Returns:
The changed type array.
-
changeTypeWithNulls
public Array<?> changeTypeWithNulls(Types.ValueType t)
-
changeType
public Array<?> changeType(Types.ValueType t)
Change the allocated array to a different type. If the type is the same, a deep copy is returned for safety.
- Parameters:
t - The type to change to
- Returns:
A new column array.
-
changeType
public final Array<?> changeType(Array<?> ret)
Change type by moving this array's values into the given ret array.
- Parameters:
ret - The Array to put this array's values into
- Returns:
The ret array given
-
changeType
public final Array<?> changeType(Array<?> ret, int rl, int ru)
Put the changed value types into the given ret array inside the range specified.
- Parameters:
ret - The Array to put this array's values into
rl - inclusive lower bound
ru - exclusive upper bound
- Returns:
The ret array given.
-
getMinMaxLength
public Pair<Integer,Integer> getMinMaxLength()
Get the minimum and maximum length of the contained values as string type.
- Returns:
A Pair of first the minimum length, second the maximum length
-
fill
public abstract void fill(String val)
Fill the entire array with a specific value.
- Parameters:
val - the value to fill with.
-
fill
public abstract void fill(T val)
Fill the entire array with a specific value.
- Parameters:
val - the value to fill with.
-
isShallowSerialize
public abstract boolean isShallowSerialize()
Analyze whether this array can be shallow serialized, to allow caching without modification.
- Returns:
boolean saying true if shallow serialization is available
-
isEmpty
public abstract boolean isEmpty()
Get whether this array is empty, that is, filled with empty values.
- Returns:
boolean saying true if empty
-
select
public abstract Array<T> select(int[] indices)
Slice out the specified indices and return the sub array.
- Parameters:
indices - The indices to slice out
- Returns:
the sliced out indices in an array format
-
select
public abstract Array<T> select(boolean[] select, int nTrue)
Slice out the true indices in the select input and return the sub array.
- Parameters:
select - a boolean vector specifying what to select
nTrue - number of true values inside select
- Returns:
the sliced out indices in an array format
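Passing nTrue alongside the mask lets an implementation size the result up front instead of counting the true entries first. A minimal sketch of that pattern on a plain double array follows; SelectSketch is a hypothetical helper, and it assumes the caller passes the correct count.

```java
import java.util.Arrays;

// Hypothetical illustration of mask-based selection with a precomputed count.
class SelectSketch {
    static double[] select(double[] data, boolean[] select, int nTrue) {
        double[] ret = new double[nTrue]; // nTrue avoids a separate counting pass
        int k = 0;
        for (int i = 0; i < select.length; i++) {
            if (select[i])
                ret[k++] = data[i];
        }
        return ret;
    }

    public static void main(String[] args) {
        double[] out = select(new double[] {1, 2, 3, 4},
            new boolean[] {true, false, true, false}, 2);
        System.out.println(Arrays.toString(out)); // [1.0, 3.0]
    }
}
```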
-
findEmpty
public final void findEmpty(boolean[] select)
Find the empty rows. The input is only ever modified by setting entries to true: an entry is set for each index that is not empty, so entries that remain false mark empty rows.
- Parameters:
select - Modify this to true in indexes that are not empty.
-
isNotEmpty
public abstract boolean isNotEmpty(int i)
-
findEmptyInverse
public void findEmptyInverse(boolean[] select)
Find the filled rows. The input is only ever modified by setting entries to true: an entry is set for each index that is empty, so entries that remain false mark filled rows.
- Parameters:
select - modify this to true in indexes that are empty.
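Because both methods only ever flip entries to true, the same select vector can be passed over several columns to accumulate a row-wise result. The sketch below shows this on string columns; FindEmptySketch is hypothetical, and treating null and the empty string as "empty" is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration of accumulating a row mask across columns.
class FindEmptySketch {
    // Assumption: a string cell counts as empty when it is null or "".
    static boolean isNotEmpty(String[] col, int i) {
        return col[i] != null && !col[i].isEmpty();
    }

    // Sets select[i] to true where the cell is not empty; never resets to false.
    static void findEmpty(String[] col, boolean[] select) {
        for (int i = 0; i < col.length; i++)
            if (isNotEmpty(col, i))
                select[i] = true;
    }

    // Sets select[i] to true where the cell is empty; never resets to false.
    static void findEmptyInverse(String[] col, boolean[] select) {
        for (int i = 0; i < col.length; i++)
            if (!isNotEmpty(col, i))
                select[i] = true;
    }

    public static void main(String[] args) {
        String[] a = {"x", null, ""};
        String[] b = {null, null, "y"};
        boolean[] select = new boolean[3];
        findEmpty(a, select);
        findEmpty(b, select);
        // Rows still false are empty in every column.
        System.out.println(Arrays.toString(select)); // [true, false, true]
    }
}
```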
-
clone
public abstract Array<T> clone()
Override of Java's clone method for arrays. Returns a clone of the underlying data that is mutable (not immutable data). Which data is immutable depends on the individual allocated arrays.
- Returns:
A clone
-
hashDouble
public abstract double hashDouble(int idx)
Hash the given index of the array. It is allowed to return NaN on null elements.
- Parameters:
idx - The index to hash
- Returns:
The hash value of that index.
-
getIterator
public Array.ArrayIterator getIterator()
-
extractDouble
public double[] extractDouble(double[] ret, int rl, int ru)
Extract the sub array into the ret array as doubles. The ret array is filled with an offset of -rl (the value at index i is written to ret[i - rl]), meaning that the ret array should be of length ru - rl.
- Parameters:
ret - The array to return
rl - The row to start at
ru - The row to end at (not inclusive)
- Returns:
The ret array given as argument
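The offset described for extractDouble can be made concrete with a short sketch on a plain double array. ExtractDoubleSketch is a hypothetical helper that only demonstrates the index mapping from the column range to the caller-provided buffer.

```java
import java.util.Arrays;

// Hypothetical illustration of extracting a half-open range into a caller buffer.
class ExtractDoubleSketch {
    static double[] extractDouble(double[] data, double[] ret, int rl, int ru) {
        for (int i = rl; i < ru; i++) // ru is exclusive
            ret[i - rl] = data[i];    // shift by -rl so ret starts at index 0
        return ret;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5};
        double[] ret = new double[4 - 2]; // length ru - rl
        extractDouble(data, ret, 2, 4);
        System.out.println(Arrays.toString(ret)); // [3.0, 4.0]
    }
}
```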
-
equals
public abstract boolean equals(Array<T> other)
Equals operation on arrays.
- Parameters:
other - The other array to compare to.
- Returns:
True if the arrays are equivalent.
-
statistics
public ArrayCompressionStatistics statistics(int nSamples)
Get the compression statistics of this array allocation.
- Parameters:
nSamples - The number of sample elements suggested (not forced) to be used.
- Returns:
The compression statistics of this array.
-
minMax
public double[] minMax()
Get the minimum and maximum double value of this array. Note that NaN values are ignored.
- Returns:
The min and max in index 0 and 1 of the array.
-
minMax
public double[] minMax(int l, int u)
Get the minimum and maximum double value of a specific sub part of this array. Note that NaN values are ignored.
- Parameters:
l - The lower index to search from
u - The upper index to end at (not inclusive)
- Returns:
The min and max in index 0 and 1 of the array in the range.
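A range scan that skips NaN values can be sketched as below. MinMaxSketch is a hypothetical helper; returning positive and negative infinity when the range holds no non-NaN value is an assumption of this example, as the behavior in that case is not documented here.

```java
import java.util.Arrays;

// Hypothetical illustration of a NaN-ignoring min/max scan over [l, u).
class MinMaxSketch {
    static double[] minMax(double[] data, int l, int u) {
        // Assumption: an empty or all-NaN range yields {+Infinity, -Infinity}.
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = l; i < u; i++) { // u is exclusive
            double v = data[i];
            if (Double.isNaN(v))
                continue; // NaN values are ignored
            if (v < min)
                min = v;
            if (v > max)
                max = v;
        }
        return new double[] {min, max}; // min at index 0, max at index 1
    }

    public static void main(String[] args) {
        double[] r = minMax(new double[] {3, Double.NaN, -1, 7}, 0, 4);
        System.out.println(Arrays.toString(r)); // [-1.0, 7.0]
    }
}
```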
-
-