Package org.apache.sysds.utils
Class DataAugmentation
- java.lang.Object
  - org.apache.sysds.utils.DataAugmentation

public class DataAugmentation extends Object

Constructor Summary
DataAugmentation()

Method Summary
Modifier and Type, Method, and Description
static FrameBlock dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap, int seed)
    This function returns a new frame block with errors introduced into the data: typos in string values, null values, outliers in numeric data, and swapped elements.
static FrameBlock miss(FrameBlock frame, double pMiss, double pDrop, int seed)
    This function modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing.
static FrameBlock outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times, int seed)
    This function modifies the given, preprocessed frame block to add outliers to some of the numeric data of the frame, adding or subtracting several times the standard deviation, and marking them with the label outlier.
static FrameBlock preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable)
    This function returns a new frame block with a labels column added, and builds the lists with the column indexes of the different types of data.
static FrameBlock swap(FrameBlock frame, List<Integer> swappable, double pSwap, int seed)
    This function modifies the given, preprocessed frame block to swap consecutive fields of the same ValueType, marking them with the label swap.
static FrameBlock typos(FrameBlock frame, List<Integer> strings, double pTypo, int seed)
    This function modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos.

Method Detail
dataCorruption
public static FrameBlock dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap, int seed)
This function returns a new frame block with errors introduced into the data: typos in string values, null values, outliers in numeric data, and swapped elements.
Parameters:
input - Original frame block
pTypo - Probability of introducing a typo in a row
pMiss - Probability of introducing missing values in a row
pDrop - Probability of dropping a value inside a row
pOut - Probability of introducing outliers in a row
pSwap - Probability of swapping two elements in a row
seed - The seed for the random generation of errors
Returns:
A new FrameBlock with corrupted elements

preprocessing
public static FrameBlock preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable)
This function returns a new frame block with a labels column added, and builds the lists with the column indexes of the different types of data.
Parameters:
frame - Original frame block
numerics - Empty list to return the numeric positions
strings - Empty list to return the string positions
swappable - Empty list to return the swappable positions
Returns:
A new FrameBlock with a labels column
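The out-parameter pattern used by preprocessing (the caller passes empty lists, and the call fills them with column indexes) can be sketched in plain Java. This is a minimal illustration and not the SystemDS implementation: the class PreprocessSketch, the Kind enum, and the rule that column i is swappable when column i + 1 has the same kind are assumptions made for the example, and rows are modeled as String arrays instead of a FrameBlock.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the SystemDS implementation.
final class PreprocessSketch {
    // Stand-in for a frame's column value types (assumption for this example).
    enum Kind { STRING, NUMERIC }

    // Fills the three initially empty lists with column indexes.
    // Assumption: column i counts as swappable when column i + 1 has the same kind.
    static void classify(Kind[] schema, List<Integer> numerics,
                         List<Integer> strings, List<Integer> swappable) {
        for (int i = 0; i < schema.length; i++) {
            if (schema[i] == Kind.NUMERIC) numerics.add(i);
            else strings.add(i);
            if (i + 1 < schema.length && schema[i] == schema[i + 1]) swappable.add(i);
        }
    }

    // Returns a copy of the rows with one extra, initially empty, label column appended.
    static String[][] withLabelColumn(String[][] rows) {
        String[][] out = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = Arrays.copyOf(rows[r], rows[r].length + 1);
            out[r][rows[r].length] = "";
        }
        return out;
    }
}
```

In this sketch the returned rows are a copy, so the original data stays untouched, which mirrors the documented behavior of returning a new frame block.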

typos
public static FrameBlock typos(FrameBlock frame, List<Integer> strings, double pTypo, int seed)
This function modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos.
Parameters:
frame - Original frame block
strings - List with the columns of string type that can be changed, generated during preprocessing or manually selected
pTypo - Probability of adding a typo to a row
seed - The seed for the random behavior
Returns:
A new FrameBlock with typos
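As a rough illustration of seeded, probabilistic typo injection, the sketch below transposes two adjacent characters with probability pTypo. The documentation does not say which kind of typo is introduced, so the transposition model, the class TypoSketch, and both helper methods are assumptions; this is not the SystemDS implementation.

```java
import java.util.Objects;
import java.util.Random;

// Hypothetical sketch; not the SystemDS implementation.
final class TypoSketch {
    // With probability pTypo, swaps two adjacent characters of s (assumed typo model).
    // Null values and strings shorter than two characters are returned unchanged.
    static String maybeTypo(String s, double pTypo, Random rnd) {
        if (s == null || s.length() < 2 || rnd.nextDouble() >= pTypo) return s;
        int i = rnd.nextInt(s.length() - 1); // left index of the pair to transpose
        char[] c = s.toCharArray();
        char tmp = c[i];
        c[i] = c[i + 1];
        c[i + 1] = tmp;
        return new String(c);
    }

    // Applies maybeTypo to every value of one string column with a fixed seed
    // and records which rows actually changed.
    static String[] corruptColumn(String[] col, double pTypo, int seed, boolean[] changed) {
        Random rnd = new Random(seed);
        String[] out = new String[col.length];
        for (int r = 0; r < col.length; r++) {
            out[r] = maybeTypo(col[r], pTypo, rnd);
            changed[r] = !Objects.equals(out[r], col[r]);
        }
        return out;
    }
}
```

Because the generator is created from the seed, two calls to corruptColumn with the same inputs and seed produce the same output. A transposition of two identical characters leaves the value unchanged, which is why the changed flags are computed by comparison.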

miss
public static FrameBlock miss(FrameBlock frame, double pMiss, double pDrop, int seed)
This function modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing.
Parameters:
frame - Original frame block
pMiss - Probability of adding missing values to a row
pDrop - Probability of dropping a value
seed - The seed for randomness
Returns:
A new FrameBlock with missing values
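One way to read the two probabilities is as a two-level draw: pMiss decides whether a row is affected at all, and pDrop decides, for each value of an affected row, whether it becomes null. The sketch below illustrates that reading. It is an assumption about the semantics, not the SystemDS implementation, and the class MissSketch and its method are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch; not the SystemDS implementation.
final class MissSketch {
    // With probability pMiss the row is selected; each value of a selected row
    // is then set to null with probability pDrop.
    // Returns true if at least one value was dropped, so the caller can label the row.
    static boolean maybeDrop(Object[] row, double pMiss, double pDrop, Random rnd) {
        if (rnd.nextDouble() >= pMiss) return false; // row not selected
        boolean dropped = false;
        for (int j = 0; j < row.length; j++) {
            if (rnd.nextDouble() < pDrop) {
                row[j] = null;
                dropped = true;
            }
        }
        return dropped;
    }
}
```

Under this reading, a selected row can still end up with no missing values when pDrop is small, so the label is derived from the return value and not from the row selection alone.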

outlier
public static FrameBlock outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times, int seed)
This function modifies the given, preprocessed frame block to add outliers to some of the numeric data of the frame, adding or subtracting several times the standard deviation, and marking them with the label outlier.
Parameters:
frame - Original frame block
numerics - List with the columns of numeric type that can be changed, generated during preprocessing or manually selected
pOut - Probability of introducing an outlier in a row
pPos - Probability of using positive deviation
times - Times the standard deviation is added
seed - The seed for randomness
Returns:
A new FrameBlock with outliers
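The outlier rule can be written as v' = v + s * times * sd, where sd is the column's standard deviation and the sign s is +1 with probability pPos and -1 otherwise. The sketch below applies that rule to a single numeric column. It is a minimal illustration under stated assumptions, not the SystemDS implementation: the class OutlierSketch is hypothetical, the population standard deviation is used, and the flags array stands in for the label column.

```java
import java.util.Random;

// Hypothetical sketch; not the SystemDS implementation.
final class OutlierSketch {
    // Population standard deviation of a column (assumed variant).
    static double stdDev(double[] col) {
        double mean = 0;
        for (double v : col) mean += v;
        mean /= col.length;
        double sq = 0;
        for (double v : col) sq += (v - mean) * (v - mean);
        return Math.sqrt(sq / col.length);
    }

    // Returns a corrupted copy of col. Each value becomes an outlier with
    // probability pOut; the shift is +times*sd with probability pPos, else -times*sd.
    // flags[i] is set to true wherever an outlier was injected.
    static double[] addOutliers(double[] col, double pOut, double pPos,
                                int times, int seed, boolean[] flags) {
        Random rnd = new Random(seed);
        double sd = stdDev(col); // computed once, from the uncorrupted data
        double[] out = col.clone();
        for (int i = 0; i < out.length; i++) {
            if (rnd.nextDouble() < pOut) {
                double sign = rnd.nextDouble() < pPos ? 1.0 : -1.0;
                out[i] += sign * times * sd;
                flags[i] = true;
            }
        }
        return out;
    }
}
```

Computing the standard deviation before any value is changed keeps the shift size independent of how many outliers have already been injected.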

swap
public static FrameBlock swap(FrameBlock frame, List<Integer> swappable, double pSwap, int seed)
This function modifies the given, preprocessed frame block to swap consecutive fields of the same ValueType, marking them with the label swap.
Parameters:
frame - Original frame block
swappable - List with the columns that are swappable, generated during preprocessing
pSwap - Probability of swapping two fields in a row
seed - The seed for randomness
Returns:
A new FrameBlock with swapped elements
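Swapping consecutive fields in a row can be sketched as follows. The example assumes that each index i in the swappable list marks a column whose right neighbor i + 1 has the same type, and that at most one pair is swapped per row. Neither detail is stated in the documentation, and the class SwapSketch is hypothetical, so this is an illustration and not the SystemDS implementation.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not the SystemDS implementation.
final class SwapSketch {
    // With probability pSwap, picks one index i from swappable and exchanges
    // row[i] with row[i + 1] in place.
    // Assumption: every listed i has a same-typed right neighbor.
    // Returns true if a swap happened, so the caller can label the row.
    static boolean maybeSwap(Object[] row, List<Integer> swappable, double pSwap, Random rnd) {
        if (swappable.isEmpty() || rnd.nextDouble() >= pSwap) return false;
        int i = swappable.get(rnd.nextInt(swappable.size()));
        Object tmp = row[i];
        row[i] = row[i + 1];
        row[i + 1] = tmp;
        return true;
    }
}
```

Restricting swaps to same-typed neighbors keeps every column's type intact, so the corrupted row still conforms to the frame's schema.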