Class DataAugmentation


  • public class DataAugmentation
    extends Object
    • Method Summary

      Modifier and Type Method Description
      static FrameBlock dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap, int seed)
      This function returns a new frame block with errors introduced in the data: typos in string values, null values, outliers in numeric data, and swapped elements.
      static FrameBlock miss(FrameBlock frame, double pMiss, double pDrop, int seed)
      This function modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing.
      static FrameBlock outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times, int seed)
      This function modifies the given, preprocessed frame block to add outliers to some of the numeric data of the frame, adding or subtracting several times the standard deviation, and marking them with the label outlier.
      static FrameBlock preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable)
      This function returns a new frame block with a labels column added, and builds the lists of column indices for the different types of data.
      static FrameBlock swap(FrameBlock frame, List<Integer> swappable, double pSwap, int seed)
      This function modifies the given, preprocessed frame block to add swapped fields of the same ValueType that are consecutive, marking them with the label swap.
      static FrameBlock typos(FrameBlock frame, List<Integer> strings, double pTypo, int seed)
      This function modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos.
    • Constructor Detail

      • DataAugmentation

        public DataAugmentation()
    • Method Detail

      • dataCorruption

        public static FrameBlock dataCorruption(FrameBlock input,
                                                double pTypo,
                                                double pMiss,
                                                double pDrop,
                                                double pOut,
                                                double pSwap,
                                                int seed)
        This function returns a new frame block with errors introduced in the data: typos in string values, null values, outliers in numeric data, and swapped elements.
        Parameters:
        input - Original frame block
        pTypo - Probability of introducing a typo in a row
        pMiss - Probability of introducing missing values in a row
        pDrop - Probability of dropping a value inside a row
        pOut - Probability of introducing outliers in a row
        pSwap - Probability of swapping two elements in a row
        seed - The seed for the random generation of errors
        Returns:
        A new FrameBlock with corrupted elements
      • preprocessing

        public static FrameBlock preprocessing(FrameBlock frame,
                                               List<Integer> numerics,
                                               List<Integer> strings,
                                               List<Integer> swappable)
        This function returns a new frame block with a labels column added, and builds the lists of column indices for the different types of data.
        Parameters:
        frame - Original frame block
        numerics - Empty list to return the numeric positions
        strings - Empty list to return the string positions
        swappable - Empty list to return the swappable positions
        Returns:
        A new FrameBlock with a labels column
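The column classification described above can be pictured with a small stand-alone sketch. It does not use the SystemDS FrameBlock API: the class PreprocessingSketch, the enum ColType, and both method names are hypothetical, and the rule that a column is swappable when its right-hand neighbour has the same type is an assumption inferred from the description of swap.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the preprocessing step; not the SystemDS implementation. */
final class PreprocessingSketch {
    enum ColType { STRING, INT64, FP64, BOOLEAN }

    /** Fills the three caller-supplied (initially empty) lists with column indices. */
    static void classifyColumns(ColType[] schema, List<Integer> numerics,
                                List<Integer> strings, List<Integer> swappable) {
        for (int c = 0; c < schema.length; c++) {
            if (schema[c] == ColType.STRING) {
                strings.add(c);
            } else if (schema[c] == ColType.INT64 || schema[c] == ColType.FP64) {
                numerics.add(c);
            }
            // Assumption: column c is swappable when column c + 1 has the same type.
            if (c + 1 < schema.length && schema[c] == schema[c + 1]) {
                swappable.add(c);
            }
        }
    }

    /** Returns a copy of the rows with one extra, empty label cell appended to each row. */
    static String[][] addLabelsColumn(String[][] rows) {
        String[][] out = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = Arrays.copyOf(rows[r], rows[r].length + 1);
            out[r][rows[r].length] = "";
        }
        return out;
    }
}
```

As in the documented method, the caller passes empty lists and reads the column indices back from them after the call.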
      • typos

        public static FrameBlock typos(FrameBlock frame,
                                       List<Integer> strings,
                                       double pTypo,
                                       int seed)
        This function modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos.
        Parameters:
        frame - Original frame block
        strings - List with the columns of string type that can be changed, generated during preprocessing or manually selected
        pTypo - Probability of adding a typo to a row
        seed - The seed for the random behavior
        Returns:
        A new FrameBlock with typos
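A minimal sketch of seeded, per-row typo injection on plain string arrays follows. It is an illustration only, not the SystemDS implementation: the class TypoSketch and its methods are hypothetical, and the choice of an adjacent-character transposition as the typo is an assumption, since the source does not say which kinds of typos are generated.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of per-row typo injection; not the SystemDS implementation. */
final class TypoSketch {
    /** Swaps two adjacent characters at a random position; null or short strings are returned unchanged. */
    static String transpose(String s, Random rng) {
        if (s == null || s.length() < 2) {
            return s;
        }
        int i = rng.nextInt(s.length() - 1);
        char[] cs = s.toCharArray();
        char tmp = cs[i];
        cs[i] = cs[i + 1];
        cs[i + 1] = tmp;
        return new String(cs);
    }

    /** Modifies rows in place: with probability pTypo a row gets a typo in one randomly chosen string column. */
    static void addTypos(String[][] rows, String[] labels, List<Integer> strings,
                         double pTypo, int seed) {
        if (strings.isEmpty()) {
            return;
        }
        Random rng = new Random(seed); // same seed, same sequence of corruptions
        for (int r = 0; r < rows.length; r++) {
            if (rng.nextDouble() >= pTypo) {
                continue; // row left untouched
            }
            int col = strings.get(rng.nextInt(strings.size()));
            rows[r][col] = transpose(rows[r][col], rng);
            labels[r] = "typos";
        }
    }
}
```

Because the generator is created from the seed inside the method, two calls with the same inputs and seed corrupt the same rows in the same way.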
      • miss

        public static FrameBlock miss(FrameBlock frame,
                                      double pMiss,
                                      double pDrop,
                                      int seed)
        This function modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing.
        Parameters:
        frame - Original frame block
        pMiss - Probability of adding missing values to a row
        pDrop - Probability of dropping a value
        seed - The seed for randomness
        Returns:
        A new FrameBlock with missing values
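One way to read the two probabilities is as a two-level draw: pMiss selects a row, and pDrop then decides for each value in that row whether it is dropped. The sketch below illustrates that reading on plain arrays. The interpretation is an assumption based on the parameter descriptions, and the class MissingSketch and its method are hypothetical, not part of SystemDS.

```java
import java.util.Random;

/** Hypothetical illustration of two-level missing-value injection; not the SystemDS implementation. */
final class MissingSketch {
    /** Modifies rows in place: rows are selected with pMiss, then each value is nulled with pDrop. */
    static void addMissing(String[][] rows, String[] labels,
                           double pMiss, double pDrop, int seed) {
        Random rng = new Random(seed);
        for (int r = 0; r < rows.length; r++) {
            if (rng.nextDouble() >= pMiss) {
                continue; // row not selected
            }
            boolean dropped = false;
            for (int c = 0; c < rows[r].length; c++) {
                if (rng.nextDouble() < pDrop) {
                    rows[r][c] = null; // null stands in for a missing value
                    dropped = true;
                }
            }
            if (dropped) {
                labels[r] = "missing"; // label only rows that actually lost a value
            }
        }
    }
}
```

Under this reading, a row selected with pMiss can still end up unchanged when pDrop is small, which is why the label is set only after a value was actually dropped.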
      • outlier

        public static FrameBlock outlier(FrameBlock frame,
                                         List<Integer> numerics,
                                         double pOut,
                                         double pPos,
                                         int times,
                                         int seed)
        This function modifies the given, preprocessed frame block to add outliers to some of the numeric data of the frame, adding or subtracting several times the standard deviation, and marking them with the label outlier.
        Parameters:
        frame - Original frame block
        numerics - List with the columns of numeric type that can be changed, generated during preprocessing or manually selected
        pOut - Probability of introducing an outlier in a row
        pPos - Probability of using positive deviation
        times - Number of standard deviations added to or subtracted from the value
        seed - The seed for randomness
        Returns:
        A new FrameBlock with outliers
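The shift applied to an outlier can be written as value ± times × σ, where the sign is positive with probability pPos. The following stand-alone sketch shows this on a double array. It is not the SystemDS implementation: the class OutlierSketch is hypothetical, and both the use of the population standard deviation and the choice to compute it before any value is modified are assumptions.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of outlier injection; not the SystemDS implementation. */
final class OutlierSketch {
    /** Population standard deviation of one column (assumption: the estimator is not stated in the source). */
    static double stdDev(double[][] rows, int col) {
        double mean = 0.0;
        for (double[] row : rows) {
            mean += row[col];
        }
        mean /= rows.length;
        double sumSq = 0.0;
        for (double[] row : rows) {
            double d = row[col] - mean;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / rows.length);
    }

    /** Modifies rows in place: a selected row has one numeric value shifted by +/- times * stdDev. */
    static void addOutliers(double[][] rows, String[] labels, List<Integer> numerics,
                            double pOut, double pPos, int times, int seed) {
        if (rows.length == 0 || numerics.isEmpty()) {
            return;
        }
        // Compute deviations up front so earlier outliers do not inflate later ones.
        double[] sd = new double[numerics.size()];
        for (int k = 0; k < sd.length; k++) {
            sd[k] = stdDev(rows, numerics.get(k));
        }
        Random rng = new Random(seed);
        for (int r = 0; r < rows.length; r++) {
            if (rng.nextDouble() >= pOut) {
                continue;
            }
            int k = rng.nextInt(numerics.size());
            double sign = rng.nextDouble() < pPos ? 1.0 : -1.0;
            rows[r][numerics.get(k)] += sign * times * sd[k];
            labels[r] = "outlier";
        }
    }
}
```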
      • swap

        public static FrameBlock swap(FrameBlock frame,
                                      List<Integer> swappable,
                                      double pSwap,
                                      int seed)
        This function modifies the given, preprocessed frame block to add swapped fields of the same ValueType that are consecutive, marking them with the label swap.
        Parameters:
        frame - Original frame block
        swappable - List with the columns that are swappable, generated during preprocessing
        pSwap - Probability of swapping two fields in a row
        seed - The seed for randomness
        Returns:
        A new FrameBlock with swapped elements
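Swapping consecutive fields can be sketched as follows, assuming that each entry c of the swappable list means columns c and c + 1 share a type and may be exchanged. That reading is an assumption, and the class SwapSketch and its method are hypothetical, not part of SystemDS.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of swapping adjacent fields; not the SystemDS implementation. */
final class SwapSketch {
    /** Modifies rows in place: with probability pSwap a row has columns c and c + 1 exchanged. */
    static void addSwaps(String[][] rows, String[] labels, List<Integer> swappable,
                         double pSwap, int seed) {
        if (swappable.isEmpty()) {
            return;
        }
        Random rng = new Random(seed);
        for (int r = 0; r < rows.length; r++) {
            if (rng.nextDouble() >= pSwap) {
                continue;
            }
            // Assumption: every index c in swappable has a valid neighbour c + 1.
            int c = swappable.get(rng.nextInt(swappable.size()));
            String tmp = rows[r][c];
            rows[r][c] = rows[r][c + 1];
            rows[r][c + 1] = tmp;
            labels[r] = "swap";
        }
    }
}
```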