Class CompressionSettings


  • public class CompressionSettings
    extends Object
    Compression settings class, used as a bundle of parameters inside the compression framework. See CompressionSettingsBuilder for the defaults of the non-static parameters.
    • Field Detail

      • PAR_DDC_THRESHOLD

        public static int PAR_DDC_THRESHOLD
        Parallelization threshold for DDC compression
      • BITMAP_BLOCK_SZ

        public static final int BITMAP_BLOCK_SZ
        Size of the blocks used in a blocked bitmap representation. Note that it is exactly Character.MAX_VALUE. It is not Character.MAX_VALUE + 1, because that value breaks the offsets in cases with fully dense values.
        See Also:
        Constant Field Values
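        The choice of Character.MAX_VALUE can be illustrated with a minimal sketch. It assumes, as the description suggests but does not state, that a count or offset within one block is stored in a 16-bit Java char. The class and method names are hypothetical and are not part of SystemDS.

```java
// Hypothetical sketch, not the SystemDS implementation.
final class BitmapBlockSketch {
    // Mirrors the documented constant: exactly Character.MAX_VALUE (65535).
    static final int BITMAP_BLOCK_SZ = Character.MAX_VALUE;

    // Assumption: a per-block count or offset is stored in a 16-bit char.
    static char storeAsChar(int value) {
        return (char) value; // narrowing cast wraps modulo 65536
    }

    public static void main(String[] args) {
        // A fully dense block of BITMAP_BLOCK_SZ values still fits in a char.
        System.out.println((int) storeAsChar(BITMAP_BLOCK_SZ));     // prints 65535
        // One more value wraps around to 0, which would corrupt the stored value.
        System.out.println((int) storeAsChar(BITMAP_BLOCK_SZ + 1)); // prints 0
    }
}
```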
      • sortTuplesByFrequency

        public final boolean sortTuplesByFrequency
        Sorting of values by physical length helps by 10-20%, especially for serial execution, while it slightly decreases performance for parallel execution, including multi-threaded. It is therefore not applied for distributed operations, also because compression time and garbage collection overhead increase.
      • samplingRatio

        public final double samplingRatio
        The sampling ratio used when choosing ColGroups. Note that the default behavior is to use the exact estimator if the number of elements is below 1000. DEPRECATED.
      • samplePower

        public final double samplePower
        The sampling ratio power to use when choosing the sample size. It is used in accordance with the function: sampleSize += nRows^samplePower. The value is bounded to the range 0 to 1: a value of 1 gives a sample containing everything, and a value of 0 adds 1.
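        As a rough illustration of the formula above, the following sketch computes the nRows^samplePower term on its own. The class name, the method name, and the rounding to the nearest integer are assumptions made for this example; the actual SystemDS code may combine the term with other contributions.

```java
// Hypothetical sketch of the nRows^samplePower term, not the SystemDS implementation.
final class SamplePowerSketch {
    // Returns the amount that samplePower contributes to the sample size.
    static long powerTerm(long nRows, double samplePower) {
        if (samplePower < 0.0 || samplePower > 1.0)
            throw new IllegalArgumentException("samplePower must be in [0, 1]");
        // Assumption: the fractional result is rounded to the nearest integer.
        return Math.round(Math.pow(nRows, samplePower));
    }

    public static void main(String[] args) {
        long nRows = 1_000_000L;
        System.out.println(powerTerm(nRows, 0.0)); // adds 1
        System.out.println(powerTerm(nRows, 0.5)); // adds the square root of nRows
        System.out.println(powerTerm(nRows, 1.0)); // adds every row
    }
}
```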
      • allowSharedDictionary

        public final boolean allowSharedDictionary
        Share DDC Dictionaries between ColGroups.
      • transposeInput

        public final String transposeInput
        String specifying which transpose setting is used. It can be auto, true or false.
      • seed

        public final int seed
        If the seed is -1, the system uses the system millisecond time and the class hash for seeding.
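        The seeding rule could look roughly like the sketch below. The class and method names are hypothetical, and the way the time and the class hash are mixed (XOR) is an assumption; the description only says that both are used.

```java
import java.util.Random;

// Hypothetical sketch of the seeding rule, not the SystemDS implementation.
final class SeedSketch {
    static Random createRandom(int seed) {
        if (seed == -1) {
            // Assumption: combine the millisecond clock with a class hash via XOR.
            long derived = System.currentTimeMillis() ^ SeedSketch.class.hashCode();
            return new Random(derived);
        }
        // Any other seed gives a reproducible sequence.
        return new Random(seed);
    }

    public static void main(String[] args) {
        // Two generators with the same explicit seed produce the same first value.
        System.out.println(createRandom(7).nextInt() == createRandom(7).nextInt());
    }
}
```

        A fixed seed is useful for reproducible sampling in tests, while -1 gives a different sample on each run.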
      • lossy

        public final boolean lossy
        True if lossy compression is enabled
      • columnPartitioner

        public final CoCoderFactory.PartitionerType columnPartitioner
        The selected method for column partitioning used in CoCoding compressed columns
      • maxColGroupCoCode

        public final int maxColGroupCoCode
        The maximum number of CoCoded columns allowed.
      • coCodePercentage

        public final double coCodePercentage
        A CoCode parameter whose behavior differs based on the compression method. In general, it is a value that reflects how aggressively CoCoding is used.
      • validCompressions

        public final EnumSet<AColGroup.CompressionType> validCompressions
        Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. The default is to always allow the uncompressed ColGroup.
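        A minimal sketch of how such an EnumSet can be built and queried follows. The enum here is a small stand-in for AColGroup.CompressionType with illustrative values, and the helper names are hypothetical. The sketch also assumes that the default mentioned above refers to the uncompressed type always being permitted.

```java
import java.util.EnumSet;

// Hypothetical sketch, not the SystemDS implementation.
final class ValidCompressionsSketch {
    // Stand-in for AColGroup.CompressionType; the values are illustrative only.
    enum CompressionType { UNCOMPRESSED, RLE, OLE, DDC }

    // Assumption: the uncompressed type is always added to the requested set.
    static EnumSet<CompressionType> withUncompressed(EnumSet<CompressionType> requested) {
        EnumSet<CompressionType> valid = EnumSet.copyOf(requested);
        valid.add(CompressionType.UNCOMPRESSED);
        return valid;
    }

    // Membership test of the kind a method such as isRLEAllowed() might perform.
    static boolean isAllowed(EnumSet<CompressionType> valid, CompressionType type) {
        return valid.contains(type);
    }

    public static void main(String[] args) {
        EnumSet<CompressionType> valid = withUncompressed(EnumSet.of(CompressionType.DDC));
        System.out.println(isAllowed(valid, CompressionType.RLE));          // prints false
        System.out.println(isAllowed(valid, CompressionType.UNCOMPRESSED)); // prints true
    }
}
```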
      • minimumSampleSize

        public final int minimumSampleSize
        The minimum size of the sample extracted.
      • maxSampleSize

        public final int maxSampleSize
        The maximum size of the sample extracted.
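        One plausible reading of minimumSampleSize and maxSampleSize is that they clamp an estimated sample size, as sketched below. The clamping behavior, the class name, and the method name are assumptions for illustration; the descriptions above only state that the two fields bound the extracted sample.

```java
// Hypothetical sketch, not the SystemDS implementation.
final class SampleBoundsSketch {
    // Assumption: an estimated sample size is clamped into [minimumSampleSize, maxSampleSize].
    static int clampSampleSize(long estimate, int minimumSampleSize, int maxSampleSize) {
        if (minimumSampleSize > maxSampleSize)
            throw new IllegalArgumentException("minimumSampleSize exceeds maxSampleSize");
        long bounded = Math.max(minimumSampleSize, Math.min(maxSampleSize, estimate));
        return (int) bounded; // safe: bounded lies between two int values
    }

    public static void main(String[] args) {
        System.out.println(clampSampleSize(10, 100, 1000));   // prints 100
        System.out.println(clampSampleSize(500, 100, 1000));  // prints 500
        System.out.println(clampSampleSize(5000, 100, 1000)); // prints 1000
    }
}
```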
      • transposed

        public boolean transposed
        Transpose input matrix, to optimize access when extracting bitmaps. This setting is changed inside the script based on the transposeInput setting. This is intentionally left as a mutable value, since the transposition of the input matrix is decided in phase 3.
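        The relationship between the transposeInput setting and the mutable transposed flag can be sketched as follows. The names are hypothetical, the exact-match string comparison is an assumption, and the heuristic behind auto is not described here, so it is passed in as a precomputed boolean.

```java
// Hypothetical sketch, not the SystemDS implementation.
final class TransposeSketch {
    // Resolves the final transpose decision from the string setting.
    // autoDecision stands in for whatever heuristic "auto" applies.
    static boolean resolveTransposed(String transposeInput, boolean autoDecision) {
        switch (transposeInput) {
            case "true":
                return true;
            case "false":
                return false;
            case "auto":
                return autoDecision;
            default:
                throw new IllegalArgumentException("Unknown transpose setting: " + transposeInput);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveTransposed("true", false)); // prints true
        System.out.println(resolveTransposed("auto", false)); // prints false
    }
}
```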
      • minimumCompressionRatio

        public final double minimumCompressionRatio
        The minimum compression ratio to achieve.
      • isInSparkInstruction

        public final boolean isInSparkInstruction
        True if the compression is executed inside a Spark instruction.
    • Method Detail

      • isRLEAllowed

        public boolean isRLEAllowed()