Class ValueAggregatorBaseDescriptor

java.lang.Object
org.apache.hadoop.mapreduce.lib.aggregate.ValueAggregatorBaseDescriptor
All Implemented Interfaces:
ValueAggregatorDescriptor
Direct Known Subclasses:
ValueAggregatorBaseDescriptor

@Public @Stable public class ValueAggregatorBaseDescriptor extends Object implements ValueAggregatorDescriptor
This class implements the functionality common to implementations of the ValueAggregatorDescriptor interface.
  • Field Details

  • Constructor Details

    • ValueAggregatorBaseDescriptor

      public ValueAggregatorBaseDescriptor()
  • Method Details

    • generateEntry

      public static Map.Entry<Text,Text> generateEntry(String type, String id, Text val)
      Parameters:
      type - the aggregation type
      id - the aggregation id
      val - the value associated with the id to be aggregated
      Returns:
      an Entry whose key is the aggregation id prefixed with the aggregation type.
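
A minimal sketch of the key layout described above, using plain String values in place of Text so that it runs without Hadoop on the classpath. The class name GenerateEntrySketch, the ":" separator, and the literal type string "LONG_VALUE_SUM" are assumptions made for illustration; they are not taken from this page.

```java
import java.util.AbstractMap;
import java.util.Map;

public class GenerateEntrySketch {
    // Assumed separator between the aggregation type and the aggregation id.
    public static final String TYPE_SEPARATOR = ":";

    // String-based stand-in for generateEntry(String, String, Text):
    // the key is the aggregation id prefixed with the aggregation type.
    public static Map.Entry<String, String> generateEntry(String type, String id, String val) {
        return new AbstractMap.SimpleEntry<>(type + TYPE_SEPARATOR + id, val);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = generateEntry("LONG_VALUE_SUM", "record_count", "1");
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
```

Carrying the type inside the key lets a later stage pick an aggregator by inspecting the key alone.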
    • generateValueAggregator

      public static ValueAggregator generateValueAggregator(String type, long uniqCount)
      Parameters:
      type - the aggregation type
      uniqCount - the limit on the number of unique values to keep, if type is UNIQ_VALUE_COUNT
      Returns:
      a value aggregator of the given type.
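
The factory pattern behind this method might look roughly like the following standalone sketch. Everything here is hypothetical: the Aggregator interface, the two implementations, the type strings, and the way the limit is enforced (new values are ignored once the limit is reached) are assumptions for illustration, not the Hadoop classes themselves.

```java
import java.util.HashSet;
import java.util.Set;

public class AggregatorFactorySketch {
    // Hypothetical stand-in for the ValueAggregator interface.
    public interface Aggregator {
        void addNextValue(String val);
        String getReport();
    }

    // Sums values parsed as longs.
    static class LongSum implements Aggregator {
        private long sum = 0;
        public void addNextValue(String val) { sum += Long.parseLong(val); }
        public String getReport() { return Long.toString(sum); }
    }

    // Counts distinct values, keeping at most maxItems of them (assumed behavior).
    static class UniqCount implements Aggregator {
        private final long maxItems;
        private final Set<String> seen = new HashSet<>();
        UniqCount(long maxItems) { this.maxItems = maxItems; }
        public void addNextValue(String val) {
            if (seen.size() < maxItems) {
                seen.add(val);
            }
        }
        public String getReport() { return Integer.toString(seen.size()); }
    }

    // Selects an aggregator by type; uniqCount only matters for UNIQ_VALUE_COUNT.
    public static Aggregator generateValueAggregator(String type, long uniqCount) {
        switch (type) {
            case "LONG_VALUE_SUM":
                return new LongSum();
            case "UNIQ_VALUE_COUNT":
                return new UniqCount(uniqCount);
            default:
                // Rejecting unknown types is a choice made by this sketch.
                throw new IllegalArgumentException("unknown aggregation type: " + type);
        }
    }

    public static void main(String[] args) {
        Aggregator uniq = generateValueAggregator("UNIQ_VALUE_COUNT", 2);
        for (String v : new String[] {"x", "y", "x", "z"}) {
            uniq.addNextValue(v);
        }
        System.out.println(uniq.getReport());
    }
}
```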
    • generateKeyValPairs

      public ArrayList<Map.Entry<Text,Text>> generateKeyValPairs(Object key, Object val)
      Generates one or two aggregation-id/value pairs for the given key/value pair. The first id is of type LONG_VALUE_SUM, with "record_count" as its aggregation id. If the input is a file split, a second id of the same type is also generated, with the file name as its aggregation id. Together, these count the total number of records in the input data and the number of records in each input file.
      Specified by:
      generateKeyValPairs in interface ValueAggregatorDescriptor
      Parameters:
      key - input key
      val - input value
      Returns:
      a list of aggregation id/value pairs. Each aggregation id encodes an aggregation type, which determines how the value is aggregated in the reduce/combiner phase of an Aggregate-based job.
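
The record-counting behavior described for generateKeyValPairs can be sketched without Hadoop as follows. This is an illustration under stated assumptions: String replaces Text, the ":" separator and the "LONG_VALUE_SUM" type string are placeholders, each emitted value is assumed to be "1", and countRecords is a hypothetical helper that plays the role of the summing reduce step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RecordCountSketch {
    static final String LONG_VALUE_SUM = "LONG_VALUE_SUM"; // placeholder type string
    static final String TYPE_SEPARATOR = ":";              // assumed separator

    // Emits one pair for the global record count and, when the input file
    // is known, a second pair keyed by that file name.
    public static List<Map.Entry<String, String>> generateKeyValPairs(String inputFile) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(entry(LONG_VALUE_SUM, "record_count", "1"));
        if (inputFile != null) {
            pairs.add(entry(LONG_VALUE_SUM, inputFile, "1"));
        }
        return pairs;
    }

    private static Map.Entry<String, String> entry(String type, String id, String val) {
        return new AbstractMap.SimpleEntry<>(type + TYPE_SEPARATOR + id, val);
    }

    // Hypothetical reduce step: sums the emitted values per aggregation key.
    // Each list element is the input file of one record (null if unknown).
    public static Map<String, Long> countRecords(List<String> inputFileOfEachRecord) {
        Map<String, Long> sums = new TreeMap<>();
        for (String file : inputFileOfEachRecord) {
            for (Map.Entry<String, String> e : generateKeyValPairs(file)) {
                sums.merge(e.getKey(), Long.parseLong(e.getValue()), Long::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.txt", "a.txt", "b.txt");
        System.out.println(countRecords(files));
    }
}
```

Summing the "1" values per key yields the total under "record_count" and a per-file total under each file name.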
    • configure

      public void configure(Configuration conf)
      Gets the input file name from the given configuration.
      Specified by:
      configure in interface ValueAggregatorDescriptor
      Parameters:
      conf - a configuration object