Class RemoteParForSpark


  • public class RemoteParForSpark
    extends Object
    This class serves two purposes: (1) isolating Spark imports so that the system can run in environments where no Spark libraries are available, and (2) following the same structure as the parfor remote_mr job submission.
    NOTE: currently, inputs and outputs are still exchanged via HDFS. This covers the general cases of data that already resides in HDFS, in-memory data, and partitioned inputs. It also allows for pre-aggregation by overwriting partial task results with pre-aggregated results from subsequent iterations.
    TODO: reduceByKey on variable names.
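
    The import isolation described in purpose (1) can be illustrated with a minimal sketch. It is not taken from the actual implementation: the class SparkIsolationSketch and the methods isClassAvailable and chooseBackend are hypothetical names introduced for illustration only. The idea is that the caller probes for a Spark class by name and would only touch a Spark-importing class (such as this one) when the probe succeeds, so the calling code itself never needs Spark on the classpath.

```java
public class SparkIsolationSketch {

    // Returns true if the named class can be located on the classpath.
    // The class is looked up without being initialized.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, SparkIsolationSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Hypothetical dispatch: choose the remote path only if the probe class exists.
    // In a real system the "remote" branch would delegate to the one class that
    // imports Spark; here it only returns a label.
    static String chooseBackend(String probeClass) {
        return isClassAvailable(probeClass) ? "remote" : "local";
    }

    public static void main(String[] args) {
        // The result depends on whether Spark is on the classpath at run time.
        System.out.println(chooseBackend("org.apache.spark.api.java.JavaSparkContext"));
    }
}
```

    Because the probe uses a class name string rather than an import, this sketch compiles and runs without any Spark dependency. Only the delegated class would fail to load if Spark were missing.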