Class OptimizerRuleBased

  • Direct Known Subclasses:
    OptimizerConstrained, OptimizerHeuristic

    public class OptimizerRuleBased
    extends Optimizer
    Rule-Based ParFor Optimizer (time: O(n)): Applied rule-based rewrites

       1) rewrite set data partitioner (incl. recompile RIX)
       2) rewrite remove unnecessary compare matrix
       3) rewrite result partitioning (incl. recompile LIX)
       4) rewrite set execution strategy
       5) rewrite set operations exec type (incl. recompile)
       6) rewrite use data colocation
       7) rewrite set partition replication factor
       8) rewrite set export replication factor
       9) rewrite use nested parallelism
      10) rewrite set degree of parallelism
      11) rewrite set task partitioner
      12) rewrite set fused data partitioning and execution
      13) rewrite transpose vector operations (for sparse)
      14) rewrite set in-place result indexing
      15) rewrite disable caching (prevent sparse serialization)
      16) rewrite enable runtime piggybacking
      17) rewrite inject spark loop checkpointing
      18) rewrite inject spark repartition (for zipmm)
      19) rewrite set spark eager rdd caching
      20) rewrite set result merge
      21) rewrite set recompile memory budget
      22) rewrite remove recursive parfor
      23) rewrite remove unnecessary parfor

    TODO fuse also result merge into fused data partitioning and execute (for writing the result directly from execute we need to partition columns/rows according to blocksize -> rewrite (only applicable if numCols/blocksize>numreducers)+custom MR partitioner)
    TODO take remote memory into account in data/result partitioning rewrites (smaller/larger)
    TODO memory estimates with shared reads
    TODO memory estimates of result merge into plan tree
    TODO blockwise partitioning
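
The description above says the rewrites are applied in a fixed order, which is what makes a rule-based pass cheap compared with a search over plan alternatives. The following is a minimal sketch of that pattern only, not the actual OptimizerRuleBased implementation. Every name in it (RuleBasedSketch, Plan, Rewrite, defaultRules, and the "STATIC"/"FACTORING" labels) is a hypothetical placeholder, and the two rules are simplified stand-ins for "set degree of parallelism" and "set task partitioner".

```java
import java.util.ArrayList;
import java.util.List;

public class RuleBasedSketch {

    /** Hypothetical plan: holds only the properties the two sample rewrites touch. */
    public static final class Plan {
        public final long numIterations;
        public final int availableCores;
        public int degreeOfParallelism = 1;
        public String taskPartitioner = "STATIC"; // placeholder label
        public final List<String> appliedRewrites = new ArrayList<>();

        public Plan(long numIterations, int availableCores) {
            this.numIterations = numIterations;
            this.availableCores = availableCores;
        }
    }

    /** Hypothetical rewrite: a guard plus an in-place change to the plan. */
    public interface Rewrite {
        String name();
        boolean applies(Plan p);
        void apply(Plan p);
    }

    /** Applies each rule once, in list order; later rules see earlier results. */
    public static Plan optimize(Plan plan, List<Rewrite> rules) {
        for (Rewrite r : rules) {
            if (r.applies(plan)) {
                r.apply(plan);
                plan.appliedRewrites.add(r.name());
            }
        }
        return plan;
    }

    /** Two simplified sample rules; the real rewrites are more involved. */
    public static List<Rewrite> defaultRules() {
        List<Rewrite> rules = new ArrayList<>();
        rules.add(new Rewrite() {
            public String name() { return "set degree of parallelism"; }
            public boolean applies(Plan p) { return true; }
            public void apply(Plan p) {
                // Assumption: never use more workers than iterations or cores.
                p.degreeOfParallelism = (int) Math.min(p.numIterations, (long) p.availableCores);
            }
        });
        rules.add(new Rewrite() {
            public String name() { return "set task partitioner"; }
            // Depends on the degree of parallelism chosen by the previous rule.
            public boolean applies(Plan p) { return p.numIterations > p.degreeOfParallelism; }
            public void apply(Plan p) { p.taskPartitioner = "FACTORING"; } // placeholder label
        });
        return rules;
    }

    public static void main(String[] args) {
        Plan p = optimize(new Plan(100, 8), defaultRules());
        System.out.println(p.appliedRewrites + " dop=" + p.degreeOfParallelism
            + " partitioner=" + p.taskPartitioner);
    }
}
```

In this sketch the ordering matters because the second rule reads the value the first rule set, which mirrors why the list above is numbered rather than unordered.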