Class FileOutputCommitter

Direct Known Subclasses:
PartialFileOutputCommitter

@Public @Stable public class FileOutputCommitter extends PathOutputCommitter
An OutputCommitter that commits files specified in job output directory i.e. ${mapreduce.output.fileoutputformat.outputdir}.
  • Field Details

    • PENDING_DIR_NAME

      public static final String PENDING_DIR_NAME
      Name of directory where pending data is placed. Data that has not been committed yet.
      See Also:
    • TEMP_DIR_NAME

      @Deprecated protected static final String TEMP_DIR_NAME
      Deprecated.
      Temporary directory name The static variable to be compatible with M/R 1.x
      See Also:
    • SUCCEEDED_FILE_NAME

      public static final String SUCCEEDED_FILE_NAME
      See Also:
    • SUCCESSFUL_JOB_OUTPUT_DIR_MARKER

      public static final String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
      See Also:
    • FILEOUTPUTCOMMITTER_ALGORITHM_VERSION

      public static final String FILEOUTPUTCOMMITTER_ALGORITHM_VERSION
      See Also:
    • FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT

      public static final int FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT
      See Also:
    • FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED

      public static final String FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED
      See Also:
    • FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT

      public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT
      See Also:
    • FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED

      public static final String FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED
      See Also:
    • FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT

      public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT
      See Also:
    • FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS

      public static final String FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS
      See Also:
    • FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT

      public static final int FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT
      See Also:
    • FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED

      public static final String FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED
      See Also:
    • FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT

      public static final boolean FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT
      See Also:
  • Constructor Details

    • FileOutputCommitter

      public FileOutputCommitter(Path outputPath, TaskAttemptContext context) throws IOException
      Create a file output committer
      Parameters:
      outputPath - the job's output path, or null if you want the output committer to act as a noop.
      context - the task's context
      Throws:
      IOException
    • FileOutputCommitter

      @Private public FileOutputCommitter(Path outputPath, JobContext context) throws IOException
      Create a file output committer
      Parameters:
      outputPath - the job's output path, or null if you want the output committer to act as a noop.
      context - the task's context
      Throws:
      IOException
  • Method Details

    • getOutputPath

      public Path getOutputPath()
      Description copied from class: PathOutputCommitter
      Get the final directory where work will be placed once the job is committed. This may be null, in which case, there is no output path to write data to.
      Specified by:
      getOutputPath in class PathOutputCommitter
      Returns:
      the path where final output of the job should be placed. This could also be considered the committed application attempt path.
    • getJobAttemptPath

      public Path getJobAttemptPath(JobContext context)
      Compute the path where the output of a given job attempt will be placed.
      Parameters:
      context - the context of the job. This is used to get the application attempt id.
      Returns:
      the path to store job attempt data.
    • getJobAttemptPath

      public static Path getJobAttemptPath(JobContext context, Path out)
      Compute the path where the output of a given job attempt will be placed.
      Parameters:
      context - the context of the job. This is used to get the application attempt id.
      out - the output path to place these in.
      Returns:
      the path to store job attempt data.
    • getJobAttemptPath

      protected Path getJobAttemptPath(int appAttemptId)
      Compute the path where the output of a given job attempt will be placed.
      Parameters:
      appAttemptId - the ID of the application attempt for this job.
      Returns:
      the path to store job attempt data.
    • getTaskAttemptPath

      public Path getTaskAttemptPath(TaskAttemptContext context)
      Compute the path where the output of a task attempt is stored until that task is committed.
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where a task attempt should be stored.
    • getTaskAttemptPath

      public static Path getTaskAttemptPath(TaskAttemptContext context, Path out)
      Compute the path where the output of a task attempt is stored until that task is committed.
      Parameters:
      context - the context of the task attempt.
      out - The output path to put things in.
      Returns:
      the path where a task attempt should be stored.
    • getCommittedTaskPath

      public Path getCommittedTaskPath(TaskAttemptContext context)
      Compute the path where the output of a committed task is stored until the entire job is committed.
      Parameters:
      context - the context of the task attempt
      Returns:
      the path where the output of a committed task is stored until the entire job is committed.
    • getCommittedTaskPath

      public static Path getCommittedTaskPath(TaskAttemptContext context, Path out)
    • getCommittedTaskPath

      protected Path getCommittedTaskPath(int appAttemptId, TaskAttemptContext context)
      Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt.
      Parameters:
      appAttemptId - the id of the application attempt to use
      context - the context of any task.
      Returns:
      the path where the output of a committed task is stored.
    • getWorkPath

      public Path getWorkPath() throws IOException
      Get the directory that the task should write results into.
      Specified by:
      getWorkPath in class PathOutputCommitter
      Returns:
      the work directory
      Throws:
      IOException
    • setupJob

      public void setupJob(JobContext context) throws IOException
      Create the temporary directory that is the root of all of the task work directories.
      Specified by:
      setupJob in class OutputCommitter
      Parameters:
      context - the job's context
      Throws:
      IOException - if temporary output could not be created
    • commitJob

      public void commitJob(JobContext context) throws IOException
      The job has completed, so do works in commitJobInternal(). Could retry on failure if using algorithm 2.
      Overrides:
      commitJob in class OutputCommitter
      Parameters:
      context - the job's context
      Throws:
      IOException
    • commitJobInternal

      @VisibleForTesting protected void commitJobInternal(JobContext context) throws IOException
      The job has completed, so do following commit job, include: Move all committed tasks to the final output dir (algorithm 1 only). Delete the temporary directory, including all of the work directories. Create a _SUCCESS file to make it as successful.
      Parameters:
      context - the job's context
      Throws:
      IOException
    • cleanupJob

      @Deprecated public void cleanupJob(JobContext context) throws IOException
      Deprecated.
      Description copied from class: OutputCommitter
      For cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.
      Overrides:
      cleanupJob in class OutputCommitter
      Parameters:
      context - Context of the job whose output is being written.
      Throws:
      IOException
    • abortJob

      public void abortJob(JobContext context, JobStatus.State state) throws IOException
      Delete the temporary directory, including all of the work directories.
      Overrides:
      abortJob in class OutputCommitter
      Parameters:
      context - the job's context
      state - final runstate of the job
      Throws:
      IOException
    • setupTask

      public void setupTask(TaskAttemptContext context) throws IOException
      No task setup required.
      Specified by:
      setupTask in class OutputCommitter
      Parameters:
      context - Context of the task whose output is being written.
      Throws:
      IOException
    • commitTask

      public void commitTask(TaskAttemptContext context) throws IOException
      Move the files from the work directory to the job output directory
      Specified by:
      commitTask in class OutputCommitter
      Parameters:
      context - the task context
      Throws:
      IOException - if commit is not successful.
    • commitTask

      @Private public void commitTask(TaskAttemptContext context, Path taskAttemptPath) throws IOException
      Throws:
      IOException
    • abortTask

      public void abortTask(TaskAttemptContext context) throws IOException
      Delete the work directory
      Specified by:
      abortTask in class OutputCommitter
      Throws:
      IOException
    • abortTask

      @Private public void abortTask(TaskAttemptContext context, Path taskAttemptPath) throws IOException
      Throws:
      IOException
    • needsTaskCommit

      public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
      Did this task write any files in the work directory?
      Specified by:
      needsTaskCommit in class OutputCommitter
      Parameters:
      context - the task's context
      Returns:
      true/false
      Throws:
      IOException
    • needsTaskCommit

      @Private public boolean needsTaskCommit(TaskAttemptContext context, Path taskAttemptPath) throws IOException
      Throws:
      IOException
    • isRecoverySupported

      @Deprecated public boolean isRecoverySupported()
      Deprecated.
      Description copied from class: OutputCommitter
      Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
      Overrides:
      isRecoverySupported in class OutputCommitter
      Returns:
      true if task output recovery is supported, false otherwise
      See Also:
    • isCommitJobRepeatable

      public boolean isCommitJobRepeatable(JobContext context) throws IOException
      Description copied from class: OutputCommitter
      Returns true if an in-progress job commit can be retried. If the MR AM is re-run then it will check this value to determine if it can retry an in-progress commit that was started by a previous version. Note that in rare scenarios, the previous AM version might still be running at that time, due to system anomalies. Hence if this method returns true then the retry commit operation should be able to run concurrently with the previous operation. If repeatable job commit is supported, job restart can tolerate previous AM failures during job commit. By default, it is not supported. Extended classes (like: FileOutputCommitter) should explicitly override it if provide support.
      Overrides:
      isCommitJobRepeatable in class OutputCommitter
      Parameters:
      context - Context of the job whose output is being written.
      Returns:
      true repeatable job commit is supported, false otherwise
      Throws:
      IOException
    • recoverTask

      public void recoverTask(TaskAttemptContext context) throws IOException
      Description copied from class: OutputCommitter
      Recover the task output. The retry-count for the job will be passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again. This may be called multiple times for the same task. But from different application attempts.
      Overrides:
      recoverTask in class OutputCommitter
      Parameters:
      context - Context of the task whose output is being recovered
      Throws:
      IOException
    • toString

      public String toString()
      Overrides:
      toString in class PathOutputCommitter