Class FileOutputCommitter
java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
- Direct Known Subclasses:
PartialFileOutputCommitter
An OutputCommitter that commits files to the job output directory, i.e. ${mapreduce.output.fileoutputformat.outputdir}.
Field Summary
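Fields (Modifier and Type, Field, Description):
static final String FILEOUTPUTCOMMITTER_ALGORITHM_VERSION
static final int FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT
static final String FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED
static final boolean FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT
static final String FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED
static final boolean FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT
static final String FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS
static final int FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT
static final String FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED
static final boolean FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT
static final String PENDING_DIR_NAME: Name of the directory where pending data is placed.
static final String SUCCEEDED_FILE_NAME
static final String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
protected static final String TEMP_DIR_NAME: Deprecated.
-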
Constructor Summary
Constructors (Constructor, Description):
FileOutputCommitter(Path outputPath, JobContext context): Create a file output committer.
FileOutputCommitter(Path outputPath, TaskAttemptContext context): Create a file output committer.
-
Method Summary
Methods (Modifier and Type, Method, Description):
void abortJob(JobContext context, JobStatus.State state): Delete the temporary directory, including all of the work directories.
void abortTask(TaskAttemptContext context): Delete the work directory.
void abortTask(TaskAttemptContext context, Path taskAttemptPath)
void cleanupJob(JobContext context): Deprecated.
void commitJob(JobContext context): The job has completed, so the commit work is done in commitJobInternal().
protected void commitJobInternal(JobContext context): The job has completed, so perform the job commit, which includes moving all committed tasks to the final output directory (algorithm 1 only).
void commitTask(TaskAttemptContext context): Move the files from the work directory to the job output directory.
void commitTask(TaskAttemptContext context, Path taskAttemptPath)
protected Path getCommittedTaskPath(int appAttemptId, TaskAttemptContext context): Compute the path where the output of a committed task is stored until the entire job is committed, for a specific application attempt.
Path getCommittedTaskPath(TaskAttemptContext context): Compute the path where the output of a committed task is stored until the entire job is committed.
static Path getCommittedTaskPath(TaskAttemptContext context, Path out)
protected Path getJobAttemptPath(int appAttemptId): Compute the path where the output of a given job attempt will be placed.
Path getJobAttemptPath(JobContext context): Compute the path where the output of a given job attempt will be placed.
static Path getJobAttemptPath(JobContext context, Path out): Compute the path where the output of a given job attempt will be placed.
Path getOutputPath(): Get the final directory where work will be placed once the job is committed.
Path getTaskAttemptPath(TaskAttemptContext context): Compute the path where the output of a task attempt is stored until that task is committed.
static Path getTaskAttemptPath(TaskAttemptContext context, Path out): Compute the path where the output of a task attempt is stored until that task is committed.
Path getWorkPath(): Get the directory that the task should write results into.
boolean isCommitJobRepeatable(JobContext context): Returns true if an in-progress job commit can be retried.
boolean isRecoverySupported(): Deprecated.
boolean needsTaskCommit(TaskAttemptContext context): Did this task write any files in the work directory?
boolean needsTaskCommit(TaskAttemptContext context, Path taskAttemptPath)
void recoverTask(TaskAttemptContext context): Recover the task output.
void setupJob(JobContext context): Create the temporary directory that is the root of all of the task work directories.
void setupTask(TaskAttemptContext context): No task setup required.
String toString()
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter: hasOutputPath
Methods inherited from class org.apache.hadoop.mapreduce.OutputCommitter: isRecoverySupported
-
Field Details
-
PENDING_DIR_NAME
Name of the directory where pending data, i.e. data that has not yet been committed, is placed.
-
TEMP_DIR_NAME
Deprecated. Temporary directory name. This static variable is kept for compatibility with MapReduce 1.x.
-
SUCCEEDED_FILE_NAME
-
SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
-
FILEOUTPUTCOMMITTER_ALGORITHM_VERSION
-
FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT
public static final int FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT
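The algorithm version key and its default decide which commit strategy the committer uses. The sketch below models that lookup with a plain Map instead of Hadoop's Configuration. Assumptions: the key string mapreduce.fileoutputcommitter.algorithm.version stands in for the value of FILEOUTPUTCOMMITTER_ALGORITHM_VERSION; the default is passed in by the caller because its value is not listed on this page; and values other than 1 or 2 are rejected, mirroring a check assumed to live in the constructor (which is declared to throw IOException, whereas the sketch uses an unchecked exception). AlgorithmVersionSketch is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical helper: resolves the commit algorithm version from a key/value config. */
final class AlgorithmVersionSketch {
    // Assumed value of FILEOUTPUTCOMMITTER_ALGORITHM_VERSION; verify against your Hadoop release.
    static final String ALGORITHM_VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version";

    private AlgorithmVersionSketch() {}

    /**
     * Returns the configured algorithm version, or defaultVersion when the key is absent.
     * Anything other than 1 or 2 is rejected (assumed behaviour).
     */
    static int resolve(Map<String, String> conf, int defaultVersion) {
        String raw = conf.get(ALGORITHM_VERSION_KEY);
        int version = (raw == null) ? defaultVersion : Integer.parseInt(raw.trim());
        if (version != 1 && version != 2) {
            throw new IllegalArgumentException(
                "Only 1 or 2 algorithm version is supported, got " + version);
        }
        return version;
    }
}
```

Keeping the default as a parameter avoids hard-coding a release-specific value into the sketch.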
-
FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED
-
FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT
public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT
-
FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED
-
FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT
public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT
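FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED and FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED control the cleanup step at the end of a job commit. The following is a minimal sketch of how the two flags are assumed to interact: a skipped cleanup deletes nothing, and a failed cleanup either fails the commit or is recorded and swallowed. The Cleanup callback is a hypothetical stand-in for cleanupJob(JobContext), and the log list replaces real logging.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical model of how the cleanup flags are assumed to interact during job commit. */
final class CleanupPolicySketch {
    /** A cleanup action that may fail, standing in for cleanupJob(JobContext). */
    interface Cleanup {
        void run() throws IOException;
    }

    private CleanupPolicySketch() {}

    /**
     * Runs cleanup unless skipCleanup is set. A failing cleanup is rethrown unless
     * ignoreFailures is set, in which case the error is recorded in log and swallowed.
     */
    static void cleanupAfterCommit(boolean skipCleanup, boolean ignoreFailures,
                                   Cleanup cleanup, List<String> log) throws IOException {
        if (skipCleanup) {
            log.add("cleanup skipped");
            return;
        }
        try {
            cleanup.run();
            log.add("cleanup done");
        } catch (IOException e) {
            if (!ignoreFailures) {
                throw e; // fail the job commit
            }
            // The temporary directory is left behind and needs manual cleanup.
            log.add("cleanup failed, ignored: " + e.getMessage());
        }
    }
}
```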
-
FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS
-
FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT
public static final int FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT
-
FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED
-
FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT
public static final boolean FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT
-
-
Constructor Details
-
FileOutputCommitter
Create a file output committer.
- Parameters:
outputPath - the job's output path, or null if you want the output committer to act as a no-op.
context - the task's context
- Throws:
IOException
-
FileOutputCommitter
Create a file output committer.
- Parameters:
outputPath - the job's output path, or null if you want the output committer to act as a no-op.
context - the task's context
- Throws:
IOException
-
-
Method Details
-
getOutputPath
Description copied from class: PathOutputCommitter
Get the final directory where work will be placed once the job is committed. This may be null, in which case there is no output path to write data to.
- Specified by:
getOutputPath in class PathOutputCommitter
- Returns:
- the path where final output of the job should be placed. This could also be considered the committed application attempt path.
-
getJobAttemptPath
Compute the path where the output of a given job attempt will be placed.
- Parameters:
context - the context of the job. This is used to get the application attempt ID.
- Returns:
- the path to store job attempt data.
-
getJobAttemptPath
Compute the path where the output of a given job attempt will be placed.
- Parameters:
context - the context of the job. This is used to get the application attempt ID.
out - the output path to place these in.
- Returns:
- the path to store job attempt data.
-
getJobAttemptPath
Compute the path where the output of a given job attempt will be placed.
- Parameters:
appAttemptId - the ID of the application attempt for this job.
- Returns:
- the path to store job attempt data.
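The three getJobAttemptPath overloads all resolve to a per-application-attempt directory. A minimal sketch of that layout using java.nio.file.Path rather than Hadoop's Path, assuming the directory is <output>/<PENDING_DIR_NAME>/<appAttemptId>, that PENDING_DIR_NAME is "_temporary", and that the attempt ID is read from the mapreduce.job.application.attempt.id key (the MRJobConfig.APPLICATION_ATTEMPT_ID key mentioned under recoverTask) with a default of 0. These literal values are assumptions to check against your Hadoop version, and JobAttemptPathSketch is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical sketch of the job attempt directory layout. */
final class JobAttemptPathSketch {
    // Assumed values; verify against PENDING_DIR_NAME and MRJobConfig.APPLICATION_ATTEMPT_ID.
    static final String PENDING_DIR_NAME = "_temporary";
    static final String APP_ATTEMPT_ID_KEY = "mapreduce.job.application.attempt.id";

    private JobAttemptPathSketch() {}

    /** Reads the application attempt ID from the config, defaulting to 0. */
    static int appAttemptId(Map<String, String> conf) {
        String raw = conf.get(APP_ATTEMPT_ID_KEY);
        return raw == null ? 0 : Integer.parseInt(raw.trim());
    }

    /** Models getJobAttemptPath(int): out/_temporary/<appAttemptId>. */
    static Path jobAttemptPath(int appAttemptId, Path out) {
        return out.resolve(PENDING_DIR_NAME).resolve(String.valueOf(appAttemptId));
    }

    /** Models getJobAttemptPath(JobContext, Path): the ID is taken from the config. */
    static Path jobAttemptPath(Map<String, String> conf, Path out) {
        return jobAttemptPath(appAttemptId(conf), out);
    }
}
```

Under these assumptions, jobAttemptPath(Map.of(), Path.of("/data/out")) yields /data/out/_temporary/0, so each restarted application attempt writes into a fresh sibling directory.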
-
getTaskAttemptPath
Compute the path where the output of a task attempt is stored until that task is committed.
- Parameters:
context - the context of the task attempt.
- Returns:
- the path where a task attempt should be stored.
-
getTaskAttemptPath
Compute the path where the output of a task attempt is stored until that task is committed.
- Parameters:
context - the context of the task attempt.
out - the output path to put things in.
- Returns:
- the path where a task attempt should be stored.
-
getCommittedTaskPath
Compute the path where the output of a committed task is stored until the entire job is committed.
- Parameters:
context - the context of the task attempt
- Returns:
- the path where the output of a committed task is stored until the entire job is committed.
-
getCommittedTaskPath
-
getCommittedTaskPath
Compute the path where the output of a committed task is stored until the entire job is committed, for a specific application attempt.
- Parameters:
appAttemptId - the ID of the application attempt to use.
context - the context of any task.
- Returns:
- the path where the output of a committed task is stored.
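Task attempt output and committed task output live at different places under the job attempt directory. The sketch below, under an assumed <output>/_temporary/<appAttemptId> layout, places uncommitted attempt output in a nested _temporary directory keyed by task attempt ID, and committed output directly under the job attempt directory keyed by task ID. Passing the IDs as plain strings and the class name TaskPathsSketch are simplifications for illustration. Taking appAttemptId as an explicit parameter mirrors the getCommittedTaskPath(int, TaskAttemptContext) overload, which can address an application attempt other than the current one.

```java
import java.nio.file.Path;

/** Hypothetical sketch of where task output lives before and after task commit. */
final class TaskPathsSketch {
    // Assumed value of PENDING_DIR_NAME.
    static final String PENDING_DIR_NAME = "_temporary";

    private TaskPathsSketch() {}

    static Path jobAttemptPath(int appAttemptId, Path out) {
        return out.resolve(PENDING_DIR_NAME).resolve(String.valueOf(appAttemptId));
    }

    /** Models getTaskAttemptPath: uncommitted output of one task attempt. */
    static Path taskAttemptPath(int appAttemptId, String taskAttemptId, Path out) {
        return jobAttemptPath(appAttemptId, out).resolve(PENDING_DIR_NAME).resolve(taskAttemptId);
    }

    /**
     * Models getCommittedTaskPath(int, TaskAttemptContext): keyed by task ID rather than
     * attempt ID, so committed output does not depend on which attempt produced it.
     */
    static Path committedTaskPath(int appAttemptId, String taskId, Path out) {
        return jobAttemptPath(appAttemptId, out).resolve(taskId);
    }
}
```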
-
getWorkPath
Get the directory that the task should write results into.
- Specified by:
getWorkPath in class PathOutputCommitter
- Returns:
- the work directory
- Throws:
IOException
-
setupJob
Create the temporary directory that is the root of all of the task work directories.
- Specified by:
setupJob in class OutputCommitter
- Parameters:
context - the job's context
- Throws:
IOException - if the temporary output could not be created
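Since setupJob only has to create the root of the task work directories, a local-filesystem sketch is short. It assumes the job attempt directory is <output>/_temporary/<appAttemptId> and treats a null output path as the no-op case described for the constructors. SetupJobSketch is a hypothetical name, and java.nio.file replaces Hadoop's FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem model of setupJob. */
final class SetupJobSketch {
    private SetupJobSketch() {}

    /**
     * Creates out/_temporary/<appAttemptId> (assumed layout) and returns it, or returns
     * null without touching the filesystem when out is null (no-op committer).
     */
    static Path setupJob(Path out, int appAttemptId) throws IOException {
        if (out == null) {
            return null; // no output path: nothing to set up
        }
        Path jobAttemptPath = out.resolve("_temporary").resolve(String.valueOf(appAttemptId));
        // createDirectories also creates missing parents and succeeds if the directory exists.
        return Files.createDirectories(jobAttemptPath);
    }
}
```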
-
commitJob
The job has completed, so the commit work is done in commitJobInternal(). The commit may be retried on failure if algorithm version 2 is in use.
- Overrides:
commitJob in class OutputCommitter
- Parameters:
context - the job's context
- Throws:
IOException
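The retry behaviour can be modelled as a loop around the internal commit. In this sketch the attempt budget is the value configured through FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS when the commit is repeatable, and 1 otherwise; repeatability is assumed to hold exactly when algorithm version 2 is in use, matching the description above. The InternalCommit callback is a hypothetical stand-in for commitJobInternal(JobContext).

```java
import java.io.IOException;

/** Hypothetical model of the commitJob retry loop. */
final class CommitJobRetrySketch {
    /** Stands in for commitJobInternal(JobContext). */
    interface InternalCommit {
        void run() throws IOException;
    }

    private CommitJobRetrySketch() {}

    /** Assumed: a job commit is repeatable only under algorithm version 2. */
    static boolean isCommitJobRepeatable(int algorithmVersion) {
        return algorithmVersion == 2;
    }

    /**
     * Runs the internal commit, allowing up to configuredAttempts tries in total when the
     * commit is repeatable; otherwise the first failure is rethrown. Returns the tries used.
     */
    static int commitJob(int algorithmVersion, int configuredAttempts, InternalCommit commit)
            throws IOException {
        // Math.max guards against a non-positive setting in this sketch.
        int maxAttempts = isCommitJobRepeatable(algorithmVersion) ? Math.max(1, configuredAttempts) : 1;
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                commit.run();
                return attempt;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // budget exhausted: fail the job commit
                }
                // otherwise loop and retry the whole internal commit
            }
        }
    }
}
```

Retrying the whole internal commit is only safe because a repeatable commit is expected to tolerate being re-run, which is what isCommitJobRepeatable advertises.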
-
commitJobInternal
The job has completed, so perform the job commit, which consists of: moving all committed tasks to the final output directory (algorithm 1 only); deleting the temporary directory, including all of the work directories; and creating a _SUCCESS file to mark the job as successful.
- Parameters:
context - the job's context
- Throws:
IOException
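The three steps of an algorithm 1 job commit can be sketched on a local filesystem with java.nio.file. Assumptions: committed task directories sit in <output>/_temporary/<appAttemptId>; the nested _temporary directory holds uncommitted attempts and is skipped; existing files in the output directory are overwritten; and markSuccess stands in for the SUCCESSFUL_JOB_OUTPUT_DIR_MARKER setting. The cleanup flags are ignored here for brevity, and all names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-filesystem model of an algorithm 1 job commit. */
final class CommitJobV1Sketch {
    // Assumed values of PENDING_DIR_NAME and SUCCEEDED_FILE_NAME.
    static final String PENDING_DIR_NAME = "_temporary";
    static final String SUCCEEDED_FILE_NAME = "_SUCCESS";

    private CommitJobV1Sketch() {}

    static void commitJobInternal(Path out, int appAttemptId, boolean markSuccess)
            throws IOException {
        Path pending = out.resolve(PENDING_DIR_NAME);
        Path jobAttempt = pending.resolve(String.valueOf(appAttemptId));
        // 1. Merge every committed task directory into the output directory,
        //    skipping the nested _temporary directory of uncommitted attempts.
        for (Path task : listChildren(jobAttempt)) {
            if (!task.getFileName().toString().equals(PENDING_DIR_NAME)) {
                mergeInto(task, out);
            }
        }
        // 2. Delete the whole temporary tree, children before parents.
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(pending)) {
            toDelete = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        // 3. Write an empty marker file, overwriting any existing one.
        if (markSuccess) {
            Files.write(out.resolve(SUCCEEDED_FILE_NAME), new byte[0]);
        }
    }

    /** Moves the children of src into dst, recursing where both sides are directories. */
    private static void mergeInto(Path src, Path dst) throws IOException {
        for (Path child : listChildren(src)) {
            Path target = dst.resolve(child.getFileName().toString());
            if (Files.isDirectory(child) && Files.isDirectory(target)) {
                mergeInto(child, target);
            } else {
                Files.move(child, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    /** Snapshots a directory listing so entries can be moved while iterating. */
    private static List<Path> listChildren(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }
}
```

Doing the merge serially in the job commit is what makes algorithm 1 output appear atomically at the end of the job, at the cost of a potentially slow final step.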
-
cleanupJob
Deprecated.
Description copied from class: OutputCommitter
For cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.
- Overrides:
cleanupJob in class OutputCommitter
- Parameters:
context - Context of the job whose output is being written.
- Throws:
IOException
-
abortJob
Delete the temporary directory, including all of the work directories.
- Overrides:
abortJob in class OutputCommitter
- Parameters:
context - the job's context
state - final runstate of the job
- Throws:
IOException
-
setupTask
No task setup required.
- Specified by:
setupTask in class OutputCommitter
- Parameters:
context - Context of the task whose output is being written.
- Throws:
IOException
-
commitTask
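Move the files from the work directory to the job output directory. Under algorithm 1 the files are first moved to the committed task path and only reach the job output directory when the job is committed.
- Specified by:
commitTask in class OutputCommitter
- Parameters:
context - the task context
- Throws:
IOException - if the commit is not successful.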
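How a task commit moves data depends on the algorithm version. This local-filesystem sketch assumes that under algorithm 1 the task attempt directory is renamed to the committed task path (and merged into the output directory only at job commit), while under algorithm 2 its contents are merged straight into the output directory. A missing attempt directory is treated as "nothing to commit". CommitTaskSketch is a hypothetical name, and unlike a fuller implementation it does not clear an already existing committed task path first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-filesystem model of commitTask for both algorithm versions. */
final class CommitTaskSketch {
    private CommitTaskSketch() {}

    /** Returns false if the attempt directory does not exist (nothing to commit). */
    static boolean commitTask(int algorithmVersion, Path taskAttemptPath,
                              Path committedTaskPath, Path out) throws IOException {
        if (!Files.exists(taskAttemptPath)) {
            return false; // the task produced no output
        }
        if (algorithmVersion == 1) {
            // Algorithm 1: one directory rename; the job commit merges it later.
            // Files.move fails if committedTaskPath already exists.
            Files.createDirectories(committedTaskPath.getParent());
            Files.move(taskAttemptPath, committedTaskPath);
        } else {
            // Algorithm 2: the output becomes visible in the output directory immediately.
            mergeInto(taskAttemptPath, out);
        }
        return true;
    }

    /** Moves the children of src into dst, recursing where both sides are directories. */
    private static void mergeInto(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        List<Path> children;
        try (Stream<Path> s = Files.list(src)) {
            children = s.collect(Collectors.toList());
        }
        for (Path child : children) {
            Path target = dst.resolve(child.getFileName().toString());
            if (Files.isDirectory(child) && Files.isDirectory(target)) {
                mergeInto(child, target);
            } else {
                Files.move(child, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

The trade-off is visible here: algorithm 2 does more work per task but leaves little for the job commit, while partially committed job output can become visible if the job later fails.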
-
commitTask
@Private public void commitTask(TaskAttemptContext context, Path taskAttemptPath) throws IOException - Throws:
IOException
-
abortTask
Delete the work directory.
- Specified by:
abortTask in class OutputCommitter
- Throws:
IOException
-
abortTask
- Throws:
IOException
-
needsTaskCommit
Did this task write any files in the work directory?
- Specified by:
needsTaskCommit in class OutputCommitter
- Parameters:
context - the task's context
- Returns:
- true if this task wrote files to its work directory and so needs a commit, false otherwise
- Throws:
IOException
-
needsTaskCommit
@Private public boolean needsTaskCommit(TaskAttemptContext context, Path taskAttemptPath) throws IOException - Throws:
IOException
-
isRecoverySupported
Deprecated.
Description copied from class: OutputCommitter
Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
- Overrides:
isRecoverySupported in class OutputCommitter
- Returns:
true if task output recovery is supported, false otherwise
-
isCommitJobRepeatable
Description copied from class: OutputCommitter
Returns true if an in-progress job commit can be retried. If the MR AM is re-run, it checks this value to determine whether it can retry an in-progress commit that was started by a previous attempt. Note that in rare scenarios the previous AM attempt might still be running at that time, due to system anomalies. Hence, if this method returns true, the retried commit operation should be able to run concurrently with the previous operation. If repeatable job commit is supported, job restart can tolerate previous AM failures during job commit. By default it is not supported; subclasses (such as FileOutputCommitter) should explicitly override this method if they support it.
- Overrides:
isCommitJobRepeatable in class OutputCommitter
- Parameters:
context - Context of the job whose output is being written.
- Returns:
true if repeatable job commit is supported, false otherwise
- Throws:
IOException
-
recoverTask
Description copied from class: OutputCommitter
Recover the task output. The retry count for the job is passed to the OutputCommitter via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration(). This is called from the application master process, but it is called individually for each task. If an exception is thrown, the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
- Overrides:
recoverTask in class OutputCommitter
- Parameters:
context - Context of the task whose output is being recovered
- Throws:
IOException
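Recovery relies on the application attempt ID: the output committed by the previous application attempt has to be brought under the current one. A minimal sketch for algorithm 1, on a local filesystem and with an assumed <output>/_temporary/<appAttemptId>/<taskId> layout: the previous attempt is the current ID minus 1, the first attempt cannot recover, and an existing committed directory is moved to the current attempt's committed task path. RecoverTaskSketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem model of recoverTask under algorithm 1. */
final class RecoverTaskSketch {
    private RecoverTaskSketch() {}

    /** Assumed committed task layout: out/_temporary/<appAttemptId>/<taskId>. */
    static Path committedTaskPath(Path out, int appAttemptId, String taskId) {
        return out.resolve("_temporary").resolve(String.valueOf(appAttemptId)).resolve(taskId);
    }

    /**
     * Moves the task's committed output from appAttemptId - 1 to appAttemptId.
     * Returns false if the previous attempt left no committed output for this task.
     */
    static boolean recoverTask(Path out, int appAttemptId, String taskId) throws IOException {
        int previousAttempt = appAttemptId - 1;
        if (previousAttempt < 0) {
            throw new IOException("Cannot recover task output for first attempt");
        }
        Path previous = committedTaskPath(out, previousAttempt, taskId);
        if (!Files.exists(previous)) {
            return false; // nothing was committed for this task before the restart
        }
        Path current = committedTaskPath(out, appAttemptId, taskId);
        // The rename fails if the parent directory does not exist yet, so create it first.
        // Files.move also fails if current already exists; a fuller version would remove it.
        Files.createDirectories(current.getParent());
        Files.move(previous, current);
        return true;
    }
}
```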
-
toString
- Overrides:
toString in class PathOutputCommitter
-