Class BindingPathOutputCommitter
java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.mapreduce.lib.output.BindingPathOutputCommitter
- All Implemented Interfaces:
org.apache.hadoop.fs.statistics.IOStatisticsSource,StreamCapabilities
@Public
@Unstable
public class BindingPathOutputCommitter
extends PathOutputCommitter
implements org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities
This is a special committer which creates the factory for the committer and
runs off that. Why does it exist? So that you can explicitly instantiate
a committer by classname and yet still have the actual implementation
driven dynamically by the factory options and destination filesystem.
This simplifies integration
with existing code which takes the classname of a committer.
There's no factory for this, as that would lead to a loop.
All commit protocol methods and accessors are delegated to the
wrapped committer.
How to use:
-
In applications which take a classname of committer in
a configuration option, set it to the canonical name of this class
(see
NAME). When this class is instantiated, it will use the factory mechanism to locate the configured committer for the destination. - In code, explicitly create an instance of this committer through its constructor, then invoke commit lifecycle operations on it. The dynamically configured committer will be created in the constructor and have the lifecycle operations relayed to it.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.hadoop.fs.StreamCapabilities
StreamCapabilities.StreamCapability -
Field Summary
FieldsFields inherited from interface org.apache.hadoop.fs.StreamCapabilities
ABORTABLE_STREAM, DROPBEHIND, HFLUSH, HSYNC, IOSTATISTICS, IOSTATISTICS_CONTEXT, PREADBYTEBUFFER, READAHEAD, READBYTEBUFFER, UNBUFFER, VECTOREDIO, VECTOREDIO_BUFFERS_SLICED -
Constructor Summary
ConstructorsConstructorDescriptionBindingPathOutputCommitter(Path outputPath, TaskAttemptContext context) Instantiate. -
Method Summary
Modifier and TypeMethodDescriptionvoidabortJob(JobContext jobContext, JobStatus.State state) For aborting an unsuccessful job's output.voidabortTask(TaskAttemptContext taskContext) Discard the task output.voidcleanupJob(JobContext jobContext) For cleaning up the job's output after job completion.voidcommitJob(JobContext jobContext) For committing job's output after successful job completion.voidcommitTask(TaskAttemptContext taskContext) To promote the task's temporary output to final output location.Get the inner committer.Return a statistics instance.Get the final directory where work will be placed once the job is committed.Get the directory that the task should write results into.booleanhasCapability(String capability) Pass through if the inner committer supports StreamCapabilities.booleanPredicate: is there an output path?booleanisCommitJobRepeatable(JobContext jobContext) Returns true if an in-progress job commit can be retried.booleanIs task output recovery supported for restarting jobs?booleanisRecoverySupported(JobContext jobContext) Is task output recovery supported for restarting jobs?booleanneedsTaskCommit(TaskAttemptContext taskContext) Check whether task needs a commit.voidrecoverTask(TaskAttemptContext taskContext) Recover the task output.voidsetupJob(JobContext jobContext) For the framework to setup the job output during initialization.voidsetupTask(TaskAttemptContext taskContext) Sets up output for the task.toString()
-
Field Details
-
NAME
The classname for use in configurations.
-
-
Constructor Details
-
BindingPathOutputCommitter
Instantiate.- Parameters:
outputPath- output path (may be null)context- task context- Throws:
IOException- on any failure.
-
-
Method Details
-
getOutputPath
Description copied from class:PathOutputCommitterGet the final directory where work will be placed once the job is committed. This may be null, in which case, there is no output path to write data to.- Specified by:
getOutputPathin classPathOutputCommitter- Returns:
- the path where final output of the job should be placed.
-
getWorkPath
Description copied from class:PathOutputCommitterGet the directory that the task should write results into. Warning: there's no guarantee that this work path is on the same FS as the final output, or that it's visible across machines. May be null.- Specified by:
getWorkPathin classPathOutputCommitter- Returns:
- the work directory
- Throws:
IOException- IO problem
-
setupJob
Description copied from class:OutputCommitterFor the framework to setup the job output during initialization. This is called from the application master process for the entire job. This will be called multiple times, once per job attempt.- Specified by:
setupJobin classOutputCommitter- Parameters:
jobContext- Context of the job whose output is being written.- Throws:
IOException- if temporary output could not be created
-
setupTask
Description copied from class:OutputCommitterSets up output for the task. This is called from each individual task's process that will output to HDFS, and it is called just for that task. This may be called multiple times for the same task, but for different task attempts.- Specified by:
setupTaskin classOutputCommitter- Parameters:
taskContext- Context of the task whose output is being written.- Throws:
IOException
-
needsTaskCommit
Description copied from class:OutputCommitterCheck whether task needs a commit. This is called from each individual task's process that will output to HDFS, and it is called just for that task.- Specified by:
needsTaskCommitin classOutputCommitter- Returns:
- true/false
- Throws:
IOException
-
commitTask
Description copied from class:OutputCommitterTo promote the task's temporary output to final output location. IfOutputCommitter.needsTaskCommit(TaskAttemptContext)returns true and this task is the task that the AM determines finished first, this method is called to commit an individual task's output. This is to mark that tasks output as complete, asOutputCommitter.commitJob(JobContext)will also be called later on if the entire job finished successfully. This is called from a task's process. This may be called multiple times for the same task, but different task attempts. It should be very rare for this to be called multiple times and requires odd networking failures to make this happen. In the future the Hadoop framework may eliminate this race.- Specified by:
commitTaskin classOutputCommitter- Parameters:
taskContext- Context of the task whose output is being written.- Throws:
IOException- if commit is not successful.
-
abortTask
Description copied from class:OutputCommitterDiscard the task output. This is called from a task's process to clean up a single task's output that can not yet been committed. This may be called multiple times for the same task, but for different task attempts.- Specified by:
abortTaskin classOutputCommitter- Throws:
IOException
-
cleanupJob
Description copied from class:OutputCommitterFor cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.- Overrides:
cleanupJobin classOutputCommitter- Parameters:
jobContext- Context of the job whose output is being written.- Throws:
IOException
-
commitJob
Description copied from class:OutputCommitterFor committing job's output after successful job completion. Note that this is invoked for jobs with final runstate as SUCCESSFUL. This is called from the application master process for the entire job. This is guaranteed to only be called once. If it throws an exception the entire job will fail.- Overrides:
commitJobin classOutputCommitter- Parameters:
jobContext- Context of the job whose output is being written.- Throws:
IOException
-
abortJob
Description copied from class:OutputCommitterFor aborting an unsuccessful job's output. Note that this is invoked for jobs with final runstate asJobStatus.State.FAILEDorJobStatus.State.KILLED. This is called from the application master process for the entire job. This may be called multiple times.- Overrides:
abortJobin classOutputCommitter- Parameters:
jobContext- Context of the job whose output is being written.state- final runstate of the job- Throws:
IOException
-
isRecoverySupported
public boolean isRecoverySupported()Description copied from class:OutputCommitterIs task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.- Overrides:
isRecoverySupportedin classOutputCommitter- Returns:
trueif task output recovery is supported,falseotherwise- See Also:
-
isCommitJobRepeatable
Description copied from class:OutputCommitterReturns true if an in-progress job commit can be retried. If the MR AM is re-run then it will check this value to determine if it can retry an in-progress commit that was started by a previous version. Note that in rare scenarios, the previous AM version might still be running at that time, due to system anomalies. Hence if this method returns true then the retry commit operation should be able to run concurrently with the previous operation. If repeatable job commit is supported, job restart can tolerate previous AM failures during job commit. By default, it is not supported. Extended classes (like: FileOutputCommitter) should explicitly override it if provide support.- Overrides:
isCommitJobRepeatablein classOutputCommitter- Parameters:
jobContext- Context of the job whose output is being written.- Returns:
truerepeatable job commit is supported,falseotherwise- Throws:
IOException
-
isRecoverySupported
Description copied from class:OutputCommitterIs task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.- Overrides:
isRecoverySupportedin classOutputCommitter- Parameters:
jobContext- Context of the job whose output is being written.- Returns:
trueif task output recovery is supported,falseotherwise- Throws:
IOException- See Also:
-
recoverTask
Description copied from class:OutputCommitterRecover the task output. The retry-count for the job will be passed via theMRJobConfig.APPLICATION_ATTEMPT_IDkey inJobContext.getConfiguration()for theOutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again. This may be called multiple times for the same task. But from different application attempts.- Overrides:
recoverTaskin classOutputCommitter- Parameters:
taskContext- Context of the task whose output is being recovered- Throws:
IOException
-
hasOutputPath
public boolean hasOutputPath()Description copied from class:PathOutputCommitterPredicate: is there an output path?- Overrides:
hasOutputPathin classPathOutputCommitter- Returns:
- true if we have an output path set, else false.
-
toString
- Overrides:
toStringin classPathOutputCommitter
-
getCommitter
Get the inner committer.- Returns:
- the bonded committer.
-
hasCapability
Pass through if the inner committer supports StreamCapabilities. Query the stream for a specific capability.- Specified by:
hasCapabilityin interfaceStreamCapabilities- Parameters:
capability- string to query the stream support for.- Returns:
- True if the stream supports capability.
-
getIOStatistics
Description copied from interface:org.apache.hadoop.fs.statistics.IOStatisticsSourceReturn a statistics instance.It is not a requirement that the same instance is returned every time.
IOStatisticsSource.If the object implementing this is Closeable, this method may return null if invoked on a closed object, even if it returns a valid instance when called earlier.
- Specified by:
getIOStatisticsin interfaceorg.apache.hadoop.fs.statistics.IOStatisticsSource- Returns:
- an IOStatistics instance or null
-