org.apache.hadoop.mapreduce.lib.output.BindingPathOutputCommitter

All Implemented Interfaces: org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities

@Public @Unstable public class BindingPathOutputCommitter extends PathOutputCommitter implements org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities

This is a special committer which creates the factory for the committer and then delegates to the committer which that factory returns.

Why does it exist? So that you can explicitly instantiate a committer by classname and yet still have the actual implementation driven dynamically by the factory options and destination filesystem. This simplifies integration with existing code which takes the classname of a committer. There is no factory for this committer, as that would lead to a loop. All commit protocol methods and accessors are delegated to the wrapped committer.

How to use:

- In applications which take a classname of committer in a configuration option, set it to the canonical name of this class (see NAME). When this class is instantiated, it will use the factory mechanism to locate the configured committer for the destination.
- In code, explicitly create an instance of this committer through its constructor, then invoke commit lifecycle operations on it. The dynamically configured committer will be created in the constructor and have the lifecycle operations relayed to it.
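The binding behaviour described above can be illustrated with a small self-contained model. This is only a sketch of the pattern, not Hadoop code: `Committer`, `SimpleCommitter`, `CommitterFactories` and `BindingCommitter` are hypothetical stand-ins, and the scheme-based lookup is an assumption made for illustration. It shows a wrapper that resolves the real committer in its constructor and relays every call to it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Demo entry point. Every type in this file is a hypothetical stand-in, not a Hadoop class.
class BindingDemo {
    public static void main(String[] args) {
        BindingCommitter committer = new BindingCommitter("s3a://bucket/out");
        committer.commitJob();
        System.out.println(committer.name());
    }
}

/** Minimal stand-in for a committer's lifecycle surface. */
interface Committer {
    String name();
    void commitJob();
}

/** Stand-in for a concrete committer chosen for a destination. */
class SimpleCommitter implements Committer {
    private final String label;
    boolean committed;

    SimpleCommitter(String label) { this.label = label; }
    public String name() { return label; }
    public void commitJob() { committed = true; }
}

/** Stand-in factory lookup keyed by the scheme of the output path (an assumption). */
class CommitterFactories {
    private static final Map<String, Function<String, Committer>> BY_SCHEME = new HashMap<>();
    static {
        BY_SCHEME.put("s3a", path -> new SimpleCommitter("object-store committer"));
    }

    static Committer create(String outputPath) {
        int idx = outputPath.indexOf("://");
        String scheme = idx > 0 ? outputPath.substring(0, idx) : "file";
        return BY_SCHEME
            .getOrDefault(scheme, path -> new SimpleCommitter("file committer"))
            .apply(outputPath);
    }
}

/** Resolves the real committer in its constructor, then relays every call to it. */
class BindingCommitter implements Committer {
    private final Committer inner;

    BindingCommitter(String outputPath) {
        this.inner = CommitterFactories.create(outputPath);
    }

    Committer getCommitter() { return inner; }
    public String name() { return inner.name(); }
    public void commitJob() { inner.commitJob(); }
}
```

Callers only ever name `BindingCommitter`; which implementation does the work is decided by the destination when the wrapper is constructed.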

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.hadoop.fs.StreamCapabilities
StreamCapabilities.StreamCapability
Field Summary

Fields

Modifier and Type

Field

Description

static final String

NAME

The classname for use in configurations.

Fields inherited from interface org.apache.hadoop.fs.StreamCapabilities
ABORTABLE_STREAM, DROPBEHIND, HFLUSH, HSYNC, IOSTATISTICS, IOSTATISTICS_CONTEXT, PREADBYTEBUFFER, READAHEAD, READBYTEBUFFER, UNBUFFER, VECTOREDIO, VECTOREDIO_BUFFERS_SLICED
Constructor Summary

Constructors

Constructor

Description

BindingPathOutputCommitter(Path outputPath, TaskAttemptContext context)

Instantiate.
Method Summary

Modifier and Type

Method

Description

void

abortJob(JobContext jobContext, JobStatus.State state)

For aborting an unsuccessful job's output.

void

abortTask(TaskAttemptContext taskContext)

Discard the task output.

void

cleanupJob(JobContext jobContext)

For cleaning up the job's output after job completion.

void

commitJob(JobContext jobContext)

For committing job's output after successful job completion.

void

commitTask(TaskAttemptContext taskContext)

To promote the task's temporary output to final output location.

PathOutputCommitter

getCommitter()

Get the inner committer.

IOStatistics

getIOStatistics()

Return a statistics instance.

Path

getOutputPath()

Get the final directory where work will be placed once the job is committed.

Path

getWorkPath()

Get the directory that the task should write results into.

boolean

hasCapability(String capability)

Pass through if the inner committer supports StreamCapabilities.

boolean

hasOutputPath()

Predicate: is there an output path?

boolean

isCommitJobRepeatable(JobContext jobContext)

Returns true if an in-progress job commit can be retried.

boolean

isRecoverySupported()

Is task output recovery supported for restarting jobs?

boolean

isRecoverySupported(JobContext jobContext)

Is task output recovery supported for restarting jobs?

boolean

needsTaskCommit(TaskAttemptContext taskContext)

Check whether task needs a commit.

void

recoverTask(TaskAttemptContext taskContext)

Recover the task output.

void

setupJob(JobContext jobContext)

For the framework to set up the job output during initialization.

void

setupTask(TaskAttemptContext taskContext)

Sets up output for the task.

String

toString()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- NAME
  
  public static final String NAME
  
  The classname for use in configurations.
Constructor Details
- BindingPathOutputCommitter
  
  public BindingPathOutputCommitter(Path outputPath, TaskAttemptContext context) throws IOException
  
  Instantiate.
  
  Parameters:
  
  outputPath - output path (may be null)
  
  context - task context
  
  Throws:
  
  IOException - on any failure.
Method Details
- getOutputPath
  
  public Path getOutputPath()
  
  Description copied from class: PathOutputCommitter
  
  Get the final directory where work will be placed once the job is committed. This may be null, in which case, there is no output path to write data to.
  
  Specified by:
  
  getOutputPath in class PathOutputCommitter
  
  Returns:
  
  the path where final output of the job should be placed.
- getWorkPath
  
  public Path getWorkPath() throws IOException
  
  Description copied from class: PathOutputCommitter
  
  Get the directory that the task should write results into. Warning: there's no guarantee that this work path is on the same FS as the final output, or that it's visible across machines. May be null.
  
  Specified by:
  
  getWorkPath in class PathOutputCommitter
  
  Returns:
  
  the work directory
  
  Throws:
  
  IOException - IO problem
- setupJob
  
  public void setupJob(JobContext jobContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  For the framework to set up the job output during initialization. This is called from the application master process for the entire job. This will be called multiple times, once per job attempt.
  
  Specified by:
  
  setupJob in class OutputCommitter
  
  Parameters:
  
  jobContext - Context of the job whose output is being written.
  
  Throws:
  
  IOException - if temporary output could not be created
- setupTask
  
  public void setupTask(TaskAttemptContext taskContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  Sets up output for the task. This is called from each individual task's process that will output to HDFS, and it is called just for that task. This may be called multiple times for the same task, but for different task attempts.
  
  Specified by:
  
  setupTask in class OutputCommitter
  
  Parameters:
  
  taskContext - Context of the task whose output is being written.
  
  Throws:
  
  IOException
- needsTaskCommit
  
  public boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  Check whether task needs a commit. This is called from each individual task's process that will output to HDFS, and it is called just for that task.
  
  Specified by:
  
  needsTaskCommit in class OutputCommitter
  
  Returns:
  
  true if the task needs a commit, false otherwise
  
  Throws:
  
  IOException
- commitTask
  
  public void commitTask(TaskAttemptContext taskContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  To promote the task's temporary output to final output location. If OutputCommitter.needsTaskCommit(TaskAttemptContext) returns true and this task is the one that the AM determines finished first, this method is called to commit an individual task's output. This marks that task's output as complete, as OutputCommitter.commitJob(JobContext) will also be called later on if the entire job finished successfully. This is called from a task's process. This may be called multiple times for the same task, but for different task attempts. It should be very rare for this to be called multiple times, and it requires odd networking failures to make this happen. In the future the Hadoop framework may eliminate this race.
  
  Specified by:
  
  commitTask in class OutputCommitter
  
  Parameters:
  
  taskContext - Context of the task whose output is being written.
  
  Throws:
  
  IOException - if commit is not successful.
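The ordering described for commitTask can be sketched as a caller-side sequence. This is a simplified model under stated assumptions, not the framework's actual task runner: `TaskCommitter`, `FakeTaskCommitter` and `finishTask` are hypothetical names, and the `amGrantsCommit` flag stands in for the AM deciding which attempt finished first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// All types here are hypothetical stand-ins; none are Hadoop classes.
class TaskCommitDemo {
    /**
     * Finish one task attempt and return the committer calls made, in order.
     * amGrantsCommit models the AM choosing this attempt as the one to commit.
     */
    static List<String> finishTask(TaskCommitter committer, boolean amGrantsCommit) {
        List<String> calls = new ArrayList<>();
        try {
            if (!committer.needsTaskCommit()) {
                return calls;                   // nothing to promote
            }
            if (amGrantsCommit) {
                try {
                    committer.commitTask();     // promote temporary output
                    calls.add("commitTask");
                    return calls;
                } catch (IOException e) {
                    calls.add("commitTask failed");
                }
            }
            committer.abortTask();              // discard output that was not committed
            calls.add("abortTask");
            return calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(finishTask(new FakeTaskCommitter(true, false), true));
    }
}

/** Stand-in for the task-side part of the commit protocol. */
interface TaskCommitter {
    boolean needsTaskCommit() throws IOException;
    void commitTask() throws IOException;
    void abortTask() throws IOException;
}

/** Test double: optionally has output, optionally fails its commit. */
class FakeTaskCommitter implements TaskCommitter {
    private final boolean hasOutput;
    private final boolean failCommit;

    FakeTaskCommitter(boolean hasOutput, boolean failCommit) {
        this.hasOutput = hasOutput;
        this.failCommit = failCommit;
    }

    public boolean needsTaskCommit() { return hasOutput; }

    public void commitTask() throws IOException {
        if (failCommit) {
            throw new IOException("simulated commit failure");
        }
    }

    public void abortTask() { }
}
```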
- abortTask
  
  public void abortTask(TaskAttemptContext taskContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  Discard the task output. This is called from a task's process to clean up a single task's output that has not yet been committed. This may be called multiple times for the same task, but for different task attempts.
  
  Specified by:
  
  abortTask in class OutputCommitter
  
  Throws:
  
  IOException
- cleanupJob
  
  public void cleanupJob(JobContext jobContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  For cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.
  
  Overrides:
  
  cleanupJob in class OutputCommitter
  
  Parameters:
  
  jobContext - Context of the job whose output is being written.
  
  Throws:
  
  IOException
- commitJob
  
  public void commitJob(JobContext jobContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  For committing job's output after successful job completion. Note that this is invoked for jobs with final runstate as SUCCESSFUL. This is called from the application master process for the entire job. This is guaranteed to only be called once. If it throws an exception the entire job will fail.
  
  Overrides:
  
  commitJob in class OutputCommitter
  
  Parameters:
  
  jobContext - Context of the job whose output is being written.
  
  Throws:
  
  IOException
- abortJob
  
  public void abortJob(JobContext jobContext, JobStatus.State state) throws IOException
  
  Description copied from class: OutputCommitter
  
  For aborting an unsuccessful job's output. Note that this is invoked for jobs with final runstate as JobStatus.State.FAILED or JobStatus.State.KILLED. This is called from the application master process for the entire job. This may be called multiple times.
  
  Overrides:
  
  abortJob in class OutputCommitter
  
  Parameters:
  
  jobContext - Context of the job whose output is being written.
  
  state - final runstate of the job
  
  Throws:
  
  IOException
- isRecoverySupported
  
  public boolean isRecoverySupported()
  
  Description copied from class: OutputCommitter
  
  Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
  
  Overrides:
  
  isRecoverySupported in class OutputCommitter
  
  Returns:
  
  true if task output recovery is supported, false otherwise
  
  See Also:
  
  OutputCommitter.recoverTask(TaskAttemptContext)
- isCommitJobRepeatable
  
  public boolean isCommitJobRepeatable(JobContext jobContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  Returns true if an in-progress job commit can be retried. If the MR AM is re-run then it will check this value to determine if it can retry an in-progress commit that was started by a previous version. Note that in rare scenarios, the previous AM version might still be running at that time, due to system anomalies. Hence if this method returns true then the retry commit operation should be able to run concurrently with the previous operation. If repeatable job commit is supported, job restart can tolerate previous AM failures during job commit. By default, it is not supported. Extended classes (like FileOutputCommitter) should explicitly override it if they provide support.
  
  Overrides:
  
  isCommitJobRepeatable in class OutputCommitter
  
  Parameters:
  
  jobContext - Context of the job whose output is being written.
  
  Returns:
  
  true if repeatable job commit is supported, false otherwise
  
  Throws:
  
  IOException
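The restart decision described for isCommitJobRepeatable can be sketched as a small decision function. This is an illustrative model only: `RestartAction` and `onRestart` are hypothetical names, and the two boolean inputs about the earlier commit are assumptions about what a restarted AM can observe, not a description of its actual bookkeeping.

```java
// Hypothetical model of a restarted AM's choice; these are not Hadoop types.
class CommitRestartDemo {
    /**
     * Decide what a restarted application master does about job commit.
     *
     * @param commitStarted    an earlier attempt began committing the job
     * @param commitFinished   that earlier commit is known to have completed
     * @param commitRepeatable the committer reports that job commit can be retried
     */
    static RestartAction onRestart(boolean commitStarted,
                                   boolean commitFinished,
                                   boolean commitRepeatable) {
        if (!commitStarted) {
            return RestartAction.CONTINUE_JOB;      // no commit to worry about yet
        }
        if (commitFinished) {
            return RestartAction.NOTHING_TO_DO;     // output is already committed
        }
        // A commit was in progress when the previous attempt went away.
        return commitRepeatable ? RestartAction.RETRY_COMMIT : RestartAction.FAIL_JOB;
    }

    public static void main(String[] args) {
        System.out.println(onRestart(true, false, true));
        System.out.println(onRestart(true, false, false));
    }
}

enum RestartAction { CONTINUE_JOB, NOTHING_TO_DO, RETRY_COMMIT, FAIL_JOB }
```

Because the previous attempt may still be running, a committer that returns true here has to tolerate two commits proceeding at once.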
- isRecoverySupported
  
  public boolean isRecoverySupported(JobContext jobContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
  
  Overrides:
  
  isRecoverySupported in class OutputCommitter
  
  Parameters:
  
  jobContext - Context of the job whose output is being written.
  
  Returns:
  
  true if task output recovery is supported, false otherwise
  
  Throws:
  
  IOException
  
  See Also:
  
  OutputCommitter.recoverTask(TaskAttemptContext)
- recoverTask
  
  public void recoverTask(TaskAttemptContext taskContext) throws IOException
  
  Description copied from class: OutputCommitter
  
  Recover the task output. The retry-count for the job will be passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
  
  Overrides:
  
  recoverTask in class OutputCommitter
  
  Parameters:
  
  taskContext - Context of the task whose output is being recovered
  
  Throws:
  
  IOException
- hasOutputPath
  
  public boolean hasOutputPath()
  
  Description copied from class: PathOutputCommitter
  
  Predicate: is there an output path?
  
  Overrides:
  
  hasOutputPath in class PathOutputCommitter
  
  Returns:
  
  true if we have an output path set, else false.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class PathOutputCommitter
- getCommitter
  
  public PathOutputCommitter getCommitter()
  
  Get the inner committer.
  
  Returns:
  
  the bound committer.
- hasCapability
  
  public boolean hasCapability(String capability)
  
  Pass through if the inner committer supports StreamCapabilities. Query the stream for a specific capability.
  
  Specified by:
  
  hasCapability in interface StreamCapabilities
  
  Parameters:
  
  capability - string to query the stream support for.
  
  Returns:
  
  True if the stream supports capability.
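The pass-through rule for hasCapability can be modelled in a few lines. This is a sketch under assumptions: `Capabilities`, `PlainInner`, `CapableInner` and `BindingWrapper` are hypothetical stand-ins, the capability string is made up, and returning false for an inner object without the interface is assumed behaviour that the description above does not state.

```java
// Hypothetical stand-ins illustrating capability pass-through; not Hadoop classes.
class CapabilityDemo {
    public static void main(String[] args) {
        System.out.println(new BindingWrapper(new CapableInner()).hasCapability("example-capability"));
        System.out.println(new BindingWrapper(new PlainInner()).hasCapability("example-capability"));
    }
}

/** Stand-in for a capability probe interface. */
interface Capabilities {
    boolean hasCapability(String capability);
}

/** An inner committer that cannot be probed at all. */
class PlainInner { }

/** An inner committer that declares one made-up capability. */
class CapableInner implements Capabilities {
    public boolean hasCapability(String capability) {
        return "example-capability".equals(capability);
    }
}

/** Wrapper that forwards the probe only when the inner object can answer it. */
class BindingWrapper implements Capabilities {
    private final Object inner;

    BindingWrapper(Object inner) { this.inner = inner; }

    public boolean hasCapability(String capability) {
        if (inner instanceof Capabilities) {
            return ((Capabilities) inner).hasCapability(capability);
        }
        return false; // assumed default when the inner object has no probe
    }
}
```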
- getIOStatistics
  
  public IOStatistics getIOStatistics()
  
  Description copied from interface: org.apache.hadoop.fs.statistics.IOStatisticsSource
  
  Return a statistics instance.
  It is not a requirement that the same instance is returned every time.
  If the object implementing this is Closeable, this method may return null if invoked on a closed object, even if it returns a valid instance when called earlier.
  
  Specified by:
  
  getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
  
  Returns:
  
  an IOStatistics instance or null
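Since the returned instance may be null and need not be the same object on every call, a caller would typically fetch it afresh and guard against null. The sketch below models that with hypothetical `Stats` and `StatsSource` interfaces, which are simplified stand-ins rather than the Hadoop statistics API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for a statistics source; not the Hadoop API.
class StatsDemo {
    /** Take a null-safe copy of the counters, asking the source afresh each time. */
    static Map<String, Long> snapshot(StatsSource source) {
        Stats stats = source.getStats();     // may be null, e.g. after close
        if (stats == null) {
            return Collections.<String, Long>emptyMap();
        }
        return new TreeMap<String, Long>(stats.counters());
    }

    public static void main(String[] args) {
        StatsSource live = () -> () -> Collections.singletonMap("files_committed", 3L);
        StatsSource closed = () -> null;
        System.out.println(snapshot(live));
        System.out.println(snapshot(closed));
    }
}

/** Stand-in for a statistics view exposing named counters. */
interface Stats {
    Map<String, Long> counters();
}

/** Stand-in for an object that can supply statistics, or null. */
interface StatsSource {
    Stats getStats();
}
```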
