Class JobConf
JobConf is the primary interface for a user to describe a
map-reduce job to the Hadoop framework for execution. The framework tries to
faithfully execute the job as described by JobConf; however:
- Some configuration parameters may have been marked as final by administrators and hence cannot be altered.
- While some job parameters are straightforward to set (e.g. setNumReduceTasks(int)), others interact subtly with the rest of the framework and/or the job configuration and are relatively more complex for the user to control finely (e.g. setNumMapTasks(int)).
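For example, an administrator can mark a property final in a site configuration file, in which case JobConf settings cannot override it (the property value below is illustrative):

```xml
<!-- mapred-site.xml: admin-locked property; user JobConf values cannot override it -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/grid/local/mapred</value>
  <final>true</final>
</property>
```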
JobConf typically specifies the Mapper, combiner
(if any), Partitioner, Reducer, InputFormat and
OutputFormat implementations to be used, etc.
Optionally, JobConf is used to specify other advanced facets
of the job such as the Comparators to be used, files to be put in
the DistributedCache, whether or not intermediate and/or job outputs
are to be compressed (and how), and debuggability via user-provided scripts
(setMapDebugScript(String)/setReduceDebugScript(String))
for doing post-processing on task logs and the task's stdout, stderr and syslog.
Here is an example of how to configure a job via JobConf:
// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setCombinerClass(MyJob.MyReducer.class);
job.setReducerClass(MyJob.MyReducer.class);
job.setInputFormat(SequenceFileInputFormat.class);
job.setOutputFormat(SequenceFileOutputFormat.class);
- See Also:
- JobClient
- ClusterStatus
- Tool
- DistributedCache
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.conf.Configuration
org.apache.hadoop.conf.Configuration.DeprecationDelta, org.apache.hadoop.conf.Configuration.IntegerRanges -
Field Summary
Fields (alphabetical; see Field Details below for full descriptions):
- static final String DEFAULT_LOG_LEVEL: Default logging level for map/reduce tasks.
- static final String DEFAULT_MAPRED_TASK_JAVA_OPTS
- static final boolean DEFAULT_MAPREDUCE_RECOVER_JOB: Deprecated.
- static final String DEFAULT_QUEUE_NAME: Name of the queue to which jobs will be submitted, if no queue name is mentioned.
- static final long DISABLED_MEMORY_LIMIT: Deprecated.
- static final String MAPRED_JOB_MAP_MEMORY_MB_PROPERTY: Deprecated.
- static final String MAPRED_JOB_REDUCE_MEMORY_MB_PROPERTY: Deprecated.
- static final String MAPRED_LOCAL_DIR_PROPERTY: Property name for the configuration property mapreduce.cluster.local.dir.
- static final String MAPRED_MAP_TASK_ENV: Configuration key to set the environment of the child map tasks.
- static final String MAPRED_MAP_TASK_JAVA_OPTS: Configuration key to set the java command line options for the map tasks.
- static final String MAPRED_MAP_TASK_LOG_LEVEL: Configuration key to set the logging level for the map task.
- static final String MAPRED_MAP_TASK_ULIMIT: Deprecated. Configuration key to set the maximum virtual memory available to the map tasks (in kilo-bytes).
- static final String MAPRED_REDUCE_TASK_ENV: Configuration key to set the environment of the child reduce tasks.
- static final String MAPRED_REDUCE_TASK_JAVA_OPTS: Configuration key to set the java command line options for the reduce tasks.
- static final String MAPRED_REDUCE_TASK_LOG_LEVEL: Configuration key to set the logging level for the reduce task.
- static final String MAPRED_REDUCE_TASK_ULIMIT: Deprecated. Configuration key to set the maximum virtual memory available to the reduce tasks (in kilo-bytes).
- static final String MAPRED_TASK_DEFAULT_MAXVMEM_PROPERTY: Deprecated.
- static final String MAPRED_TASK_ENV: Deprecated.
- static final String MAPRED_TASK_JAVA_OPTS: Deprecated.
- static final String MAPRED_TASK_MAXPMEM_PROPERTY: Deprecated.
- static final String MAPRED_TASK_MAXVMEM_PROPERTY: Deprecated. Use MAPREDUCE_JOB_MAP_MEMORY_MB_PROPERTY and MAPREDUCE_JOB_REDUCE_MEMORY_MB_PROPERTY.
- static final String MAPRED_TASK_ULIMIT: Deprecated. Configuration key to set the maximum virtual memory available to the child map and reduce tasks (in kilo-bytes).
- static final String MAPREDUCE_RECOVER_JOB: Deprecated.
- static final Pattern UNPACK_JAR_PATTERN_DEFAULT: Pattern for the default unpacking behavior for job jars.
- static final String UPPER_LIMIT_ON_TASK_VMEM_PROPERTY: Deprecated.
- static final String WORKFLOW_ADJACENCY_PREFIX_PATTERN: Deprecated.
- static final String WORKFLOW_ADJACENCY_PREFIX_STRING: Deprecated.
- static final String WORKFLOW_ID: Deprecated.
- static final String WORKFLOW_NAME: Deprecated.
- static final String WORKFLOW_NODE_NAME: Deprecated.
- static final String WORKFLOW_TAGS: Deprecated. -
Constructor Summary
Constructors:
- JobConf(): Construct a map/reduce job configuration.
- JobConf(boolean loadDefaults): A new map/reduce configuration where the behavior of reading from the default resources can be turned off.
- JobConf(Class exampleClass): Construct a map/reduce job configuration.
- JobConf(Configuration conf): Construct a map/reduce job configuration.
- JobConf(Configuration conf, Class exampleClass): Construct a map/reduce job configuration.
- JobConf(Path config): Construct a map/reduce configuration.
- JobConf(String config): Construct a map/reduce configuration. -
Method Summary
Methods:
- void deleteLocalFiles(): Deprecated.
- void deleteLocalFiles(String subdir)
- static String findContainingJar(Class my_class): Find a jar that contains a class of the same name, if any.
- Class<? extends Reducer> getCombinerClass(): Get the user-defined combiner class used to combine map-outputs before being sent to the reducers.
- RawComparator getCombinerKeyGroupingComparator(): Get the user-defined WritableComparable comparator for grouping keys of inputs to the combiner.
- boolean getCompressMapOutput(): Are the outputs of the maps to be compressed?
- Credentials getCredentials(): Get credentials for the job.
- InputFormat getInputFormat(): Get the InputFormat implementation for the map-reduce job; defaults to TextInputFormat if not specified explicitly.
- String getJar(): Get the user jar for the map-reduce job.
- Pattern getJarUnpackPattern(): Get the pattern for jar contents to unpack on the tasktracker.
- String getJobEndNotificationCustomNotifierClass(): Returns the class to be invoked in order to send a notification after the job has completed (success/failure).
- String getJobEndNotificationURI(): Get the URI to be invoked in order to send a notification after the job has completed (success/failure).
- String getJobLocalDir(): Get job-specific shared directory for use as scratch space.
- String getJobName(): Get the user-specified job name.
- JobPriority getJobPriority(): Get the JobPriority for this job.
- int getJobPriorityAsInteger(): Get the priority for this job.
- boolean getKeepFailedTaskFiles(): Should the temporary files for failed tasks be kept?
- String getKeepTaskFilesPattern(): Get the regular expression that is matched against the task names to see if we need to keep the files.
- String getKeyFieldComparatorOption(): Get the KeyFieldBasedComparator options.
- String getKeyFieldPartitionerOption(): Get the KeyFieldBasedPartitioner options.
- String[] getLocalDirs()
- Path getLocalPath(String pathString): Constructs a local file name.
- String getMapDebugScript(): Get the map task's debug script.
- Class<? extends CompressionCodec> getMapOutputCompressorClass(Class<? extends CompressionCodec> defaultValue): Get the CompressionCodec for compressing the map outputs.
- Class<?> getMapOutputKeyClass(): Get the key class for the map output data.
- Class<?> getMapOutputValueClass(): Get the value class for the map output data.
- Class<? extends Mapper> getMapperClass(): Get the Mapper class for the job.
- Class<? extends MapRunnable> getMapRunnerClass(): Get the MapRunnable class for the job.
- boolean getMapSpeculativeExecution(): Should speculative execution be used for this job for map tasks?
- int getMaxMapAttempts(): Get the configured number of maximum attempts that will be made to run a map task, as specified by the mapreduce.map.maxattempts property.
- int getMaxMapTaskFailuresPercent(): Get the maximum percentage of map tasks that can fail without the job being aborted.
- long getMaxPhysicalMemoryForTask(): Deprecated. This variable is deprecated and no longer in use.
- int getMaxReduceAttempts(): Get the configured number of maximum attempts that will be made to run a reduce task, as specified by the mapreduce.reduce.maxattempts property.
- int getMaxReduceTaskFailuresPercent(): Get the maximum percentage of reduce tasks that can fail without the job being aborted.
- int getMaxTaskFailuresPerTracker(): Expert: Get the maximum number of failures of a given job per tasktracker.
- long getMaxVirtualMemoryForTask(): Deprecated.
- long getMemoryForMapTask(): Get memory required to run a map task of the job, in MB.
- long getMemoryForReduceTask(): Get memory required to run a reduce task of the job, in MB.
- getMemoryRequired(TaskType taskType)
- int getNumMapTasks(): Get the configured number of map tasks for this job.
- int getNumReduceTasks(): Get the configured number of reduce tasks for this job.
- int getNumTasksToExecutePerJvm(): Get the number of tasks that a spawned JVM should execute.
- OutputCommitter getOutputCommitter(): Get the OutputCommitter implementation for the map-reduce job; defaults to FileOutputCommitter if not specified explicitly.
- OutputFormat getOutputFormat(): Get the OutputFormat implementation for the map-reduce job; defaults to TextOutputFormat if not specified explicitly.
- Class<?> getOutputKeyClass(): Get the key class for the job output data.
- RawComparator getOutputKeyComparator(): Get the RawComparator comparator used to compare keys.
- Class<?> getOutputValueClass(): Get the value class for job outputs.
- RawComparator getOutputValueGroupingComparator(): Get the user-defined WritableComparable comparator for grouping keys of inputs to the reduce.
- Class<? extends Partitioner> getPartitionerClass()
- boolean getProfileEnabled(): Get whether the task profiling is enabled.
- String getProfileParams(): Get the profiler configuration arguments.
- org.apache.hadoop.conf.Configuration.IntegerRanges getProfileTaskRange(boolean isMap): Get the range of maps or reduces to profile.
- String getQueueName(): Return the name of the queue to which this job is submitted.
- String getReduceDebugScript(): Get the reduce task's debug script.
- Class<? extends Reducer> getReducerClass(): Get the Reducer class for the job.
- boolean getReduceSpeculativeExecution(): Should speculative execution be used for this job for reduce tasks?
- String getSessionId(): Deprecated.
- boolean getSpeculativeExecution(): Should speculative execution be used for this job?
- String getTaskJavaOpts(TaskType taskType)
- boolean getUseNewMapper(): Should the framework use the new context-object code for running the mapper?
- boolean getUseNewReducer(): Should the framework use the new context-object code for running the reducer?
- String getUser(): Get the reported username for this job.
- Path getWorkingDirectory(): Get the current working directory for the default file system.
- static void main(String[] args)
- static long normalizeMemoryConfigValue(long val): Normalize the negative values in configuration.
- static int parseMaximumHeapSizeMB(String javaOpts): Parse the maximum heap size from the java opts as specified by the -Xmx option. Format: -Xmx<size>[g|G|m|M|k|K]
- void setCombinerClass(Class<? extends Reducer> theClass): Set the user-defined combiner class used to combine map-outputs before being sent to the reducers.
- void setCombinerKeyGroupingComparator(Class<? extends RawComparator> theClass): Set the user-defined RawComparator comparator for grouping keys in the input to the combiner.
- void setCompressMapOutput(boolean compress): Should the map outputs be compressed before transfer?
- void setCredentials(Credentials credentials)
- void setInputFormat(Class<? extends InputFormat> theClass): Set the InputFormat implementation for the map-reduce job.
- void setJar(String jar): Set the user jar for the map-reduce job.
- void setJarByClass(Class cls): Set the job's jar file by finding an example class location.
- void setJobEndNotificationCustomNotifierClass(String customNotifierClassName): Sets the class to be invoked in order to send a notification after the job has completed (success/failure).
- void setJobEndNotificationURI(String uri): Set the URI to be invoked in order to send a notification after the job has completed (success/failure).
- void setJobName(String name): Set the user-specified job name.
- void setJobPriority(JobPriority prio): Set JobPriority for this job.
- void setJobPriorityAsInteger(int prio): Set JobPriority for this job.
- void setKeepFailedTaskFiles(boolean keep): Set whether the framework should keep the intermediate files for failed tasks.
- void setKeepTaskFilesPattern(String pattern): Set a regular expression for task names that should be kept.
- void setKeyFieldComparatorOptions(String keySpec): Set the KeyFieldBasedComparator options used to compare keys.
- void setKeyFieldPartitionerOptions(String keySpec): Set the KeyFieldBasedPartitioner options used for the Partitioner.
- void setMapDebugScript(String mDbgScript): Set the debug script to run when the map tasks fail.
- void setMapOutputCompressorClass(Class<? extends CompressionCodec> codecClass): Set the given class as the CompressionCodec for the map outputs.
- void setMapOutputKeyClass(Class<?> theClass): Set the key class for the map output data.
- void setMapOutputValueClass(Class<?> theClass): Set the value class for the map output data.
- void setMapperClass(Class<? extends Mapper> theClass): Set the Mapper class for the job.
- void setMapRunnerClass(Class<? extends MapRunnable> theClass): Expert: Set the MapRunnable class for the job.
- void setMapSpeculativeExecution(boolean speculativeExecution): Turn speculative execution on or off for this job for map tasks.
- void setMaxMapAttempts(int n): Expert: Set the number of maximum attempts that will be made to run a map task.
- void setMaxMapTaskFailuresPercent(int percent): Expert: Set the maximum percentage of map tasks that can fail without the job being aborted.
- void setMaxPhysicalMemoryForTask(long mem): Deprecated.
- void setMaxReduceAttempts(int n): Expert: Set the number of maximum attempts that will be made to run a reduce task.
- void setMaxReduceTaskFailuresPercent(int percent): Set the maximum percentage of reduce tasks that can fail without the job being aborted.
- void setMaxTaskFailuresPerTracker(int noFailures): Set the maximum number of failures of a given job per tasktracker.
- void setMaxVirtualMemoryForTask(long vmem): Deprecated.
- void setMemoryForMapTask(long mem)
- void setMemoryForReduceTask(long mem)
- void setNumMapTasks(int n): Set the number of map tasks for this job.
- void setNumReduceTasks(int n): Set the requisite number of reduce tasks for this job.
- void setNumTasksToExecutePerJvm(int numTasks): Sets the number of tasks that a spawned task JVM should run before it exits.
- void setOutputCommitter(Class<? extends OutputCommitter> theClass): Set the OutputCommitter implementation for the map-reduce job.
- void setOutputFormat(Class<? extends OutputFormat> theClass): Set the OutputFormat implementation for the map-reduce job.
- void setOutputKeyClass(Class<?> theClass): Set the key class for the job output data.
- void setOutputKeyComparatorClass(Class<? extends RawComparator> theClass): Set the RawComparator comparator used to compare keys.
- void setOutputValueClass(Class<?> theClass): Set the value class for job outputs.
- void setOutputValueGroupingComparator(Class<? extends RawComparator> theClass): Set the user-defined RawComparator comparator for grouping keys in the input to the reduce.
- void setPartitionerClass(Class<? extends Partitioner> theClass)
- void setProfileEnabled(boolean newValue): Set whether the system should collect profiler information for some of the tasks in this job.
- void setProfileParams(String value): Set the profiler configuration arguments.
- void setProfileTaskRange(boolean isMap, String newValue): Set the ranges of maps or reduces to profile; setProfileEnabled(true) must also be called.
- void setQueueName(String queueName): Set the name of the queue to which this job should be submitted.
- void setReduceDebugScript(String rDbgScript): Set the debug script to run when the reduce tasks fail.
- void setReducerClass(Class<? extends Reducer> theClass): Set the Reducer class for the job.
- void setReduceSpeculativeExecution(boolean speculativeExecution): Turn speculative execution on or off for this job for reduce tasks.
- void setSessionId(String sessionId): Deprecated.
- void setSpeculativeExecution(boolean speculativeExecution): Turn speculative execution on or off for this job.
- void setUseNewMapper(boolean flag): Set whether the framework should use the new api for the mapper.
- void setUseNewReducer(boolean flag): Set whether the framework should use the new api for the reducer.
- void setUser(String user): Set the reported username for this job.
- void setWorkingDirectory(Path dir): Set the current working directory for the default file system.
Methods inherited from class org.apache.hadoop.conf.Configuration
addDefaultResource, addDeprecation, addDeprecation, addDeprecation, addDeprecation, addDeprecations, addResource, addResource, addResource, addResource, addResource, addResource, addResource, addResource, addResource, addResource, addResource, addTags, clear, dumpConfiguration, dumpConfiguration, dumpDeprecatedKeys, get, get, getAllPropertiesByTag, getAllPropertiesByTags, getBoolean, getClass, getClass, getClassByName, getClassByNameOrNull, getClasses, getClassLoader, getConfResourceAsInputStream, getConfResourceAsReader, getDouble, getEnum, getEnumSet, getFile, getFinalParameters, getFloat, getInstances, getInt, getInts, getLocalPath, getLong, getLongBytes, getPassword, getPasswordFromConfig, getPasswordFromCredentialProviders, getPattern, getPropertySources, getProps, getPropsWithPrefix, getRange, getRaw, getResource, getSocketAddr, getSocketAddr, getStorageSize, getStorageSize, getStringCollection, getStrings, getStrings, getTimeDuration, getTimeDuration, getTimeDuration, getTimeDuration, getTimeDurationHelper, getTimeDurations, getTrimmed, getTrimmed, getTrimmedStringCollection, getTrimmedStrings, getTrimmedStrings, getValByRegex, hasWarnedDeprecation, isDeprecated, isPropertyTag, iterator, onlyKeyExists, readFields, reloadConfiguration, reloadExistingConfigurations, set, set, setAllowNullValueProperties, setBoolean, setBooleanIfUnset, setClass, setClassLoader, setDeprecatedProperties, setDouble, setEnum, setFloat, setIfUnset, setInt, setLong, setPattern, setQuietMode, setRestrictSystemProperties, setRestrictSystemPropertiesDefault, setRestrictSystemProps, setSocketAddr, setStorageSize, setStrings, setTimeDuration, size, substituteCommonVariables, toString, unset, updateConnectAddr, updateConnectAddr, write, writeXml, writeXml, writeXml, writeXml
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
MAPRED_TASK_MAXVMEM_PROPERTY
Deprecated. Use MAPREDUCE_JOB_MAP_MEMORY_MB_PROPERTY and MAPREDUCE_JOB_REDUCE_MEMORY_MB_PROPERTY.
- See Also:
-
UPPER_LIMIT_ON_TASK_VMEM_PROPERTY
Deprecated.
- See Also:
-
MAPRED_TASK_DEFAULT_MAXVMEM_PROPERTY
Deprecated.
- See Also:
-
MAPRED_TASK_MAXPMEM_PROPERTY
Deprecated.
- See Also:
-
DISABLED_MEMORY_LIMIT
Deprecated. A value which, if set for memory-related configuration options, indicates that the options are turned off. Deprecated because it makes no sense in the context of MR2.
- See Also:
-
MAPRED_LOCAL_DIR_PROPERTY
Property name for the configuration property mapreduce.cluster.local.dir.
- See Also:
-
DEFAULT_QUEUE_NAME
Name of the queue to which jobs will be submitted, if no queue name is mentioned.
- See Also:
-
MAPRED_JOB_MAP_MEMORY_MB_PROPERTY
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MAPREDUCE_JOB_MAP_MEMORY_MB_PROPERTY.
- See Also:
-
MAPRED_JOB_REDUCE_MEMORY_MB_PROPERTY
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MAPREDUCE_JOB_REDUCE_MEMORY_MB_PROPERTY.
- See Also:
-
UNPACK_JAR_PATTERN_DEFAULT
Pattern for the default unpacking behavior for job jars -
MAPRED_TASK_JAVA_OPTS
Deprecated. Configuration key to set the java command line options for the child map and reduce tasks. Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@. It is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable MAPRED_TASK_ENV can be used to pass other environment variables to the child processes.
- See Also:
-
MAPRED_MAP_TASK_JAVA_OPTS
Configuration key to set the java command line options for the map tasks. Java opts for the task tracker child map processes. The following symbol, if present, will be interpolated: @taskid@. It is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable MAPRED_MAP_TASK_ENV can be used to pass other environment variables to the map processes.
- See Also:
-
MAPRED_REDUCE_TASK_JAVA_OPTS
Configuration key to set the java command line options for the reduce tasks. Java opts for the task tracker child reduce processes. The following symbol, if present, will be interpolated: @taskid@. It is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable MAPRED_REDUCE_TASK_ENV can be used to pass process environment variables to the reduce processes.
- See Also:
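The @taskid@ interpolation described above amounts to a plain string substitution performed before the child JVM is launched. A minimal sketch of the behavior (illustrative only, not the framework's actual code; the task attempt id below is made up):

```java
public class TaskOptsDemo {
    // Replace @taskid@ with the current task attempt id; any other '@' is untouched.
    static String interpolate(String opts, String taskId) {
        return opts.replace("@taskid@", taskId);
    }

    public static void main(String[] args) {
        String opts = "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc";
        System.out.println(interpolate(opts, "attempt_200707121733_0003_m_000005_0"));
    }
}
```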
-
DEFAULT_MAPRED_TASK_JAVA_OPTS
- See Also:
-
MAPRED_TASK_ULIMIT
Deprecated. Configuration key to set the maximum virtual memory available to the child map and reduce tasks (in kilo-bytes). This has been deprecated and will no longer have any effect.
- See Also:
-
MAPRED_MAP_TASK_ULIMIT
Deprecated. Configuration key to set the maximum virtual memory available to the map tasks (in kilo-bytes). This has been deprecated and will no longer have any effect.
- See Also:
-
MAPRED_REDUCE_TASK_ULIMIT
Deprecated. Configuration key to set the maximum virtual memory available to the reduce tasks (in kilo-bytes). This has been deprecated and will no longer have any effect.
- See Also:
-
MAPRED_TASK_ENV
Deprecated. Configuration key to set the environment of the child map/reduce tasks. The format of the value is k1=v1,k2=v2. Further, it can reference existing environment variables via $key on Linux or %key% on Windows. Example: A=foo will set the env variable A to foo.
- See Also:
-
MAPRED_MAP_TASK_ENV
Configuration key to set the environment of the child map tasks. The format of the value is k1=v1,k2=v2. Further, it can reference existing environment variables via $key on Linux or %key% on Windows. Example: A=foo will set the env variable A to foo. An environment variable can also be set individually by appending .VARNAME to this configuration key, where VARNAME is the name of the environment variable. Example: mapreduce.map.env.VARNAME=value
- See Also:
-
MAPRED_REDUCE_TASK_ENV
Configuration key to set the environment of the child reduce tasks. The format of the value is k1=v1,k2=v2. Further, it can reference existing environment variables via $key on Linux or %key% on Windows. Example: A=foo will set the env variable A to foo. An environment variable can also be set individually by appending .VARNAME to this configuration key, where VARNAME is the name of the environment variable. Example: mapreduce.reduce.env.VARNAME=value
- See Also:
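The k1=v1,k2=v2 environment format is a simple comma-separated list of assignments. A sketch of how such a value parses (illustrative only, not Hadoop's actual parser, which additionally resolves the $key/%key% references):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvSpecDemo {
    // Parse "k1=v1,k2=v2" into an ordered map of variable assignments.
    static Map<String, String> parseEnv(String spec) {
        Map<String, String> env = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                env.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return env;
    }

    public static void main(String[] args) {
        System.out.println(parseEnv("A=foo,B=$PATH"));
    }
}
```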
-
MAPRED_MAP_TASK_LOG_LEVEL
Configuration key to set the logging level for the map task. The allowed logging levels are: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE and ALL.
- See Also:
-
MAPRED_REDUCE_TASK_LOG_LEVEL
Configuration key to set the logging level for the reduce task. The allowed logging levels are: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE and ALL.
- See Also:
-
DEFAULT_LOG_LEVEL
Default logging level for map/reduce tasks.
- See Also:
-
WORKFLOW_ID
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MRJobConfig.WORKFLOW_ID instead.
- See Also:
-
WORKFLOW_NAME
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MRJobConfig.WORKFLOW_NAME instead.
- See Also:
-
WORKFLOW_NODE_NAME
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MRJobConfig.WORKFLOW_NODE_NAME instead.
- See Also:
-
WORKFLOW_ADJACENCY_PREFIX_STRING
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MRJobConfig.WORKFLOW_ADJACENCY_PREFIX_STRING instead.
- See Also:
-
WORKFLOW_ADJACENCY_PREFIX_PATTERN
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MRJobConfig.WORKFLOW_ADJACENCY_PREFIX_PATTERN instead.
- See Also:
-
WORKFLOW_TAGS
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should use MRJobConfig.WORKFLOW_TAGS instead.
- See Also:
-
MAPREDUCE_RECOVER_JOB
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should not use it.
- See Also:
-
DEFAULT_MAPREDUCE_RECOVER_JOB
Deprecated. The variable is kept for M/R 1.x applications; M/R 2.x applications should not use it.
- See Also:
-
-
Constructor Details
-
JobConf
public JobConf()
Construct a map/reduce job configuration. -
JobConf
Construct a map/reduce job configuration.- Parameters:
exampleClass- a class whose containing jar is used as the job's jar.
-
JobConf
Construct a map/reduce job configuration.- Parameters:
conf- a Configuration whose settings will be inherited.
-
JobConf
Construct a map/reduce job configuration.- Parameters:
conf- a Configuration whose settings will be inherited.exampleClass- a class whose containing jar is used as the job's jar.
-
JobConf
Construct a map/reduce configuration.- Parameters:
config- a Configuration-format XML job description file.
-
JobConf
Construct a map/reduce configuration.- Parameters:
config- a Configuration-format XML job description file.
-
JobConf
public JobConf(boolean loadDefaults)
A new map/reduce configuration where the behavior of reading from the default resources can be turned off. If the parameter loadDefaults is false, the new instance will not load resources from the default files.
- Parameters:
loadDefaults- specifies whether to load from the default files
-
-
Method Details
-
getCredentials
Get credentials for the job.- Returns:
- credentials for the job
-
setCredentials
-
getJar
Get the user jar for the map-reduce job.- Returns:
- the user jar for the map-reduce job.
-
setJar
Set the user jar for the map-reduce job.- Parameters:
jar- the user jar for the map-reduce job.
-
getJarUnpackPattern
Get the pattern for jar contents to unpack on the tasktracker -
setJarByClass
Set the job's jar file by finding an example class location.- Parameters:
cls- the example class.
-
getLocalDirs
- Throws:
IOException
-
deleteLocalFiles
Deprecated. Use MRAsyncDiskService.moveAndDeleteAllVolumes instead.
- Throws:
IOException
-
deleteLocalFiles
- Throws:
IOException
-
getLocalPath
Constructs a local file name. Files are distributed among configured local directories.- Throws:
IOException
-
getUser
Get the reported username for this job.- Returns:
- the username
-
setUser
Set the reported username for this job.- Parameters:
user- the username for this job.
-
setKeepFailedTaskFiles
public void setKeepFailedTaskFiles(boolean keep) Set whether the framework should keep the intermediate files for failed tasks.- Parameters:
keep - true if the framework should keep the intermediate files for failed tasks, false otherwise.
-
getKeepFailedTaskFiles
public boolean getKeepFailedTaskFiles()
Should the temporary files for failed tasks be kept?
- Returns:
- should the files be kept?
-
setKeepTaskFilesPattern
Set a regular expression for task names that should be kept. The regular expression ".*_m_000123_0" would keep the files for the first instance of map 123 that ran.- Parameters:
pattern- the java.util.regex.Pattern to match against the task names.
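The pattern is an ordinary java.util.regex expression matched against task attempt names, so it can be checked locally; for example (the attempt names below are made up for illustration):

```java
import java.util.regex.Pattern;

public class KeepPatternDemo {
    // Keep files for the first attempt of map 123, as in the example above.
    static final Pattern KEEP = Pattern.compile(".*_m_000123_0");

    static boolean shouldKeep(String taskName) {
        return KEEP.matcher(taskName).matches();
    }

    public static void main(String[] args) {
        System.out.println(shouldKeep("attempt_200707121733_0003_m_000123_0")); // true
        System.out.println(shouldKeep("attempt_200707121733_0003_m_000124_0")); // false
    }
}
```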
-
getKeepTaskFilesPattern
Get the regular expression that is matched against the task names to see if we need to keep the files.- Returns:
- the pattern as a string, if it was set, otherwise null.
-
setWorkingDirectory
Set the current working directory for the default file system.- Parameters:
dir- the new current working directory.
-
getWorkingDirectory
Get the current working directory for the default file system.- Returns:
- the directory name.
-
setNumTasksToExecutePerJvm
public void setNumTasksToExecutePerJvm(int numTasks) Sets the number of tasks that a spawned task JVM should run before it exits- Parameters:
numTasks- the number of tasks to execute; defaults to 1; -1 signifies no limit
-
getNumTasksToExecutePerJvm
public int getNumTasksToExecutePerJvm()
Get the number of tasks that a spawned JVM should execute. -
getInputFormat
Get the InputFormat implementation for the map-reduce job, defaults to TextInputFormat if not specified explicitly.
- Returns:
- the InputFormat implementation for the map-reduce job.
-
setInputFormat
Set the InputFormat implementation for the map-reduce job.
- Parameters:
theClass - the InputFormat implementation for the map-reduce job.
-
getOutputFormat
Get the OutputFormat implementation for the map-reduce job, defaults to TextOutputFormat if not specified explicitly.
- Returns:
- the OutputFormat implementation for the map-reduce job.
-
getOutputCommitter
Get the OutputCommitter implementation for the map-reduce job, defaults to FileOutputCommitter if not specified explicitly.
- Returns:
- the OutputCommitter implementation for the map-reduce job.
-
setOutputCommitter
Set the OutputCommitter implementation for the map-reduce job.
- Parameters:
theClass - the OutputCommitter implementation for the map-reduce job.
-
setOutputFormat
Set the OutputFormat implementation for the map-reduce job.
- Parameters:
theClass - the OutputFormat implementation for the map-reduce job.
-
setCompressMapOutput
public void setCompressMapOutput(boolean compress) Should the map outputs be compressed before transfer?- Parameters:
compress- should the map outputs be compressed?
-
getCompressMapOutput
public boolean getCompressMapOutput()
Are the outputs of the maps to be compressed?
- Returns:
- true if the outputs of the maps are to be compressed, false otherwise.
-
setMapOutputCompressorClass
Set the given class as the CompressionCodec for the map outputs.
- Parameters:
codecClass - the CompressionCodec class that will compress the map outputs.
-
getMapOutputCompressorClass
public Class<? extends CompressionCodec> getMapOutputCompressorClass(Class<? extends CompressionCodec> defaultValue)
Get the CompressionCodec for compressing the map outputs.
- Parameters:
defaultValue - the CompressionCodec to return if not set
- Returns:
- the CompressionCodec class that should be used to compress the map outputs.
- Throws:
IllegalArgumentException - if the class was specified, but not found
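In configuration-file terms, the setCompressMapOutput/setMapOutputCompressorClass pair corresponds to two properties (names as in Hadoop 2; verify against your release, and SnappyCodec is just one common codec choice):

```xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```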
-
getMapOutputKeyClass
Get the key class for the map output data. If it is not set, use the (final) output key class. This allows the map output key class to be different than the final output key class.- Returns:
- the map output key class.
-
setMapOutputKeyClass
Set the key class for the map output data. This allows the user to specify the map output key class to be different than the final output key class.
- Parameters:
theClass- the map output key class.
-
getMapOutputValueClass
Get the value class for the map output data. If it is not set, use the (final) output value class. This allows the map output value class to be different than the final output value class.
- Returns:
- the map output value class.
-
setMapOutputValueClass
Set the value class for the map output data. This allows the user to specify the map output value class to be different than the final output value class.- Parameters:
theClass- the map output value class.
-
getOutputKeyClass
Get the key class for the job output data.- Returns:
- the key class for the job output data.
-
setOutputKeyClass
Set the key class for the job output data.- Parameters:
theClass- the key class for the job output data.
-
getOutputKeyComparator
Get the RawComparator comparator used to compare keys.
- Returns:
- the RawComparator comparator used to compare keys.
-
setOutputKeyComparatorClass
Set the RawComparator comparator used to compare keys.
- Parameters:
theClass - the RawComparator comparator used to compare keys.
- See Also:
-
setKeyFieldComparatorOptions
Set the KeyFieldBasedComparator options used to compare keys.
- Parameters:
keySpec - the key specification of the form -k pos1[,pos2], where pos is of the form f[.c][opts]: f is the number of the key field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options. The supported options are: -n (sort numerically), -r (reverse the result of comparison).
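The option syntax mirrors Unix sort(1). For instance, -k2,2nr means: compare on field 2 only, numerically, in reverse order. A plain-Java sketch of that comparison on tab-separated lines (illustrative only, not the KeyFieldBasedComparator implementation):

```java
import java.util.Arrays;
import java.util.Comparator;

public class KeySpecDemo {
    // Emulate "-k2,2nr": compare on field 2 only, numerically (n), reversed (r).
    static final Comparator<String> K2_NR =
        Comparator.comparingDouble((String line) -> Double.parseDouble(line.split("\t")[1]))
                  .reversed();

    public static void main(String[] args) {
        String[] lines = {"apple\t3", "pear\t10", "plum\t2"};
        Arrays.sort(lines, K2_NR);
        // Highest count first: pear (10), then apple (3), then plum (2).
        System.out.println(Arrays.toString(lines));
    }
}
```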
-
getKeyFieldComparatorOption
Get the KeyFieldBasedComparator options. -
setKeyFieldPartitionerOptions
Set the KeyFieldBasedPartitioner options used for the Partitioner.
- Parameters:
keySpec - the key specification of the form -k pos1[,pos2], where pos is of the form f[.c][opts]: f is the number of the key field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field).
-
getKeyFieldPartitionerOption
Get the KeyFieldBasedPartitioner options. -
getCombinerKeyGroupingComparator
Get the user-defined WritableComparable comparator for grouping keys of inputs to the combiner.
- Returns:
- comparator set by the user for grouping values.
- See Also:
-
getOutputValueGroupingComparator
Get the user-defined WritableComparable comparator for grouping keys of inputs to the reduce.
- Returns:
- comparator set by the user for grouping values.
- See Also:
-
setCombinerKeyGroupingComparator
Set the user-defined RawComparator comparator for grouping keys in the input to the combiner.
This comparator should be provided if the equivalence rules for keys for sorting the intermediates are different from those for grouping keys before each call to Reducer.reduce(Object, java.util.Iterator, OutputCollector, Reporter). For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed in a single call to the reduce function if K1 and K2 compare as equal.
Since setOutputKeyComparatorClass(Class) can be used to control how keys are sorted, this can be used in conjunction to simulate secondary sort on values.
Note: this is not a guarantee of the combiner sort being stable in any sense. (In any case, with the order of available map-outputs to the combiner being non-deterministic, it wouldn't make that much sense.)
- Parameters:
theClass - the comparator class to be used for grouping keys for the combiner. It should implement RawComparator.
- See Also:
-
setOutputValueGroupingComparator
Set the user definedRawComparatorcomparator for grouping keys in the input to the reduce.This comparator should be provided if the equivalence rules for keys for sorting the intermediates are different from those for grouping keys before each call to
Reducer.reduce(Object, java.util.Iterator, OutputCollector, Reporter).For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed in a single call to the reduce function if K1 and K2 compare as equal.
Since
setOutputKeyComparatorClass(Class)can be used to control how keys are sorted, this can be used in conjunction to simulate secondary sort on values.Note: This is not a guarantee of the reduce sort being stable in any sense. (In any case, with the order of available map-outputs to the reduce being non-deterministic, it wouldn't make that much sense.)
- Parameters:
theClass- the comparator class to be used for grouping keys. It should implementRawComparator.- See Also:
-
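The secondary-sort pattern described above can be sketched as a job-configuration fragment. MyJob, FullKeyComparator, and FirstKeyGroupingComparator are hypothetical user-written classes; only the JobConf methods are from this class:

```java
// Configuration fragment only -- assumes a composite key (primary, secondary),
// a FullKeyComparator ordering on both parts, and a FirstKeyGroupingComparator
// comparing the primary part only. Both are hypothetical user classes.
JobConf job = new JobConf(MyJob.class);

// Sort the intermediate outputs on the full composite key ...
job.setOutputKeyComparatorClass(FullKeyComparator.class);

// ... but group reduce() inputs on the primary part alone, so each
// reduce() call sees all values for one primary key, already ordered
// by the secondary key.
job.setOutputValueGroupingComparator(FirstKeyGroupingComparator.class);
```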
getUseNewMapper
public boolean getUseNewMapper()Should the framework use the new context-object code for running the mapper?- Returns:
- true, if the new api should be used
-
setUseNewMapper
public void setUseNewMapper(boolean flag) Set whether the framework should use the new api for the mapper. This is the default for jobs submitted with the new Job api.- Parameters:
flag- true, if the new api should be used
-
getUseNewReducer
public boolean getUseNewReducer()Should the framework use the new context-object code for running the reducer?- Returns:
- true, if the new api should be used
-
setUseNewReducer
public void setUseNewReducer(boolean flag) Set whether the framework should use the new api for the reducer. This is the default for jobs submitted with the new Job api.- Parameters:
flag- true, if the new api should be used
-
getOutputValueClass
Get the value class for job outputs.- Returns:
- the value class for job outputs.
-
setOutputValueClass
Set the value class for job outputs.- Parameters:
theClass- the value class for job outputs.
-
getMapperClass
Get theMapperclass for the job.- Returns:
- the
Mapperclass for the job.
-
setMapperClass
Set theMapperclass for the job.- Parameters:
theClass- theMapperclass for the job.
-
getMapRunnerClass
Get theMapRunnableclass for the job.- Returns:
- the
MapRunnableclass for the job.
-
setMapRunnerClass
Expert: Set theMapRunnableclass for the job. Typically used to exert greater control onMappers.- Parameters:
theClass- theMapRunnableclass for the job.
-
getPartitionerClass
- Returns:
- the
Partitionerused to partition map-outputs.
-
setPartitionerClass
- Parameters:
theClass- thePartitionerused to partition map-outputs.
-
getReducerClass
Get theReducerclass for the job.- Returns:
- the
Reducerclass for the job.
-
setReducerClass
Set theReducerclass for the job.- Parameters:
theClass- theReducerclass for the job.
-
getCombinerClass
Get the user-defined combiner class used to combine map-outputs before being sent to the reducers. Typically the combiner is the same as theReducerfor the job i.e.getReducerClass().- Returns:
- the user-defined combiner class used to combine map-outputs.
-
setCombinerClass
Set the user-defined combiner class used to combine map-outputs before being sent to the reducers.The combiner is an application-specified aggregation operation, which can help cut down the amount of data transferred between the
Mapperand theReducer, leading to better performance.The framework may invoke the combiner 0, 1, or multiple times, in both the mapper and reducer tasks. In general, the combiner is called as the sort/merge result is written to disk. The combiner must:
- be side-effect free
- have the same input and output key types and the same input and output value types
Typically the combiner is the same as the
Reducerfor the job i.e.setReducerClass(Class).- Parameters:
theClass- the user-defined combiner class used to combine map-outputs.
-
getSpeculativeExecution
public boolean getSpeculativeExecution()Should speculative execution be used for this job? Defaults totrue.- Returns:
trueif speculative execution should be used for this job,falseotherwise.
-
setSpeculativeExecution
public void setSpeculativeExecution(boolean speculativeExecution) Turn speculative execution on or off for this job.- Parameters:
speculativeExecution-trueif speculative execution should be turned on, elsefalse.
-
getMapSpeculativeExecution
public boolean getMapSpeculativeExecution()Should speculative execution be used for this job for map tasks? Defaults totrue.- Returns:
trueif speculative execution should be used for this job for map tasks,falseotherwise.
-
setMapSpeculativeExecution
public void setMapSpeculativeExecution(boolean speculativeExecution) Turn speculative execution on or off for this job for map tasks.- Parameters:
speculativeExecution-trueif speculative execution should be turned on for map tasks, elsefalse.
-
getReduceSpeculativeExecution
public boolean getReduceSpeculativeExecution()Should speculative execution be used for this job for reduce tasks? Defaults totrue.- Returns:
trueif speculative execution should be used for reduce tasks for this job,falseotherwise.
-
setReduceSpeculativeExecution
public void setReduceSpeculativeExecution(boolean speculativeExecution) Turn speculative execution on or off for this job for reduce tasks.- Parameters:
speculativeExecution-trueif speculative execution should be turned on for reduce tasks, elsefalse.
-
getNumMapTasks
public int getNumMapTasks()Get the configured number of map tasks for this job. Defaults to1.- Returns:
- the number of map tasks for this job.
-
setNumMapTasks
public void setNumMapTasks(int n) Set the number of map tasks for this job.Note: This is only a hint to the framework. The actual number of spawned map tasks depends on the number of
InputSplits generated by the job'sInputFormat.getSplits(JobConf, int). A customInputFormatis typically used to accurately control the number of map tasks for the job. How many maps? The number of maps is usually driven by the total size of the inputs, i.e. the total number of blocks of the input files.
The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 or so for very CPU-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.
The default behavior of file-based
InputFormats is to split the input into logicalInputSplits based on the total size, in bytes, of input files. However, theFileSystemblocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapreduce.input.fileinputformat.split.minsize.Thus, if you expect 10TB of input data and have a blocksize of 128MB, you'll end up with 82,000 maps, unless
setNumMapTasks(int)is used to set it even higher.- Parameters:
n- the number of map tasks for this job.- See Also:
-
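The 10TB/128MB arithmetic above is easy to verify with a few lines. This is a simplified model of file-based splitting; the real computation also honors per-file boundaries and mapreduce.input.fileinputformat.split.minsize:

```java
public class MapCountEstimate {
    // Simplified model of FileInputFormat splitting: one map per full
    // block-sized split of the total input (ceiling division). Ignores
    // per-file remainders and the minimum split size.
    static long estimateMaps(long totalInputBytes, long blockSizeBytes) {
        return (totalInputBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long tenTB = 10L * 1024 * 1024 * 1024 * 1024;
        long block = 128L * 1024 * 1024;
        // 81920 -- the "82,000 maps" figure quoted in the javadoc
        System.out.println(estimateMaps(tenTB, block));
    }
}
```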
getNumReduceTasks
public int getNumReduceTasks()Get the configured number of reduce tasks for this job. Defaults to1.- Returns:
- the number of reduce tasks for this job.
-
setNumReduceTasks
public void setNumReduceTasks(int n) Set the requisite number of reduce tasks for this job. How many reduces?The right number of reduces seems to be
0.95or1.75multiplied by ( available memory for reduce tasks (The value of this should be smaller than numNodes * yarn.nodemanager.resource.memory-mb since the resource of memory is shared by map tasks and other applications) / mapreduce.reduce.memory.mb).With
0.95all of the reduces can launch immediately and start transfering map outputs as the maps finish. With1.75the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.
The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative-tasks, failures etc.
Reducer NONEIt is legal to set the number of reduce-tasks to
zero. In this case the output of the map-tasks goes directly to the distributed file-system, to the path set by
FileOutputFormat.setOutputPath(JobConf, Path). Also, the framework doesn't sort the map-outputs before writing them out to HDFS.- Parameters:
n- the number of reduce tasks for this job.
-
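The heuristic above can be written out directly. A sketch only: the memory figures are illustrative, standing in for numNodes * yarn.nodemanager.resource.memory-mb and mapreduce.reduce.memory.mb from the text:

```java
public class ReduceCountHeuristic {
    // factor is 0.95 (one wave of reduces) or 1.75 (two waves, better
    // load balancing); the ratio is memory available for reduce tasks
    // over memory per reduce task.
    static int suggestReduces(double factor, long reduceMemAvailableMb,
                              long memPerReduceMb) {
        return (int) Math.round(factor * reduceMemAvailableMb / memPerReduceMb);
    }

    public static void main(String[] args) {
        long available = 40 * 4096L; // e.g. room for 40 concurrent 4 GB reduces
        System.out.println(suggestReduces(0.95, available, 4096)); // 38
        System.out.println(suggestReduces(1.75, available, 4096)); // 70
    }
}
```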
getMaxMapAttempts
public int getMaxMapAttempts()Get the configured number of maximum attempts that will be made to run a map task, as specified by themapreduce.map.maxattemptsproperty. If this property is not already set, the default is 4 attempts.- Returns:
- the max number of attempts per map task.
-
setMaxMapAttempts
public void setMaxMapAttempts(int n) Expert: Set the number of maximum attempts that will be made to run a map task.- Parameters:
n- the number of attempts per map task.
-
getMaxReduceAttempts
public int getMaxReduceAttempts()Get the configured number of maximum attempts that will be made to run a reduce task, as specified by themapreduce.reduce.maxattemptsproperty. If this property is not already set, the default is 4 attempts.- Returns:
- the max number of attempts per reduce task.
-
setMaxReduceAttempts
public void setMaxReduceAttempts(int n) Expert: Set the number of maximum attempts that will be made to run a reduce task.- Parameters:
n- the number of attempts per reduce task.
-
getJobName
Get the user-specified job name. This is only used to identify the job to the user.- Returns:
- the job's name, defaulting to "".
-
setJobName
Set the user-specified job name.- Parameters:
name- the job's new name.
-
getSessionId
Deprecated.Get the user-specified session identifier. The default is the empty string. The session identifier is used to tag metric data that is reported to some performance metrics system via the org.apache.hadoop.metrics API. The session identifier is intended, in particular, for use by Hadoop-On-Demand (HOD) which allocates a virtual Hadoop cluster dynamically and transiently. HOD will set the session identifier by modifying the mapred-site.xml file before starting the cluster. When not running under HOD, this identifier is expected to remain set to the empty string.- Returns:
- the session identifier, defaulting to "".
-
setSessionId
Deprecated.Set the user-specified session identifier.- Parameters:
sessionId- the new session id.
-
setMaxTaskFailuresPerTracker
public void setMaxTaskFailuresPerTracker(int noFailures) Set the maximum no. of failures of a given job per tasktracker. If the no. of task failures exceedsnoFailures, the tasktracker is blacklisted for this job.- Parameters:
noFailures- maximum no. of failures of a given job per tasktracker.
-
getMaxTaskFailuresPerTracker
public int getMaxTaskFailuresPerTracker()Expert: Get the maximum no. of failures of a given job per tasktracker. If the no. of task failures exceeds this, the tasktracker is blacklisted for this job.- Returns:
- the maximum no. of failures of a given job per tasktracker.
-
getMaxMapTaskFailuresPercent
public int getMaxMapTaskFailuresPercent()Get the maximum percentage of map tasks that can fail without the job being aborted. Each map task is executed a minimum ofgetMaxMapAttempts()attempts before being declared as failed. Defaults tozero, i.e. any failed map-task results in the job being declared asJobStatus.FAILED.- Returns:
- the maximum percentage of map tasks that can fail without the job being aborted.
-
setMaxMapTaskFailuresPercent
public void setMaxMapTaskFailuresPercent(int percent) Expert: Set the maximum percentage of map tasks that can fail without the job being aborted. Each map task is executed a minimum ofgetMaxMapAttempts()attempts before being declared as failed.- Parameters:
percent- the maximum percentage of map tasks that can fail without the job being aborted.
-
getMaxReduceTaskFailuresPercent
public int getMaxReduceTaskFailuresPercent()Get the maximum percentage of reduce tasks that can fail without the job being aborted. Each reduce task is executed a minimum ofgetMaxReduceAttempts()attempts before being declared as failed. Defaults tozero, i.e. any failed reduce-task results in the job being declared asJobStatus.FAILED.- Returns:
- the maximum percentage of reduce tasks that can fail without the job being aborted.
-
setMaxReduceTaskFailuresPercent
public void setMaxReduceTaskFailuresPercent(int percent) Set the maximum percentage of reduce tasks that can fail without the job being aborted. Each reduce task is executed a minimum ofgetMaxReduceAttempts()attempts before being declared as failed.- Parameters:
percent- the maximum percentage of reduce tasks that can fail without the job being aborted.
-
setJobPriority
SetJobPriorityfor this job.- Parameters:
prio- theJobPriorityfor this job.
-
setJobPriorityAsInteger
public void setJobPriorityAsInteger(int prio) SetJobPriorityfor this job.- Parameters:
prio- theJobPriorityfor this job.
-
getJobPriority
Get theJobPriorityfor this job.- Returns:
- the
JobPriorityfor this job.
-
getJobPriorityAsInteger
public int getJobPriorityAsInteger()Get the priority for this job.- Returns:
- the priority for this job.
-
getProfileEnabled
public boolean getProfileEnabled()Get whether the task profiling is enabled.- Returns:
- true if some tasks will be profiled
-
setProfileEnabled
public void setProfileEnabled(boolean newValue) Set whether the system should collect profiler information for some of the tasks in this job. The information is stored in the user log directory.- Parameters:
newValue- true means it should be gathered
-
getProfileParams
Get the profiler configuration arguments. The default value for this property is "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s"- Returns:
- the parameters to pass to the task child to configure profiling
-
setProfileParams
Set the profiler configuration arguments. If the string contains a '%s' it will be replaced with the name of the profiling output file when the task runs. This value is passed to the task child JVM on the command line.- Parameters:
value- the configuration string
-
getProfileTaskRange
public org.apache.hadoop.conf.Configuration.IntegerRanges getProfileTaskRange(boolean isMap) Get the range of maps or reduces to profile.- Parameters:
isMap- is the task a map?- Returns:
- the task ranges
-
setProfileTaskRange
Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.- Parameters:
newValue- a set of integer ranges of the map ids
-
setMapDebugScript
Set the debug script to run when the map tasks fail.The debug script can aid debugging of failed map tasks. The script is given the task's stdout, stderr, syslog, and jobconf files as arguments.
The debug command, run on the node where the map failed, is:
$script $stdout $stderr $syslog $jobconf.
The script file is distributed through
DistributedCacheAPIs. The script needs to be symlinked. Here is an example of how to submit a script:
job.setMapDebugScript("./myscript"); DistributedCache.createSymlink(job); DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");- Parameters:
mDbgScript- the script name
-
getMapDebugScript
Get the map task's debug script.- Returns:
- the debug Script for the mapred job for failed map tasks.
- See Also:
-
setReduceDebugScript
Set the debug script to run when the reduce tasks fail.The debug script can aid debugging of failed reduce tasks. The script is given the task's stdout, stderr, syslog, and jobconf files as arguments.
The debug command, run on the node where the reduce failed, is:
$script $stdout $stderr $syslog $jobconf.
The script file is distributed through
DistributedCacheAPIs. The script file needs to be symlinked. Here is an example of how to submit a script:
job.setReduceDebugScript("./myscript"); DistributedCache.createSymlink(job); DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");- Parameters:
rDbgScript- the script name
-
getReduceDebugScript
Get the reduce task's debug script.- Returns:
- the debug script for the mapred job for failed reduce tasks.
- See Also:
-
getJobEndNotificationURI
Get the uri to be invoked in order to send a notification after the job has completed (success/failure).- Returns:
- the job end notification uri,
nullif it hasn't been set. - See Also:
-
setJobEndNotificationURI
Set the uri to be invoked in order to send a notification after the job has completed (success/failure).The uri can contain 2 special parameters:
$jobIdand$jobStatus. Those, if present, are replaced by the job's identifier and completion-status respectively.This is typically used by application-writers to implement chaining of Map-Reduce jobs in an asynchronous manner.
- Parameters:
uri- the job end notification uri- See Also:
-
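The parameter substitution described above is equivalent to a plain string replacement. A small demonstration (the URI, job id, and status values are illustrative, not taken from a real cluster):

```java
public class JobEndNotification {
    // Equivalent of the substitution described above: $jobId and
    // $jobStatus in the configured URI are replaced before the
    // notification request is made.
    static String expandUri(String uri, String jobId, String jobStatus) {
        return uri.replace("$jobId", jobId).replace("$jobStatus", jobStatus);
    }

    public static void main(String[] args) {
        String uri = "http://myserver.example/notify?id=$jobId&status=$jobStatus";
        System.out.println(expandUri(uri, "job_200707121733_0003", "SUCCEEDED"));
    }
}
```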
getJobEndNotificationCustomNotifierClass
Returns the class to be invoked in order to send a notification after the job has completed (success/failure).- Returns:
- the fully-qualified name of the class which implements
CustomJobEndNotifierset through theMRJobConfig.MR_JOB_END_NOTIFICATION_CUSTOM_NOTIFIER_CLASSproperty - See Also:
-
setJobEndNotificationCustomNotifierClass(java.lang.String)MRJobConfig.MR_JOB_END_NOTIFICATION_CUSTOM_NOTIFIER_CLASS
-
setJobEndNotificationCustomNotifierClass
Sets the class to be invoked in order to send a notification after the job has completed (success/failure). A notification url still has to be set which will be passed toCustomJobEndNotifier.notifyOnce(java.net.URL, org.apache.hadoop.conf.Configuration)along with the Job's conf. If this is set instead of using a simple HttpURLConnection we'll create a new instance of this class which should be an implementation ofCustomJobEndNotifier, and we'll invoke that.- Parameters:
customNotifierClassName- the fully-qualified name of the class which implementsCustomJobEndNotifier- See Also:
-
setJobEndNotificationURI(java.lang.String)MRJobConfig.MR_JOB_END_NOTIFICATION_CUSTOM_NOTIFIER_CLASS
-
getJobLocalDir
Get the job-specific shared directory for use as scratch space. When a job starts, a shared directory is created at location
${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/work/. This directory is exposed to the users throughmapreduce.job.local.dir, so the tasks can use this space as scratch space and share files among them. This value is also available as a System property.- Returns:
- The localized job specific shared directory
-
getMemoryForMapTask
public long getMemoryForMapTask()Get memory required to run a map task of the job, in MB. If a value is specified in the configuration, it is returned. Else, it returnsMRJobConfig.DEFAULT_MAP_MEMORY_MB.For backward compatibility, if the job configuration sets the key
MAPRED_TASK_MAXVMEM_PROPERTYto a value different fromDISABLED_MEMORY_LIMIT, that value will be used after converting it from bytes to MB.- Returns:
- memory required to run a map task of the job, in MB,
-
setMemoryForMapTask
public void setMemoryForMapTask(long mem) -
getMemoryForReduceTask
public long getMemoryForReduceTask()Get memory required to run a reduce task of the job, in MB. If a value is specified in the configuration, it is returned. Else, it returnsMRJobConfig.DEFAULT_REDUCE_MEMORY_MB.For backward compatibility, if the job configuration sets the key
MAPRED_TASK_MAXVMEM_PROPERTYto a value different fromDISABLED_MEMORY_LIMIT, that value will be used after converting it from bytes to MB.- Returns:
- memory required to run a reduce task of the job, in MB.
-
setMemoryForReduceTask
public void setMemoryForReduceTask(long mem) -
getQueueName
Return the name of the queue to which this job is submitted. Defaults to 'default'.- Returns:
- name of the queue
-
setQueueName
Set the name of the queue to which this job should be submitted.- Parameters:
queueName- Name of the queue
-
normalizeMemoryConfigValue
public static long normalizeMemoryConfigValue(long val) Normalize negative values in the configuration.- Parameters:
val-- Returns:
- normalized value
-
findContainingJar
Find a jar that contains a class of the same name, if any. It will return a jar file, even if that is not the first thing on the class path that has a class with the same name.- Parameters:
my_class- the class to find.- Returns:
- a jar file that contains the class, or null.
-
getMaxVirtualMemoryForTask
Deprecated.Get the memory required to run a task of this job, in bytes. SeeMAPRED_TASK_MAXVMEM_PROPERTY. This method is deprecated. Now, different memory limits can be set for map and reduce tasks of a job, in MB.
For backward compatibility, if the job configuration sets the key
MAPRED_TASK_MAXVMEM_PROPERTY, that value is returned. Otherwise, this method will return the larger of the values returned bygetMemoryForMapTask()andgetMemoryForReduceTask()after converting them into bytes.- Returns:
- Memory required to run a task of this job, in bytes.
- See Also:
-
setMaxVirtualMemoryForTask
Deprecated.Set the maximum amount of memory any task of this job can use. SeeMAPRED_TASK_MAXVMEM_PROPERTY. mapred.task.maxvmem is split into mapreduce.map.memory.mb and mapreduce.reduce.memory.mb; each of the new keys is set to mapred.task.maxvmem / 1024, as the new values are in MB.
- Parameters:
vmem- Maximum amount of virtual memory in bytes any task of this job can use.- See Also:
-
getMaxPhysicalMemoryForTask
Deprecated.This variable is deprecated and no longer in use. -
setMaxPhysicalMemoryForTask
Deprecated. -
getTaskJavaOpts
-
parseMaximumHeapSizeMB
Parse the Maximum heap size from the java opts as specified by the -Xmx option Format: -Xmx<size>[g|G|m|M|k|K]- Parameters:
javaOpts- String to parse to read maximum heap size- Returns:
- Maximum heap size in MB or -1 if not specified
-
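The documented format can be modelled with a small parser. This is a re-implementation for illustration, not Hadoop's code; in particular, treating a bare number (no suffix) as bytes is an assumption based on the JVM's own -Xmx convention:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmxParser {
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([gGmMkK]?)");

    // Mirrors the documented contract: returns the -Xmx value in MB,
    // or -1 when no -Xmx option is present in the opts string.
    static int maxHeapSizeMB(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) return -1;
        long size = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return (int) (size * 1024);
            case "k": return (int) (size / 1024);
            case "m": return (int) size;
            // No suffix: assume bytes (the JVM's -Xmx default unit).
            default:  return (int) (size / (1024 * 1024));
        }
    }

    public static void main(String[] args) {
        System.out.println(maxHeapSizeMB("-server -Xmx2g")); // 2048
        System.out.println(maxHeapSizeMB("-Xmx512m"));       // 512
        System.out.println(maxHeapSizeMB("-verbose:gc"));    // -1
    }
}
```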
getMemoryRequired
-
main
- Throws:
Exception
-