Class DBInputFormat<T extends DBWritable>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<LongWritable,T>
org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>
- All Implemented Interfaces:
Configurable
- Direct Known Subclasses:
DataDrivenDBInputFormat, DBInputFormat
@Public
@Stable
public class DBInputFormat<T extends DBWritable>
extends InputFormat<LongWritable,T>
implements Configurable
An InputFormat that reads input data from an SQL table.
DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and the input class can be specified using one of the two setInput methods.
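A value class used with DBInputFormat must implement both DBWritable and Writable. The sketch below shows a minimal implementation; the EmployeeRecord name and the id/name columns are hypothetical, not part of this API.

```java
// Minimal sketch of a value class for DBInputFormat. The class name and
// column layout (id, name) are assumptions for illustration only.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class EmployeeRecord implements Writable, DBWritable {
  private long id;
  private String name;

  // DBWritable: populate fields from the current row of the ResultSet.
  public void readFields(ResultSet rs) throws SQLException {
    id = rs.getLong(1);
    name = rs.getString(2);
  }

  // DBWritable: bind fields to a PreparedStatement (used on the write path).
  public void write(PreparedStatement ps) throws SQLException {
    ps.setLong(1, id);
    ps.setString(2, name);
  }

  // Writable: deserialize fields from the MapReduce wire format.
  public void readFields(DataInput in) throws IOException {
    id = in.readLong();
    name = in.readUTF();
  }

  // Writable: serialize fields to the MapReduce wire format.
  public void write(DataOutput out) throws IOException {
    out.writeLong(id);
    out.writeUTF(name);
  }
}
```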
-
Nested Class Summary
Nested Classes
- static class org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit: An InputSplit that spans a set of rows.
- static class org.apache.hadoop.mapreduce.lib.db.DBInputFormat.NullDBWritable: A class that does nothing, implementing DBWritable.
-
Field Summary
Fields
- protected String conditions
- protected Connection connection
- protected DBConfiguration dbConf
- protected String dbProductName
- protected String[] fieldNames
- protected String tableName
-
Constructor Summary
Constructors
- DBInputFormat()
-
Method Summary
Methods
- protected void closeConnection()
- protected RecordReader<LongWritable,T> createDBRecordReader(DBInputFormat.DBInputSplit split, Configuration conf)
- RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for a given split.
- Configuration getConf(): Return the configuration used by this object.
- protected String getCountQuery(): Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
- List<InputSplit> getSplits(JobContext job): Logically split the set of input files for the job.
- void setConf(Configuration conf): Set the configuration to be used by this object.
- static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery): Initializes the map-part of the job with the appropriate input settings.
- static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames): Initializes the map-part of the job with the appropriate input settings.
-
Field Details
- protected String dbProductName
- protected String conditions
- protected Connection connection
- protected String tableName
- protected String[] fieldNames
- protected DBConfiguration dbConf
-
Constructor Details
-
DBInputFormat
public DBInputFormat()
-
-
Method Details
-
setConf
public void setConf(Configuration conf)
Set the configuration to be used by this object.
- Specified by: setConf in interface Configurable
- Parameters: conf - configuration to be used
-
getConf
Description copied from interface:ConfigurableReturn the configuration used by this object.- Specified by:
getConfin interfaceConfigurable- Returns:
- Configuration
-
getDBConf
public DBConfiguration getDBConf()
-
getConnection
public Connection getConnection()
-
createConnection
public Connection createConnection()
-
getDBProductName
public String getDBProductName()
-
createDBRecordReader
protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) throws IOException
- Throws: IOException
-
createRecordReader
public RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
- Specified by: createRecordReader in class InputFormat<LongWritable,T extends DBWritable>
- Parameters: split - the split to be read; context - the information about the task
- Returns: a new record reader
- Throws: IOException, InterruptedException
-
getSplits
public List<InputSplit> getSplits(JobContext job) throws IOException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For e.g. a split could be <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
- Specified by: getSplits in class InputFormat<LongWritable,T extends DBWritable>
- Parameters: job - job configuration.
- Returns: an array of InputSplits for the job.
- Throws: IOException
-
getCountQuery
protected String getCountQuery()
Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
-
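Overriding getCountQuery can be useful when a full COUNT(*) over the table is expensive. The sketch below is a hypothetical subclass; the table and column names are assumptions.

```java
// Hypothetical subclass that supplies a cheaper count query. The table
// name (employees) and indexed column (id) are assumptions.
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class FastCountDBInputFormat<T extends DBWritable>
    extends DBInputFormat<T> {

  @Override
  protected String getCountQuery() {
    // Count over an indexed primary-key column instead of COUNT(*),
    // which some databases evaluate more cheaply.
    return "SELECT COUNT(id) FROM employees";
  }
}
```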
setInput
public static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.
- Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
tableName - The table to read data from
conditions - The condition which to select data with, eg. '(updated > 20070101 AND length > 0)'
orderBy - the fieldNames in the orderBy clause.
fieldNames - The field names in the table
- See Also:
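The table-based overload can be wired up as in the sketch below. The driver class, JDBC URL, credentials, table, and column names are all hypothetical, and EmployeeRecord stands for any class implementing DBWritable.

```java
// Sketch of configuring a job with the table-based setInput overload.
// All connection details and names here are assumptions for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

public class TableInputSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The JDBC driver and connection settings must be configured first.
    DBConfiguration.configureDB(conf,
        "com.mysql.jdbc.Driver",         // driver class (assumption)
        "jdbc:mysql://localhost/mydb",   // connection URL (assumption)
        "user", "password");             // credentials (assumption)

    Job job = Job.getInstance(conf);
    job.setInputFormatClass(DBInputFormat.class);

    // Read id and name from the employees table, ordered by id.
    DBInputFormat.setInput(job, EmployeeRecord.class,
        "employees",     // tableName
        "salary > 0",    // conditions
        "id",            // orderBy
        "id", "name");   // fieldNames
  }
}
```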
-
setInput
public static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.
- Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
- See Also:
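The query-based overload takes one query to select the records and a second that returns the row count used for splitting. The sketch below uses hypothetical connection details and queries; EmployeeRecord stands for any DBWritable implementation.

```java
// Sketch of configuring a job with the query-based setInput overload.
// Driver, URL, credentials, and SQL text are assumptions for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

public class QueryInputSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DBConfiguration.configureDB(conf,
        "org.postgresql.Driver",              // driver class (assumption)
        "jdbc:postgresql://localhost/mydb",   // connection URL (assumption)
        "user", "password");                  // credentials (assumption)

    Job job = Job.getInstance(conf);
    job.setInputFormatClass(DBInputFormat.class);

    // An ORDER BY clause keeps results deterministic across splits, since
    // each split re-executes the query over its own row range.
    DBInputFormat.setInput(job, EmployeeRecord.class,
        "SELECT id, name FROM employees ORDER BY id",  // inputQuery
        "SELECT COUNT(id) FROM employees");            // inputCountQuery
  }
}
```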
-
closeConnection
protected void closeConnection()
-