@InterfaceAudience.Public @InterfaceStability.Stable public class SequenceFile extends Object
SequenceFiles are flat files consisting of binary key/value pairs.
SequenceFile provides SequenceFile.Writer, SequenceFile.Reader and SequenceFile.Sorter classes for writing, reading and sorting respectively.
There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs:
- Writer: Uncompressed records.
- RecordCompressWriter: Record-compressed files; only values are compressed.
- BlockCompressWriter: Block-compressed files; keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
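To make the difference between the record and block styles concrete, here is a minimal sketch that uses only the JDK. It is not Hadoop's writer code: the class `CompressionTypeSketch`, its method names, the sample data, and the choice of `java.util.zip.Deflater` as the codec are all assumptions made for illustration. It compares the byte count when each value is compressed on its own with the byte count when keys and values are gathered into separate blocks and each block is compressed once.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** JDK-only illustration of the three styles; hypothetical, not Hadoop code. */
final class CompressionTypeSketch {

    /** Compresses a byte array with Deflater and returns the result. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Uncompressed style: keys and values are stored as they are. */
    static int uncompressedSize(String[] keys, String[] values) {
        int total = 0;
        for (int i = 0; i < keys.length; i++) {
            total += utf8(keys[i]).length + utf8(values[i]).length;
        }
        return total;
    }

    /** Record style: keys stay raw, each value is compressed individually. */
    static int recordCompressedSize(String[] keys, String[] values) {
        int total = 0;
        for (int i = 0; i < keys.length; i++) {
            total += utf8(keys[i]).length + deflate(utf8(values[i])).length;
        }
        return total;
    }

    /** Block style: keys and values are collected separately, then each block is compressed. */
    static int blockCompressedSize(String[] keys, String[] values) {
        ByteArrayOutputStream keyBlock = new ByteArrayOutputStream();
        ByteArrayOutputStream valueBlock = new ByteArrayOutputStream();
        for (int i = 0; i < keys.length; i++) {
            byte[] k = utf8(keys[i]);
            byte[] v = utf8(values[i]);
            keyBlock.write(k, 0, k.length);
            valueBlock.write(v, 0, v.length);
        }
        return deflate(keyBlock.toByteArray()).length
                + deflate(valueBlock.toByteArray()).length;
    }

    public static void main(String[] args) {
        int n = 200;
        String[] keys = new String[n];
        String[] values = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = "key-" + i;
            values[i] = "status=OK;region=eu-west"; // made-up sample value
        }
        System.out.println("uncompressed:      " + uncompressedSize(keys, values));
        System.out.println("record-compressed: " + recordCompressedSize(keys, values));
        System.out.println("block-compressed:  " + blockCompressedSize(keys, values));
    }
}
```

With many small, similar records the block style tends to give the smallest output, because the compressor can exploit redundancy across records. That is a property of this sample data rather than a guarantee.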
The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate CompressionCodec.
The recommended way is to use the static createWriter methods provided by SequenceFile to choose the preferred format.
The SequenceFile.Reader acts as the bridge and can read any of the above SequenceFile formats.
Essentially there are 3 different formats for SequenceFiles depending on the CompressionType specified. All of them share a common header described below.
- CompressionCodec class which is used for compression of keys and/or values (if compression is enabled).
- SequenceFile.Metadata for this file.
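As a small illustration of how a tool might recognise the common header, the sketch below assumes that a SequenceFile starts with the three magic bytes `SEQ` followed by a single version byte. That layout is taken from the upstream Hadoop documentation and is not stated in the text above. The class `SequenceFileMagicSketch` and its method are hypothetical names, and the code only inspects a byte array rather than reading a real file.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks the assumed "SEQ" + version prefix of a header. */
final class SequenceFileMagicSketch {

    // Assumed magic prefix; see the lead-in for where this assumption comes from.
    private static final byte[] MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the version byte (0..255) if the buffer starts with the magic
     * prefix, or -1 if the buffer is too short or the prefix does not match.
     */
    static int versionOf(byte[] head) {
        if (head == null || head.length < MAGIC.length + 1) {
            return -1;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return -1;
            }
        }
        return head[MAGIC.length] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] looksLikeSequenceFile = {'S', 'E', 'Q', 6};
        byte[] somethingElse = {'P', 'K', 3, 4};
        System.out.println(versionOf(looksLikeSequenceFile));
        System.out.println(versionOf(somethingElse));
    }
}
```

A real reader would go on to parse the remaining header fields, including the codec class and metadata; this sketch stops at the prefix on purpose.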
A sync-marker is written every few 100 kilobytes or so.
The compressed blocks of key lengths and value lengths consist of the actual lengths of individual keys/values encoded in ZeroCompressedInteger format.
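The length encoding mentioned above can be sketched as follows. This is a JDK-only approximation that assumes the ZeroCompressedInteger format corresponds to the variable-length scheme used by Hadoop's `WritableUtils.writeVLong` and `readVLong`: values from -112 to 127 take a single byte, and other values take a marker byte followed by one to eight payload bytes. The class `ZeroCompressedLongSketch` and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical JDK-only sketch of a zero-compressed (variable-length) long encoding. */
final class ZeroCompressedLongSketch {

    /** Encodes a long into 1 to 9 bytes. */
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value >= -112 && value <= 127) {
            out.write((int) value); // write(int) stores the low 8 bits
            return out.toByteArray();
        }
        int marker = -112;          // marker base for positive values
        if (value < 0) {
            value ^= -1L;           // one's complement, so the payload is non-negative
            marker = -120;          // marker base for negative values
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            marker--;               // one step per payload byte needed
        }
        out.write(marker);
        int payloadBytes = (marker < -120) ? -(marker + 120) : -(marker + 112);
        for (int idx = payloadBytes; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.write((int) ((value >> shift) & 0xFF)); // most significant byte first
        }
        return out.toByteArray();
    }

    /** Total encoded size implied by the first byte. */
    static int encodedSize(byte first) {
        if (first >= -112) {
            return 1;
        }
        return (first < -120) ? -119 - first : -111 - first;
    }

    /** Decodes a value produced by encode. */
    static long decode(byte[] bytes) {
        byte first = bytes[0];
        int size = encodedSize(first);
        if (size == 1) {
            return first;
        }
        long result = 0;
        for (int i = 1; i < size; i++) {
            result = (result << 8) | (bytes[i] & 0xFF);
        }
        boolean negative = first < -120; // multi-byte markers below -120 mean negative
        return negative ? (result ^ -1L) : result;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 100L, 128L, 300L, -113L, Long.MAX_VALUE};
        for (long v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decode(enc));
        }
    }
}
```

Small lengths, which are the common case for keys and values, cost a single byte each under this scheme, which is why it suits blocks of key and value lengths.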
See also: CompressionCodec
Modifier and Type | Field and Description |
---|---|
static int | SYNC_INTERVAL: The number of bytes between sync points. |
Modifier and Type | Method and Description |
---|---|
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts): Create a new Writer with the given options. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileContext fc, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata, EnumSet<CreateFlag> createFlag, org.apache.hadoop.fs.Options.CreateOpts... opts): Construct the preferred type of SequenceFile Writer. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, boolean createParent, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata): Deprecated. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, Progressable progress): Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
static org.apache.hadoop.io.SequenceFile.CompressionType | getDefaultCompressionType(Configuration job): Get the compression type for the reduce outputs. |
static void | setDefaultCompressionType(Configuration job, org.apache.hadoop.io.SequenceFile.CompressionType val): Set the default compression type for sequence files. |
public static final int SYNC_INTERVAL
public static org.apache.hadoop.io.SequenceFile.CompressionType getDefaultCompressionType(Configuration job)
job - the job config to look in
public static void setDefaultCompressionType(Configuration job, org.apache.hadoop.io.SequenceFile.CompressionType val)
job - the configuration to modify
val - the new compression type (none, block, record)
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) throws IOException
conf - the configuration to use
opts - the options to create the file with
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, Progressable progress) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
progress - The Progressable object to track progress.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
metadata - The metadata of the file.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
bufferSize - buffer size for the underlying output stream.
replication - replication factor for the file.
blockSize - block size for the file.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
metadata - The metadata of the file.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, boolean createParent, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
bufferSize - buffer size for the underlying output stream.
replication - replication factor for the file.
blockSize - block size for the file.
createParent - create parent directory if non-existent.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
IOException - raised on errors performing I/O.
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileContext fc, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata, EnumSet<CreateFlag> createFlag, org.apache.hadoop.fs.Options.CreateOpts... opts) throws IOException
fc - The context for the specified file.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
createFlag - gives the semantics of create: overwrite, append etc.
opts - file creation options; see Options.CreateOpts.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
conf - The configuration.
out - The stream on top of which the writer is to be constructed.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
IOException - raised on errors performing I/O.
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
conf - The configuration.
out - The stream on top of which the writer is to be constructed.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
IOException - raised on errors performing I/O.
Copyright © 2024 Apache Software Foundation. All rights reserved.