public final class PartialVectorMerger extends Object
Merges partial vectors read from SequenceFile input, where each record has a WritableComparable
key containing the document id and a VectorWritable
value containing the term frequency vector. This
class also normalizes the vector.

Modifier and Type | Field and Description |
---|---|
static String | DIMENSION |
static String | LOG_NORMALIZE |
static String | NAMED_VECTOR |
static float | NO_NORMALIZING |
static String | NORMALIZATION_POWER |
static String | SEQUENTIAL_ACCESS |

Modifier and Type | Method and Description |
---|---|
static void | mergePartialVectors(Iterable<org.apache.hadoop.fs.Path> partialVectorPaths, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration baseConf, float normPower, boolean logNormalize, int dimension, boolean sequentialAccess, boolean namedVector, int numReducers): Merge all the partial RandomAccessSparseVectors into the complete Document RandomAccessSparseVector |

public static final float NO_NORMALIZING
public static final String NORMALIZATION_POWER
public static final String DIMENSION
public static final String SEQUENTIAL_ACCESS
public static final String NAMED_VECTOR
public static final String LOG_NORMALIZE
public static void mergePartialVectors(Iterable<org.apache.hadoop.fs.Path> partialVectorPaths, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration baseConf, float normPower, boolean logNormalize, int dimension, boolean sequentialAccess, boolean namedVector, int numReducers) throws IOException, InterruptedException, ClassNotFoundException
Merge all the partial RandomAccessSparseVectors into the complete Document RandomAccessSparseVector.
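
To make the merge step concrete, here is a minimal standalone sketch. It does not use Mahout or Hadoop classes: plain `Map<Integer, Double>` values stand in for sparse vectors, and the class name `PartialMergeSketch` and the choice to combine partial vectors by element-wise addition are assumptions made for illustration, not a description of the library's actual reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a sparse vector is modelled as index -> weight.
final class PartialMergeSketch {

    /** Combines the partial vectors of one document by summing weights per index (assumed behaviour). */
    static Map<Integer, Double> merge(List<Map<Integer, Double>> partials) {
        Map<Integer, Double> merged = new TreeMap<>();
        for (Map<Integer, Double> partial : partials) {
            for (Map.Entry<Integer, Double> entry : partial.entrySet()) {
                // Add the weight to any value already stored at this index.
                merged.merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Integer, Double> first = Map.of(0, 2.0, 3, 1.0);
        Map<Integer, Double> second = Map.of(3, 4.0, 7, 1.0);
        System.out.println(merge(List.of(first, second))); // prints {0=2.0, 3=5.0, 7=1.0}
    }
}
```

In the real job the grouping by document id is done by the MapReduce shuffle on the WritableComparable key; the sketch only shows what happens to the vectors that share one key.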
Parameters:
partialVectorPaths - input directory of the vectors in SequenceFile format
output - output directory where the partial vectors have to be created
baseConf - job configuration
normPower - The normalization value. Must be greater than or equal to 0 or equal to NO_NORMALIZING
dimension - cardinality of the vectors
sequentialAccess - output vectors should be optimized for sequential access
namedVector - output vectors should be named, retaining key (doc id) as a label
numReducers - The number of reducers to spawn
Throws:
IOException
InterruptedException
ClassNotFoundException
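
The constraint on normPower can be illustrated with a small standalone sketch of p-norm normalization. Everything here is hypothetical: the class `NormalizeSketch`, the sentinel `SKIP_NORMALIZATION` (the actual value of NO_NORMALIZING is not stated on this page), and the handling of the 0 and infinite powers follow common conventions rather than anything confirmed by the source.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a sparse vector is modelled as index -> weight.
final class NormalizeSketch {

    // Hypothetical sentinel for this sketch; not the library's NO_NORMALIZING value.
    static final float SKIP_NORMALIZATION = -1.0f;

    /** Divides every weight by the p-norm of the vector, or copies the vector if normalization is skipped. */
    static Map<Integer, Double> normalize(Map<Integer, Double> vector, float normPower) {
        if (normPower == SKIP_NORMALIZATION) {
            return new TreeMap<>(vector);
        }
        if (normPower < 0 || Float.isNaN(normPower)) {
            throw new IllegalArgumentException("normPower must be >= 0 or the skip sentinel");
        }
        double norm = norm(vector, normPower);
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, Double> entry : vector.entrySet()) {
            // Guard against dividing by zero for an all-zero vector.
            result.put(entry.getKey(), norm == 0.0 ? 0.0 : entry.getValue() / norm);
        }
        return result;
    }

    /** p-norm: count of non-zeros for p = 0, maximum for infinite p, otherwise (sum |x|^p)^(1/p). */
    static double norm(Map<Integer, Double> vector, double p) {
        double acc = 0.0;
        for (double value : vector.values()) {
            double abs = Math.abs(value);
            if (p == 0.0) {
                acc += abs > 0.0 ? 1.0 : 0.0;
            } else if (Double.isInfinite(p)) {
                acc = Math.max(acc, abs);
            } else {
                acc += Math.pow(abs, p);
            }
        }
        return (p == 0.0 || Double.isInfinite(p)) ? acc : Math.pow(acc, 1.0 / p);
    }

    public static void main(String[] args) {
        // With p = 1 the weights are divided by their absolute sum (1 + 3 = 4).
        System.out.println(normalize(Map.of(0, 1.0, 1, 3.0), 1.0f)); // prints {0=0.25, 1=0.75}
    }
}
```

Rejecting negative powers other than the sentinel mirrors the documented rule that normPower must be greater than or equal to 0 or equal to NO_NORMALIZING.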
Copyright © 2008–2015 The Apache Software Foundation. All rights reserved.