Package | Description |
---|---|
org.apache.mahout.clustering | |
org.apache.mahout.clustering.canopy | |
org.apache.mahout.clustering.fuzzykmeans | |
org.apache.mahout.clustering.iterator | |
org.apache.mahout.clustering.kmeans | This package provides an implementation of the k-means clustering algorithm. |
org.apache.mahout.clustering.spectral.kmeans | |
org.apache.mahout.clustering.streaming.cluster | |
org.apache.mahout.common.distance | |
org.apache.mahout.math.hadoop.similarity | |
org.apache.mahout.math.neighborhood | |
Modifier and Type | Method and Description |
---|---|
static double |
ClusteringUtils.daviesBouldinIndex(List<? extends Vector> centroids,
DistanceMeasure distanceMeasure,
List<OnlineSummarizer> clusterDistanceSummaries)
Computes the Davies-Bouldin Index for a given clustering.
|
static double |
ClusteringUtils.dunnIndex(List<? extends Vector> centroids,
DistanceMeasure distanceMeasure,
List<OnlineSummarizer> clusterDistanceSummaries)
Computes the Dunn Index of a given clustering.
|
static <T extends Vector> double |
ClusteringUtils.estimateDistanceCutoff(Iterable<T> data,
DistanceMeasure distanceMeasure,
int sampleLimit) |
static double |
ClusteringUtils.estimateDistanceCutoff(List<? extends Vector> data,
DistanceMeasure distanceMeasure)
Estimates the distance cutoff.
|
static Matrix |
ClusteringUtils.getConfusionMatrix(List<? extends Vector> rowCentroids,
List<? extends Vector> columnCentroids,
Iterable<? extends Vector> datapoints,
DistanceMeasure distanceMeasure)
Creates a confusion matrix by finding, for each point, its closest cluster in both the row clustering and the
column clustering, and adding the point's weight to the corresponding cell of the matrix.
|
static List<OnlineSummarizer> |
ClusteringUtils.summarizeClusterDistances(Iterable<? extends Vector> datapoints,
Iterable<? extends Vector> centroids,
DistanceMeasure distanceMeasure)
Computes the summaries for the distances in each cluster.
|
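The cluster-quality methods above take centroids, a distance measure, and per-cluster distance summaries. The following is a minimal sketch of the textbook Davies-Bouldin index and a simplified Dunn-style ratio, written against plain `double[]` arrays rather than Mahout's `Vector`, `DistanceMeasure`, and `OnlineSummarizer` types. The class and method names (`ClusterQualitySketch`, `meanDistances`, and so on) are hypothetical, Euclidean distance is assumed, and the exact statistics Mahout reads from each summary may differ.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: not Mahout's ClusteringUtils implementation.
final class ClusterQualitySketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Assigns each point to its nearest centroid and returns, per centroid,
    // the mean distance of its assigned points (0.0 for an empty cluster).
    static double[] meanDistances(List<double[]> points, List<double[]> centroids) {
        double[] sum = new double[centroids.size()];
        int[] count = new int[centroids.size()];
        for (double[] p : points) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.size(); c++) {
                double d = euclidean(p, centroids.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            sum[best] += bestDist;
            count[best]++;
        }
        double[] mean = new double[centroids.size()];
        for (int c = 0; c < mean.length; c++) {
            mean[c] = count[c] == 0 ? 0.0 : sum[c] / count[c];
        }
        return mean;
    }

    // Davies-Bouldin: average over clusters of the worst (scatter_i + scatter_j) / d(c_i, c_j).
    // Assumes at least two distinct centroids. Lower values indicate better separation.
    static double daviesBouldin(List<double[]> centroids, double[] scatter) {
        int k = centroids.size();
        double total = 0.0;
        for (int i = 0; i < k; i++) {
            double worst = 0.0;
            for (int j = 0; j < k; j++) {
                if (i == j) {
                    continue;
                }
                double ratio = (scatter[i] + scatter[j]) / euclidean(centroids.get(i), centroids.get(j));
                worst = Math.max(worst, ratio);
            }
            total += worst;
        }
        return total / k;
    }

    // Simplified Dunn-style ratio: smallest centroid separation over largest mean scatter.
    // Higher values indicate better separation.
    static double dunn(List<double[]> centroids, double[] scatter) {
        double minSeparation = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            for (int j = i + 1; j < centroids.size(); j++) {
                minSeparation = Math.min(minSeparation, euclidean(centroids.get(i), centroids.get(j)));
            }
        }
        double maxScatter = 0.0;
        for (double s : scatter) {
            maxScatter = Math.max(maxScatter, s);
        }
        return minSeparation / maxScatter;
    }

    public static void main(String[] args) {
        List<double[]> centroids = Arrays.asList(new double[] {0, 0}, new double[] {10, 0});
        List<double[]> points = Arrays.asList(
            new double[] {-1, 0}, new double[] {1, 0}, new double[] {9, 0}, new double[] {11, 0});
        double[] scatter = meanDistances(points, centroids);
        System.out.println("Davies-Bouldin: " + daviesBouldin(centroids, scatter));
        System.out.println("Dunn-style ratio: " + dunn(centroids, scatter));
    }
}
```

In this sketch `meanDistances` plays the role that `summarizeClusterDistances` plays in the table: it is computed once and then passed to both index functions.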
Modifier and Type | Method and Description |
---|---|
static org.apache.hadoop.fs.Path |
CanopyDriver.buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double t3,
double t4,
int clusterFilter,
boolean runSequential)
Deprecated.
Build a directory of Canopy clusters from the input vectors and other
arguments.
|
static org.apache.hadoop.fs.Path |
CanopyDriver.buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
int clusterFilter,
boolean runSequential)
Deprecated.
Convenience method for backwards compatibility
|
static List<Canopy> |
CanopyClusterer.createCanopies(List<Vector> points,
DistanceMeasure measure,
double t1,
double t2)
Deprecated.
Iterate through the points, adding new canopies.
|
static void |
CanopyDriver.run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
Deprecated.
Convenience method to provide backward compatibility
|
static void |
CanopyDriver.run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double t3,
double t4,
int clusterFilter,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
Deprecated.
Build a directory of Canopy clusters from the input arguments and, if
requested, cluster the input vectors using these clusters.
|
static void |
CanopyDriver.run(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
Deprecated.
Convenience method that creates a new Configuration(). Builds a directory of Canopy
clusters from the input arguments and, if requested, clusters the input
vectors using these clusters.
|
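`CanopyClusterer.createCanopies` is described above as iterating through the points and adding new canopies, controlled by the thresholds `t1` and `t2`. The sketch below shows the general canopy technique on plain `double[]` points with Euclidean distance. It assumes `t1 > t2`, that points within `t1` of a canopy center join that canopy, and that points within `t2` are removed from further consideration. The class name `CanopySketch` is hypothetical and this is not the deprecated Mahout code itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative only: a plain-Java canopy pass, not Mahout's CanopyClusterer.
final class CanopySketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns one list of member points per canopy; the first member is the canopy center.
    static List<List<double[]>> createCanopies(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points); // copy so the caller's list is untouched
        List<List<double[]>> canopies = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Iterator<double[]> it = remaining.iterator();
            double[] center = it.next(); // the next unclaimed point starts a new canopy
            it.remove();
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            while (it.hasNext()) {
                double[] p = it.next();
                double d = euclidean(center, p);
                if (d < t1) {
                    canopy.add(p); // loosely close: member of this canopy
                }
                if (d < t2) {
                    it.remove();   // tightly close: cannot start or join another canopy
                }
            }
            canopies.add(canopy);
        }
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0, 0}, new double[] {1, 0}, new double[] {10, 0}, new double[] {11, 0});
        List<List<double[]>> canopies = createCanopies(points, 3.0, 2.0);
        System.out.println("canopies: " + canopies.size());
    }
}
```

Because a point between `t2` and `t1` stays in the remaining set, it can belong to more than one canopy, which is the usual reason for using two thresholds.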
Constructor and Description |
---|
Canopy(Vector center,
int canopyId,
DistanceMeasure measure)
Deprecated.
Create a new Canopy containing the given point and canopyId
|
CanopyClusterer(DistanceMeasure measure,
double t1,
double t2)
Deprecated.
|
Constructor and Description |
---|
SoftCluster(Vector center,
int clusterId,
DistanceMeasure measure)
Construct a new SoftCluster with the given point as its center
|
Modifier and Type | Method and Description |
---|---|
DistanceMeasure |
DistanceMeasureCluster.getMeasure() |
Modifier and Type | Method and Description |
---|---|
void |
DistanceMeasureCluster.setMeasure(DistanceMeasure measure) |
Constructor and Description |
---|
DistanceMeasureCluster(Vector point,
int id,
DistanceMeasure measure) |
Modifier and Type | Method and Description |
---|---|
static org.apache.hadoop.fs.Path |
RandomSeedGenerator.buildRandom(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
int k,
DistanceMeasure measure) |
static org.apache.hadoop.fs.Path |
RandomSeedGenerator.buildRandom(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
int k,
DistanceMeasure measure,
Long seed) |
boolean |
Kluster.computeConvergence(DistanceMeasure measure,
double convergenceDelta)
Returns whether the cluster has converged, determined by comparing its center and centroid.
|
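The convergence check in `Kluster.computeConvergence` compares a cluster's center with its centroid under a distance measure and a `convergenceDelta`. A minimal sketch of that idea follows, assuming the cluster counts as converged when the distance between the previous center and the recomputed centroid is at most `convergenceDelta`. The class `ConvergenceSketch` and its methods are hypothetical, and Euclidean distance stands in for the `DistanceMeasure` argument.

```java
// Illustrative only: a plain-Java convergence test, not Mahout's Kluster.
final class ConvergenceSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // center: the cluster center used for the last assignment step.
    // centroid: the mean of the points assigned in that step.
    static boolean isConverged(double[] center, double[] centroid, double convergenceDelta) {
        return euclidean(center, centroid) <= convergenceDelta;
    }

    public static void main(String[] args) {
        double[] center = {1.0, 1.0};
        double[] centroid = {1.0, 1.05};
        System.out.println(isConverged(center, centroid, 0.1));  // moved by about 0.05, within the delta
        System.out.println(isConverged(center, centroid, 0.01)); // same move, outside the smaller delta
    }
}
```

A smaller `convergenceDelta` demands that centers move less between iterations before the algorithm stops, which usually means more iterations.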
Constructor and Description |
---|
Kluster(Vector center,
int clusterId,
DistanceMeasure measure)
Construct a new cluster with the given point as its center
|
Modifier and Type | Method and Description |
---|---|
static org.apache.hadoop.fs.Path |
EigenSeedGenerator.buildFromEigens(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
int k,
DistanceMeasure measure) |
static void |
SpectralKMeansDriver.run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
int numDims,
int clusters,
DistanceMeasure measure,
double convergenceDelta,
int maxIterations,
org.apache.hadoop.fs.Path tempDir) |
static void |
SpectralKMeansDriver.run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
int numDims,
int clusters,
DistanceMeasure measure,
double convergenceDelta,
int maxIterations,
org.apache.hadoop.fs.Path tempDir,
int numReducers,
int blockHeight,
int oversampling,
int poweriters)
Run the Spectral KMeans clustering on the supplied arguments
|
Modifier and Type | Method and Description |
---|---|
DistanceMeasure |
StreamingKMeans.getDistanceMeasure() |
Modifier and Type | Class and Description |
---|---|
class |
ChebyshevDistanceMeasure
This class implements a "Chebyshev distance" metric by taking the maximum absolute difference
over all coordinates.
|
class |
CosineDistanceMeasure
This class implements a cosine distance metric as one minus the cosine similarity, where the similarity is the
dot product of two vectors divided by the product of their lengths.
|
class |
EuclideanDistanceMeasure
This class implements a Euclidean distance metric by taking the square root of the sum of the squared
differences between each coordinate.
|
class |
MahalanobisDistanceMeasure |
class |
ManhattanDistanceMeasure
This class implements a "Manhattan distance" metric by summing the absolute values of the differences
between each coordinate.
|
class |
MinkowskiDistanceMeasure
Implements Minkowski distance, a real-valued generalization of the
integral L(n) distances: Manhattan = L1, Euclidean = L2.
|
class |
SquaredEuclideanDistanceMeasure
Like EuclideanDistanceMeasure, but it does not take the square root. |
class |
TanimotoDistanceMeasure
Tanimoto coefficient implementation.
|
class |
WeightedDistanceMeasure
Abstract implementation of DistanceMeasure with support for weights.
|
class |
WeightedEuclideanDistanceMeasure
This class implements a Euclidean distance metric by taking the square root of the sum of the squared
differences between each coordinate, optionally adding weights.
|
class |
WeightedManhattanDistanceMeasure
This class implements a "Manhattan distance" metric by summing the absolute values of the differences
between each coordinate, optionally with weights.
|
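The formulas behind several of the distance classes listed above can be sketched on plain `double[]` arrays. This example is illustrative and does not use Mahout's `Vector` or `DistanceMeasure` types. The class name `DistanceSketch` and its method names are hypothetical, both inputs are assumed to have the same length, and the cosine variant assumes neither vector is all zeros.

```java
// Illustrative only: textbook formulas, not the Mahout implementations.
final class DistanceSketch {

    // Sum of squared coordinate differences (no square root).
    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Square root of the sum of squared coordinate differences.
    static double euclidean(double[] a, double[] b) {
        return Math.sqrt(squaredEuclidean(a, b));
    }

    // Sum of absolute coordinate differences (L1).
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Largest absolute coordinate difference.
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Generalization with a real exponent p >= 1: p = 1 is Manhattan, p = 2 is Euclidean.
    static double minkowski(double[] a, double[] b, double p) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    // One minus the cosine of the angle between the vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {3.0, 4.0};
        System.out.println("euclidean: " + euclidean(a, b));
        System.out.println("squared euclidean: " + squaredEuclidean(a, b));
        System.out.println("manhattan: " + manhattan(a, b));
        System.out.println("chebyshev: " + chebyshev(a, b));
        System.out.println("minkowski p=3: " + minkowski(a, b, 3.0));
        System.out.println("cosine: " + cosine(new double[] {1.0, 0.0}, new double[] {0.0, 1.0}));
    }
}
```

Squared Euclidean distance preserves the ordering of Euclidean distances, so it can stand in wherever only comparisons matter and the square root is an avoidable cost.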
Modifier and Type | Method and Description |
---|---|
static void |
VectorDistanceSimilarityJob.run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path seeds,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
String outType) |
static void |
VectorDistanceSimilarityJob.run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path seeds,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
String outType,
Double maxDistance) |
Modifier and Type | Field and Description |
---|---|
protected DistanceMeasure |
Searcher.distanceMeasure |
Modifier and Type | Method and Description |
---|---|
DistanceMeasure |
Searcher.getDistanceMeasure() |
Constructor and Description |
---|
BruteSearch(DistanceMeasure distanceMeasure) |
FastProjectionSearch(DistanceMeasure distanceMeasure,
int numProjections,
int searchSize) |
LocalitySensitiveHashSearch(DistanceMeasure distanceMeasure,
int searchSize) |
ProjectionSearch(DistanceMeasure distanceMeasure,
int numProjections,
int searchSize) |
Searcher(DistanceMeasure distanceMeasure) |
UpdatableSearcher(DistanceMeasure distanceMeasure) |
Copyright © 2008–2015 The Apache Software Foundation. All rights reserved.