public final class ClusteringUtils extends Object
Modifier and Type | Method and Description |
---|---|
`static double` | `choose2(double n)` |
`static double` | `daviesBouldinIndex(List<? extends Vector> centroids, DistanceMeasure distanceMeasure, List<OnlineSummarizer> clusterDistanceSummaries)` Computes the Davies-Bouldin Index for a given clustering. |
`static double` | `dunnIndex(List<? extends Vector> centroids, DistanceMeasure distanceMeasure, List<OnlineSummarizer> clusterDistanceSummaries)` Computes the Dunn Index of a given clustering. |
`static <T extends Vector> double` | `estimateDistanceCutoff(Iterable<T> data, DistanceMeasure distanceMeasure, int sampleLimit)` |
`static double` | `estimateDistanceCutoff(List<? extends Vector> data, DistanceMeasure distanceMeasure)` Estimates the distance cutoff. |
`static double` | `getAdjustedRandIndex(Matrix confusionMatrix)` Computes the Adjusted Rand Index for a given confusion matrix. |
`static Matrix` | `getConfusionMatrix(List<? extends Vector> rowCentroids, List<? extends Vector> columnCentroids, Iterable<? extends Vector> datapoints, DistanceMeasure distanceMeasure)` Creates a confusion matrix by searching for the closest cluster of both the row clustering and column clustering of a point and adding its weight to that cell of the matrix. |
`static List<OnlineSummarizer>` | `summarizeClusterDistances(Iterable<? extends Vector> datapoints, Iterable<? extends Vector> centroids, DistanceMeasure distanceMeasure)` Computes the summaries for the distances in each cluster. |
`static double` | `totalClusterCost(Iterable<? extends Vector> datapoints, Iterable<? extends Vector> centroids)` Adds up the distances from each point to its closest cluster and returns the sum. |
`static double` | `totalClusterCost(Iterable<? extends Vector> datapoints, Searcher centroids)` Adds up the distances from each point to its closest cluster and returns the sum. |
`static double` | `totalWeight(Iterable<? extends Vector> data)` Computes the total weight of the points in the given Vector iterable. |
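
The cost and index methods summarized above can be illustrated without the library. The following is a minimal plain-Java sketch, not Mahout's implementation: the class `ClusterQualitySketch` and all of its method names are hypothetical, points are unweighted `double[]` arrays, Euclidean distance stands in for a `DistanceMeasure`, and the mean point-to-centroid distance stands in for the per-cluster summary. The statistic the library actually reads from each `OnlineSummarizer` is not stated on this page, so treat that choice as an assumption.

```java
import java.util.List;

/** Plain-Java illustration only; every name in this class is hypothetical. */
final class ClusterQualitySketch {

    /** Euclidean distance, standing in for a DistanceMeasure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid closest to the given point. */
    static int closest(double[] point, List<double[]> centroids) {
        int best = 0;
        for (int i = 1; i < centroids.size(); i++) {
            if (distance(point, centroids.get(i)) < distance(point, centroids.get(best))) {
                best = i;
            }
        }
        return best;
    }

    /** Sum of the distances from each point to its closest centroid. */
    static double totalCost(List<double[]> points, List<double[]> centroids) {
        double cost = 0.0;
        for (double[] p : points) {
            cost += distance(p, centroids.get(closest(p, centroids)));
        }
        return cost;
    }

    /** Mean point-to-centroid distance per cluster (0 for an empty cluster). */
    static double[] meanDistances(List<double[]> points, List<double[]> centroids) {
        double[] sum = new double[centroids.size()];
        int[] count = new int[centroids.size()];
        for (double[] p : points) {
            int c = closest(p, centroids);
            sum[c] += distance(p, centroids.get(c));
            count[c]++;
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] = count[i] == 0 ? 0.0 : sum[i] / count[i];
        }
        return sum;
    }

    /** Davies-Bouldin: mean over clusters of the worst (s_i + s_j) / d(c_i, c_j). Lower is better. */
    static double daviesBouldin(List<double[]> centroids, double[] scatter) {
        double total = 0.0;
        for (int i = 0; i < centroids.size(); i++) {
            double worst = 0.0;
            for (int j = 0; j < centroids.size(); j++) {
                if (i != j) {
                    double d = distance(centroids.get(i), centroids.get(j));
                    worst = Math.max(worst, (scatter[i] + scatter[j]) / d);
                }
            }
            total += worst;
        }
        return total / centroids.size();
    }

    /** Centroid-based Dunn variant: smallest inter-centroid distance over largest scatter. Higher is better. */
    static double dunn(List<double[]> centroids, double[] scatter) {
        double minInter = Double.POSITIVE_INFINITY;
        double maxIntra = 0.0;
        for (int i = 0; i < centroids.size(); i++) {
            maxIntra = Math.max(maxIntra, scatter[i]);
            for (int j = i + 1; j < centroids.size(); j++) {
                minInter = Math.min(minInter, distance(centroids.get(i), centroids.get(j)));
            }
        }
        return minInter / maxIntra;
    }
}
```

The sketch mirrors the call order the signatures suggest: compute the per-cluster summaries once, then pass them together with the centroids to either index. It does not guard against coincident centroids or an all-zero scatter, both of which lead to division by zero.
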
public static List<OnlineSummarizer> summarizeClusterDistances(Iterable<? extends Vector> datapoints, Iterable<? extends Vector> centroids, DistanceMeasure distanceMeasure)
datapoints - iterable of datapoints.
centroids - iterable of Centroids.
public static double totalClusterCost(Iterable<? extends Vector> datapoints, Iterable<? extends Vector> centroids)
datapoints - iterable of datapoints.
centroids - iterable of Centroids.
public static double totalClusterCost(Iterable<? extends Vector> datapoints, Searcher centroids)
datapoints - iterable of datapoints.
centroids - searcher of Centroids.
public static double estimateDistanceCutoff(List<? extends Vector> data, DistanceMeasure distanceMeasure)
data - the datapoints whose distance is to be estimated.
distanceMeasure - the distance measure used to compute the distance between two points.
See Also: StreamingKMeans.clusterInternal(Iterable, boolean)
public static <T extends Vector> double estimateDistanceCutoff(Iterable<T> data, DistanceMeasure distanceMeasure, int sampleLimit)
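
This page does not say how the cutoff is estimated or what `sampleLimit` does. The sketch below shows one plausible reading under two loudly stated assumptions: the cutoff is the smallest non-zero distance between any two inspected points, and `sampleLimit` caps how many points are read from the iterable. The class `DistanceCutoffSketch` and its methods are hypothetical, and Euclidean distance stands in for the `DistanceMeasure`.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; every name in this class is hypothetical. */
final class DistanceCutoffSketch {

    /** Euclidean distance, standing in for a DistanceMeasure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Assumed rule: smallest non-zero pairwise distance; infinity if none exists. */
    static double estimateCutoff(List<double[]> data) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < data.size(); i++) {
            for (int j = i + 1; j < data.size(); j++) {
                double d = distance(data.get(i), data.get(j));
                if (d > 0 && d < min) {
                    min = d;
                }
            }
        }
        return min;
    }

    /** Assumed meaning of sampleLimit: read at most that many points, then delegate. */
    static double estimateCutoff(Iterable<double[]> data, int sampleLimit) {
        List<double[]> sample = new ArrayList<>();
        for (double[] p : data) {
            if (sample.size() >= sampleLimit) {
                break;
            }
            sample.add(p);
        }
        return estimateCutoff(sample);
    }
}
```

The pairwise loop is quadratic in the number of inspected points, which is one reason a sample limit would be useful on a large or streaming input.
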
public static double daviesBouldinIndex(List<? extends Vector> centroids, DistanceMeasure distanceMeasure, List<OnlineSummarizer> clusterDistanceSummaries)
centroids - list of centroids
distanceMeasure - distance measure for inter-cluster distances
clusterDistanceSummaries - summaries of the clusters; see summarizeClusterDistances
public static double dunnIndex(List<? extends Vector> centroids, DistanceMeasure distanceMeasure, List<OnlineSummarizer> clusterDistanceSummaries)
centroids - list of centroids
distanceMeasure - distance measure to compute inter-centroid distance with
clusterDistanceSummaries - summaries of the clusters; see summarizeClusterDistances
public static double choose2(double n)
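
No description is given for `choose2`. Assuming it is the binomial coefficient "n choose 2", that is n(n - 1)/2, it is the pair-counting quantity the Adjusted Rand Index is built from. The sketch below is a plain-Java illustration of the standard Adjusted Rand Index formula over a contingency table held in a `double[][]`. The class `AdjustedRandSketch` is hypothetical and is not the library's code.

```java
/** Plain-Java illustration only; every name in this class is hypothetical. */
final class AdjustedRandSketch {

    /** Number of unordered pairs among n items (assumed meaning of choose2). */
    static double choose2(double n) {
        return n * (n - 1) / 2;
    }

    /** Adjusted Rand Index of a contingency table: rows are one clustering, columns the other. */
    static double adjustedRandIndex(double[][] table) {
        int rows = table.length;
        int cols = table[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        double cellPairs = 0.0; // pairs placed together by both clusterings
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += table[i][j];
                colSum[j] += table[i][j];
                total += table[i][j];
                cellPairs += choose2(table[i][j]);
            }
        }
        double rowPairs = 0.0; // pairs placed together by the row clustering
        for (double r : rowSum) {
            rowPairs += choose2(r);
        }
        double colPairs = 0.0; // pairs placed together by the column clustering
        for (double c : colSum) {
            colPairs += choose2(c);
        }
        double expected = rowPairs * colPairs / choose2(total);
        double max = (rowPairs + colPairs) / 2;
        return (cellPairs - expected) / (max - expected);
    }
}
```

For the table {{2, 0}, {0, 2}}, where both clusterings agree, this returns 1; for {{1, 1}, {1, 1}} it returns -0.5. The sketch does not handle the degenerate case where `max` equals `expected`, for example when each clustering has a single cluster.
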
public static Matrix getConfusionMatrix(List<? extends Vector> rowCentroids, List<? extends Vector> columnCentroids, Iterable<? extends Vector> datapoints, DistanceMeasure distanceMeasure)
rowCentroids - clustering one
columnCentroids - clustering two
datapoints - datapoints whose closest cluster we need to find
distanceMeasure - distance measure to use
public static double getAdjustedRandIndex(Matrix confusionMatrix)
confusionMatrix - confusion matrix; not to be confused with the more restrictive ConfusionMatrix class
Copyright © 2008–2015 The Apache Software Foundation. All rights reserved.