public class StreamingKMeans extends Object implements Iterable<Centroid>
Constructor | Description |
---|---|
StreamingKMeans(UpdatableSearcher searcher, int numClusters) | Calls StreamingKMeans(searcher, numClusters, 1.3, 10, 2). |
StreamingKMeans(UpdatableSearcher searcher, int numClusters, double distanceCutoff) | Calls StreamingKMeans(searcher, numClusters, distanceCutoff, 1.3, 10, 2). |
StreamingKMeans(UpdatableSearcher searcher, int numClusters, double distanceCutoff, double beta, double clusterLogFactor, double clusterOvershoot) | Creates a new StreamingKMeans instance given a searcher and the number of clusters to generate. |
Modifier and Type | Method | Description |
---|---|---|
UpdatableSearcher | cluster(Centroid datapoint) | Cluster one data point. |
UpdatableSearcher | cluster(Iterable<Centroid> datapoints) | Cluster the data points in an Iterable. |
UpdatableSearcher | cluster(Matrix data) | Cluster the rows of a matrix, treating them as Centroids with weight 1. |
double | getDistanceCutoff() | |
DistanceMeasure | getDistanceMeasure() | |
int | getNumClusters() | |
Iterator<Centroid> | iterator() | |
void | reindexCentroids() | |
void | setDistanceCutoff(double distanceCutoff) | |
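Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator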
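The method summary above says that cluster(Matrix data) treats each row as a Centroid with weight 1. As a rough illustration of what merging a weighted point into a weighted centroid involves, the sketch below uses the standard weighted-mean update. The WeightedPoint class and its absorb method are hypothetical stand-ins written for this example; they are not Mahout's Centroid class, whose behavior may differ in detail.

```java
// Hypothetical stand-in for a weighted centroid; NOT Mahout's Centroid class.
final class WeightedPoint {
    final double[] coords;
    double weight;

    WeightedPoint(double[] coords, double weight) {
        this.coords = coords.clone(); // defensive copy of the coordinates
        this.weight = weight;
    }

    // Merge another weighted point into this one using the weighted mean:
    // new = (w1 * c1 + w2 * c2) / (w1 + w2), and the weights add up.
    void absorb(WeightedPoint other) {
        double total = weight + other.weight;
        for (int i = 0; i < coords.length; i++) {
            coords[i] = (weight * coords[i] + other.weight * other.coords[i]) / total;
        }
        weight = total;
    }
}
```

Under this assumption, a matrix row enters as a point of weight 1, so a centroid that has already absorbed many points moves only slightly when one more row is merged into it.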
public StreamingKMeans(UpdatableSearcher searcher, int numClusters)
public StreamingKMeans(UpdatableSearcher searcher, int numClusters, double distanceCutoff)
public StreamingKMeans(UpdatableSearcher searcher, int numClusters, double distanceCutoff, double beta, double clusterLogFactor, double clusterOvershoot)
searcher - A Searcher that is used for performing nearest neighbor search. It MUST BE EMPTY initially because it will be used to keep track of the cluster centroids.
numClusters - An estimated number of clusters to generate for the data points. This can be adjusted, but the actual number will depend on the data.
distanceCutoff - The initial distance cutoff: the distance between a point and its closest centroid beyond which the new point will definitely be assigned to a new cluster.
beta - Ratio of the geometric progression to use when increasing distanceCutoff. After n increases, distanceCutoff becomes distanceCutoff * beta^n. A smaller value increases distanceCutoff less aggressively.
clusterLogFactor - Value multiplied with the number of points counted so far to estimate the number of clusters to aim for. If the final number of clusters is known and this clustering is only for a sketch of the data, this can be the final number of clusters, k.
clusterOvershoot - Multiplicative slack factor for slowing down the collapse of the clusters.
public UpdatableSearcher cluster(Matrix data)
data - matrix whose rows are to be clustered.
public UpdatableSearcher cluster(Iterable<Centroid> datapoints)
datapoints - Iterable whose elements are to be clustered.
public UpdatableSearcher cluster(Centroid datapoint)
datapoint - the data point to be clustered.
public int getNumClusters()
public void reindexCentroids()
public double getDistanceCutoff()
public void setDistanceCutoff(double distanceCutoff)
public DistanceMeasure getDistanceMeasure()
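The beta parameter is documented above as the ratio of a geometric progression: after n increases, the cutoff is distanceCutoff * beta^n. The sketch below works through that arithmetic and the documented rule that a point farther than the cutoff from its closest centroid is definitely assigned to a new cluster. CutoffSketch and its method names are hypothetical and exist only for this illustration; this is not Mahout's implementation, and how points at or below the cutoff are handled is deliberately left out.

```java
// Hypothetical helper illustrating the documented parameters; NOT Mahout code.
final class CutoffSketch {
    // Cutoff after n geometric increases: initialCutoff * beta^n.
    static double cutoffAfter(double initialCutoff, double beta, int n) {
        return initialCutoff * Math.pow(beta, n);
    }

    // True when the distance to the closest centroid exceeds the cutoff,
    // the case in which the point definitely starts a new cluster.
    // Points at or below the cutoff are outside the scope of this sketch.
    static boolean definitelyStartsNewCluster(double distanceToClosest, double cutoff) {
        return distanceToClosest > cutoff;
    }
}
```

For example, with an initial cutoff of 1.0 and beta = 1.3, two increases give a cutoff of about 1.69, so a point at distance 2.0 from its closest centroid would start a new cluster while a point at distance 1.0 would not be forced to.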
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.