org.apache.mahout.math.cf

SimilarityAnalysis

object SimilarityAnalysis extends Serializable

Based on "Ted Dunning & Ellen Friedman: Practical Machine Learning, Innovations in Recommendation", available at http://www.mapr.com/practical-machine-learning

See also "Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce", ACM Conference on Recommender Systems 2012.

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def computeSimilarities(drm: DrmLike[Int], numUsers: Int, maxInterestingItemsPerThing: Int, bcastNumInteractionsB: BCast[Vector], bcastNumInteractionsA: BCast[Vector], crossCooccurrence: Boolean = true, minLLROpt: Option[Double] = None): DrmLike[Int]

  9. def cooccurrences(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50, maxNumInteractions: Int = 500, drmBs: Array[DrmLike[Int]] = Array(), parOpts: ParOpts = defaultParOpts): List[DrmLike[Int]]

    Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices.

    drmARaw

    Primary interaction matrix

    randomSeed

    seed for the random number generator; keeping it constant makes the downsampling repeatable

    maxInterestingItemsPerThing

    number of similar items to return per item, default: 50

    maxNumInteractions

    max number of interactions after downsampling, default: 500

    parOpts

    partitioning params for drm.par(...)

    returns

    a list of org.apache.mahout.math.drm.DrmLike containing downsampled DRMs for cooccurrence and cross-cooccurrence
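The A'A and A'B products named in the description can be illustrated on tiny in-memory matrices. The following is a minimal sketch in plain Scala, not Mahout code: `CooccurrenceSketch`, `Matrix`, and `transposeTimes` are hypothetical names, the inputs are assumed to be already binarized, and the LLR scoring and per-item truncation that the real method applies afterwards are left out.

```scala
object CooccurrenceSketch {
  // Rows are users, columns are items; entries are 0 or 1 (hypothetical in-memory stand-in for a DRM).
  type Matrix = Array[Array[Int]]

  /** Computes X'Y for two matrices that share the same row (user) space. */
  def transposeTimes(x: Matrix, y: Matrix): Matrix = {
    require(x.length == y.length, "both matrices need one row per user")
    val xCols = if (x.isEmpty) 0 else x.head.length
    val yCols = if (y.isEmpty) 0 else y.head.length
    // Entry (i, j) counts the users who interacted with item i of X and item j of Y.
    Array.tabulate(xCols, yCols) { (i, j) =>
      x.indices.map(u => x(u)(i) * y(u)(j)).sum
    }
  }
}
```

For example, with a primary matrix `a = Array(Array(1, 1), Array(1, 0), Array(0, 1))` and a secondary matrix `b = Array(Array(1, 0, 1), Array(1, 1, 0), Array(0, 0, 1))`, `transposeTimes(a, a)` gives the cooccurrence counts and `transposeTimes(a, b)` the cross-cooccurrence counts.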

  10. def cooccurrencesIDSs(indexedDatasets: Array[IndexedDataset], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50, maxNumInteractions: Int = 500, parOpts: ParOpts = defaultParOpts): List[IndexedDataset]

    Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices. A somewhat easier-to-use method that handles the ID dictionaries correctly.

    indexedDatasets

    the first element of the array is the primary (A) matrix; all others are treated as secondary

    randomSeed

    use default to make repeatable, otherwise pass in system time or some randomizing seed

    maxInterestingItemsPerThing

    maximum number of similarities per item

    maxNumInteractions

    max number of input items per item

    parOpts

    partitioning params for drm.par(...)

    returns

    a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence

  11. def crossOccurrenceDownsampled(datasets: List[DownsamplableCrossOccurrenceDataset], randomSeed: Int = 0xdeadbeef): List[IndexedDataset]

    Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-occurrence matrices. A somewhat easier-to-use method that handles the ID dictionaries correctly and carries information about downsampling in each model calculation.

    datasets

    the first dataset is the primary (A) matrix; all others are treated as secondary. Each dataset includes the information used to downsample the input DRM as well as the output llr(A'A), llr(A'B), and that information applies to the model calculation of A' with that dataset. Todo: ignoring absolute threshold for now.

    randomSeed

    use default to make repeatable, otherwise pass in system time or some randomizing seed

    returns

    a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence

  12. lazy val defaultParOpts: ParOpts

  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. def logLikelihoodRatio(numInteractionsWithA: Long, numInteractionsWithB: Long, numInteractionsWithAandB: Long, numInteractions: Long): Double

    Computes the log-likelihood ratio; see http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html for details.
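As a rough illustration of what a log-likelihood ratio over these four counts looks like, here is a self-contained sketch based on the entropy formulation of the G² test described in the linked post. It is an assumption that the library arranges the counts into a 2x2 contingency table this way; `LlrSketch` and its helpers are hypothetical names, not Mahout code.

```scala
object LlrSketch {
  // x * ln(x), with the usual convention that 0 * ln(0) = 0.
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized Shannon entropy of a list of counts.
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  def logLikelihoodRatio(numInteractionsWithA: Long,
                         numInteractionsWithB: Long,
                         numInteractionsWithAandB: Long,
                         numInteractions: Long): Double = {
    // Assumed 2x2 contingency table built from the four arguments.
    val k11 = numInteractionsWithAandB                 // both A and B
    val k12 = numInteractionsWithA - k11               // A without B
    val k21 = numInteractionsWithB - k11               // B without A
    val k22 = numInteractions - numInteractionsWithA - numInteractionsWithB + k11 // neither

    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb small negative values caused by rounding.
    math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy))
  }
}
```

When A and B occur together exactly as often as independence predicts, the score is approximately zero; the more the observed cooccurrence departs from that expectation, the larger the score.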

  20. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. final def notify(): Unit

    Definition Classes
    AnyRef
  22. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  23. def rowSimilarity(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingSimilaritiesPerRow: Int = 50, maxNumInteractions: Int = 500, parOpts: ParOpts = defaultParOpts): DrmLike[Int]

    Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows

    drmARaw

    Primary interaction matrix

    randomSeed

    seed for the random number generator; keeping it constant makes the downsampling repeatable

    maxInterestingSimilaritiesPerRow

    number of similar rows to return per row, default: 50

    maxNumInteractions

    max number of interactions after downsampling, default: 500

    parOpts

    partitioning options used for drm.par(...)
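The effect of a cap such as `maxInterestingSimilaritiesPerRow` can be pictured as keeping only the highest-scoring entries of each similarity row. The sketch below is plain Scala under that assumption and does not reproduce how Mahout implements the cut; `TopKSketch` and `topK` are hypothetical names, and a row is modelled as a map from row index to score.

```scala
object TopKSketch {
  /** Keeps the k highest-scoring (index, score) pairs of one similarity row.
    * Ties are broken by the smaller index so the result is deterministic. */
  def topK(row: Map[Int, Double], k: Int): Seq[(Int, Double)] =
    row.toSeq
      .sortWith { (a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1) }
      .take(k)
}
```

With `k = 50`, a row with fewer than 50 scored entries is returned unchanged apart from the ordering.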

  24. def rowSimilarityIDS(indexedDataset: IndexedDataset, randomSeed: Int = 0xdeadbeef, maxInterestingSimilaritiesPerRow: Int = 50, maxObservationsPerRow: Int = 500): IndexedDataset

    Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows. Uses IndexedDatasets, which handle external ID dictionaries properly.

    indexedDataset

    compare each row to every other

    randomSeed

    use default to make repeatable, otherwise pass in system time or some randomizing seed

    maxInterestingSimilaritiesPerRow

    max elements returned in each row

    maxObservationsPerRow

    max number of input elements to use

  25. def sampleDownAndBinarize(drmM: DrmLike[Int], seed: Int, maxNumInteractions: Int): DrmLike[Int]

    Selectively downsamples rows and items with an anomalous number of interactions, inspired by https://github.com/tdunning/in-memory-cooccurrence/blob/master/src/main/java/com/tdunning/cooc/Analyze.java

    Additionally binarizes the input matrix, as we are only interested in knowing whether interactions happened or not.

    drmM

    matrix to downsample

    seed

    random number generator seed; keep it constant if repeatability is necessary

    maxNumInteractions

    number of elements in a row of the returned matrix

    returns

    the downsampled DRM
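To make the two ideas in this description concrete (seeded downsampling and binarization), here is a simplified single-row sketch in plain Scala. It is an assumption-laden illustration: it caps only the number of interactions kept per row and ignores the item-side downsampling mentioned above. `DownsampleSketch` and `sampleDownAndBinarizeRow` are hypothetical names, not part of Mahout.

```scala
import scala.util.Random

object DownsampleSketch {
  /** Binarizes one row and keeps at most maxNumInteractions of its non-zero columns. */
  def sampleDownAndBinarizeRow(row: Array[Double], seed: Int, maxNumInteractions: Int): Array[Double] = {
    // Columns where an interaction happened, regardless of its strength.
    val nonZero = row.indices.filter(i => row(i) != 0.0).toList
    // A fixed seed makes the random selection repeatable.
    val kept =
      if (nonZero.size <= maxNumInteractions) nonZero
      else new Random(seed).shuffle(nonZero).take(maxNumInteractions)
    val keptSet = kept.toSet
    // Every retained interaction becomes 1.0; everything else becomes 0.0.
    Array.tabulate(row.length)(i => if (keptSet(i)) 1.0 else 0.0)
  }
}
```

Calling the function twice with the same seed selects the same columns, which is the repeatability the `seed` parameter is meant to provide.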

  26. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  27. def toString(): String

    Definition Classes
    AnyRef → Any
  28. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
