SimilarityAnalysis

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def computeSimilarities(drm: DrmLike[Int], numUsers: Int, maxInterestingItemsPerThing: Int, bcastNumInteractionsB: BCast[Vector], bcastNumInteractionsA: BCast[Vector], crossCooccurrence: Boolean = true, minLLROpt: Option[Double] = None): DrmLike[Int]
def cooccurrences(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50, maxNumInteractions: Int = 500, drmBs: Array[DrmLike[Int]] = Array(), parOpts: ParOpts = defaultParOpts): List[DrmLike[Int]]

Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices.
drmARaw
Primary interaction matrix
randomSeed
keep constant to make downsampling repeatable
maxInterestingItemsPerThing
number of similar items to return per item, default: 50
maxNumInteractions
max number of interactions after downsampling, default: 500
parOpts
partitioning params for drm.par(...)
returns
a list of org.apache.mahout.math.drm.DrmLike containing downsampled DRMs for cooccurrence and cross-cooccurrence
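To make the A'A and A'B notation concrete, here is a minimal in-memory sketch that only counts cooccurrences and cross-cooccurrences. It is an illustration under stated assumptions, not the library's implementation: the names `CooccurrenceCountsSketch`, `Interactions` and `crossCounts` are hypothetical, plain Scala maps stand in for DRMs, the diagonal is kept, and the downsampling and log-likelihood scoring steps are left out.

```scala
object CooccurrenceCountsSketch {
  // Hypothetical stand-in for a binary interaction matrix:
  // user ID -> set of item IDs the user interacted with.
  type Interactions = Map[String, Set[String]]

  // Entry (i, j) of A'B: the number of users who have item i in A and item j in B.
  // Calling crossCounts(a, a) gives the plain cooccurrence counts A'A.
  def crossCounts(a: Interactions, b: Interactions): Map[(String, String), Int] = {
    val pairs = for {
      (user, itemsA) <- a.toSeq
      i <- itemsA.toSeq
      j <- b.getOrElse(user, Set.empty[String]).toSeq
    } yield (i, j)
    // Count how many users produced each (i, j) pair.
    pairs.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size }
  }
}
```

With a primary matrix `a` and a secondary matrix `b`, `crossCounts(a, a)` corresponds to A'A and `crossCounts(a, b)` to A'B; only users present in both matrices contribute to the latter.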
def cooccurrencesIDSs(indexedDatasets: Array[IndexedDataset], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50, maxNumInteractions: Int = 500, parOpts: ParOpts = defaultParOpts): List[IndexedDataset]

Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices. A somewhat easier-to-use method that handles the ID dictionaries correctly.
indexedDatasets
the first element of the array is the primary (A) matrix; all others are treated as secondary
randomSeed
use default to make repeatable, otherwise pass in system time or some randomizing seed
maxInterestingItemsPerThing
max similarities per item
maxNumInteractions
max number of input items per item
parOpts
partitioning params for drm.par(...)
returns
a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence
def crossOccurrenceDownsampled(datasets: List[DownsamplableCrossOccurrenceDataset], randomSeed: Int = 0xdeadbeef): List[IndexedDataset]

Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-occurrence matrices. A somewhat easier-to-use method that handles the ID dictionaries correctly and carries downsampling information for each model calculation.
datasets
the first element of the list is the primary (A) matrix; all others are treated as secondary. Each dataset includes the information used to downsample the input drm as well as the output llr(A'A), llr(A'B); that information applies to the model calculation of A' with the dataset. Todo: ignoring absolute threshold for now.
randomSeed
use default to make repeatable, otherwise pass in system time or some randomizing seed
returns
a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence
lazy val defaultParOpts: ParOpts
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def logLikelihoodRatio(numInteractionsWithA: Long, numInteractionsWithB: Long, numInteractionsWithAandB: Long, numInteractions: Long): Double

Computes the log-likelihood ratio; see http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html for details.
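The following stand-alone sketch shows one common way to compute this score, using the entropy formulation of the log-likelihood ratio for a 2x2 contingency table. It is an illustration rather than the library's code: the object name `LlrSketch` and the helpers `xLogX`, `entropy` and `llr2x2` are hypothetical, and the mapping of the four arguments onto the table cells (both, A only, B only, neither) is an assumption based on the parameter names.

```scala
object LlrSketch {
  // x * ln(x), with the convention 0 * ln(0) = 0.
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized entropy of a set of counts: N ln N - sum(k ln k).
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // Log-likelihood ratio of the 2x2 table [[k11, k12], [k21, k22]].
  def llr2x2(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb small negative values caused by round-off.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Assumed mapping from interaction counts to the table cells.
  def logLikelihoodRatio(numInteractionsWithA: Long, numInteractionsWithB: Long,
                         numInteractionsWithAandB: Long, numInteractions: Long): Double = {
    val k11 = numInteractionsWithAandB                        // both A and B
    val k12 = numInteractionsWithA - numInteractionsWithAandB // A without B
    val k21 = numInteractionsWithB - numInteractionsWithAandB // B without A
    val k22 = numInteractions - numInteractionsWithA -
      numInteractionsWithB + numInteractionsWithAandB         // neither
    llr2x2(k11, k12, k21, k22)
  }
}
```

In this sketch, two items that always occur together score above zero, while items whose cooccurrence matches what independence would predict score approximately zero.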
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def rowSimilarity(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingSimilaritiesPerRow: Int = 50, maxNumInteractions: Int = 500, parOpts: ParOpts = defaultParOpts): DrmLike[Int]

Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows.
drmARaw
Primary interaction matrix
randomSeed
keep constant to make downsampling repeatable
maxInterestingSimilaritiesPerRow
number of similar items to return per item, default: 50
maxNumInteractions
max number of interactions after downsampling, default: 500
parOpts
partitioning options used for drm.par(...)
def rowSimilarityIDS(indexedDataset: IndexedDataset, randomSeed: Int = 0xdeadbeef, maxInterestingSimilaritiesPerRow: Int = 50, maxObservationsPerRow: Int = 500): IndexedDataset

Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a drm of rows and similar rows. Uses IndexedDatasets, which handle external ID dictionaries properly.
indexedDataset
compare each row to every other
randomSeed
use default to make repeatable, otherwise pass in system time or some randomizing seed
maxInterestingSimilaritiesPerRow
max elements returned in each row
maxObservationsPerRow
max number of input elements to use
def sampleDownAndBinarize(drmM: DrmLike[Int], seed: Int, maxNumInteractions: Int): DrmLike[Int]

Selectively downsamples rows and items with an anomalous number of interactions, inspired by https://github.com/tdunning/in-memory-cooccurrence/blob/master/src/main/java/com/tdunning/cooc/Analyze.java
Additionally binarizes the input matrix, since we are only interested in whether interactions happened or not.
drmM
matrix to downsample
seed
random number generator seed, keep constant if repeatability is necessary
maxNumInteractions
number of elements in a row of the returned matrix
returns
the downsampled DRM
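The idea of capping and binarizing rows can be sketched on plain in-memory data. This is a simplified illustration, not the library's algorithm: it only caps each row at `maxNumInteractions` by seeded random selection and ignores the per-item side of the downsampling. The names `DownsampleSketch` and `Row` are hypothetical, and sparse maps stand in for DRM rows.

```scala
import scala.util.Random

object DownsampleSketch {
  // Hypothetical sparse row: column index -> interaction strength.
  type Row = Map[Int, Double]

  def sampleDownAndBinarize(rows: Seq[Row], seed: Int, maxNumInteractions: Int): Seq[Row] = {
    // A single seeded generator makes the result repeatable for a given seed.
    val rng = new Random(seed)
    rows.map { row =>
      // Sort the non-zero columns so the outcome depends only on the seed,
      // not on the iteration order of the map.
      val cols = row.collect { case (col, v) if v != 0.0 => col }.toList.sorted
      // Rows at or under the cap keep all their interactions.
      val kept = if (cols.size <= maxNumInteractions) cols
                 else rng.shuffle(cols).take(maxNumInteractions)
      // Binarize: every kept interaction becomes 1.0.
      kept.map(col => col -> 1.0).toMap
    }
  }
}
```

Passing the same seed twice yields the same sampled rows, which mirrors the repeatability note on the `seed` parameter.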
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

object SimilarityAnalysis extends Serializable
