Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices
Primary interaction matrix
random seed; keep it constant to make the downsampling repeatable
number of similar items to return per item, default: 50
max number of interactions after downsampling, default: 500
partitioning params for drm.par(...)
a list of org.apache.mahout.math.drm.DrmLike containing downsampled DRMs for cooccurrence and cross-cooccurrence
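To make the A'A and A'B notation concrete, here is a minimal pure-Python sketch of cooccurrence and cross-cooccurrence counting on small in-memory matrices. It is an illustration only, not Mahout's distributed implementation; the helper name `transpose_times` and the toy matrices `A` and `B` are assumptions made for this example.

```python
def transpose_times(a, b):
    """Return A'B for two row-aligned matrices given as lists of rows."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of rows (users)")
    n_cols_a, n_cols_b = len(a[0]), len(b[0])
    result = [[0] * n_cols_b for _ in range(n_cols_a)]
    # Each shared row (user) contributes the outer product of its two rows.
    for row_a, row_b in zip(a, b):
        for i, x in enumerate(row_a):
            if x:
                for j, y in enumerate(row_b):
                    result[i][j] += x * y
    return result


# Rows are users, columns are items, 1 marks an interaction (hypothetical data).
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
B = [[1, 0],
     [0, 1],
     [1, 1],
     [1, 0]]

cooccurrence = transpose_times(A, A)        # A'A: item-item counts within A
cross_cooccurrence = transpose_times(A, B)  # A'B: items of A vs. items of B
```

Entry (i, j) of A'A counts the users who interacted with both item i and item j; these raw counts are what the log-likelihood ratio is then applied to.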
Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices. A somewhat easier-to-use method that handles the ID dictionaries correctly.
the first element of the array is the primary (A) matrix; all others are treated as secondary
use default to make repeatable, otherwise pass in system time or some randomizing seed
max similarities per item
max number of input items per item
partitioning params for drm.par(...)
a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence
Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-occurrence matrices. A somewhat easier-to-use method that handles the ID dictionaries correctly and carries the downsampling information for each model calculation.
the first element of the array is the primary (A) matrix; all others are treated as secondary. Each dataset in the array includes the information used to downsample the input DRM as well as the outputs llr(A'A), llr(A'B); that information applies to the model calculation of A' with that dataset. Todo: the absolute threshold is ignored for now.
use default to make repeatable, otherwise pass in system time or some randomizing seed
a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence
Computes the log-likelihood ratio; see http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html for details
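The log-likelihood ratio described in the linked post can be written in terms of unnormalized Shannon entropies of a 2x2 contingency table. The following is a minimal sketch under that formulation; the function names and the `k11`..`k22` argument names are assumptions for illustration, not this API's signature.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 table:
    k11 = both events, k12 = first only, k21 = second only, k22 = neither.
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to guard against small negative values from round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))
```

A table in which the two events are independent, such as (10, 10, 10, 10), scores approximately zero, while a table in which they always occur together, such as (1, 0, 0, 1), scores about 2.77.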
Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows
Primary interaction matrix
random seed; keep it constant to make the downsampling repeatable
number of similar items to return per item, default: 50
max number of interactions after downsampling, default: 500
partitioning options used for drm.par(...)
Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows. Uses IndexedDatasets, which handle external ID dictionaries properly.
compare each row to every other
use default to make repeatable, otherwise pass in system time or some randomizing seed
max elements returned in each row
max number of input elements to use
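The "max elements returned in each row" limit amounts to keeping only the k strongest similarities per row. A minimal sketch of that selection step, assuming the scores are already available as nested dictionaries (the function name `top_k_per_row` and the sample scores are hypothetical):

```python
import heapq


def top_k_per_row(scores, k):
    """Keep the k highest-scoring entries of every row.

    scores maps a row ID to a dict of {other row ID: similarity score}.
    Returns, per row, a list of (other ID, score) pairs, strongest first.
    """
    return {
        row_id: heapq.nlargest(k, others.items(), key=lambda pair: pair[1])
        for row_id, others in scores.items()
    }


# Hypothetical LLR scores between rows.
scores = {
    "u1": {"u2": 4.2, "u3": 0.5, "u4": 7.1},
    "u2": {"u1": 4.2},
}
top_two = top_k_per_row(scores, 2)
```

Rows with fewer than k scored neighbours simply return everything they have.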
Selectively downsamples rows and items with an anomalous number of interactions, inspired by https://github.com/tdunning/in-memory-cooccurrence/blob/master/src/main/java/com/tdunning/cooc/Analyze.java
Additionally binarizes the input matrix, as we're only interested in knowing whether interactions happened or not.
matrix to downsample
random number generator seed; keep it constant if repeatability is necessary
number of elements in a row of the returned matrix
the downsampled DRM
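One way to picture the downsampling step is the sketch below: each non-zero entry survives with a probability that shrinks when its row or its column has more interactions than the allowed maximum, and every surviving entry is set to 1. This is an in-memory approximation written for illustration; the function name, the dict-of-columns row representation, and the exact sampling rule are assumptions, not the library's code.

```python
import random


def sample_down_and_binarize(rows, seed, max_interactions):
    """rows: list of {column index: value} dicts, one dict per row."""
    rng = random.Random(seed)  # a fixed seed makes the sampling repeatable

    # Count the non-zero interactions per column (item).
    col_counts = {}
    for row in rows:
        for col, value in row.items():
            if value != 0:
                col_counts[col] = col_counts.get(col, 0) + 1

    result = []
    for row in rows:
        nonzero = sorted(col for col, value in row.items() if value != 0)
        kept = {}
        if nonzero:
            row_rate = min(max_interactions, len(nonzero)) / len(nonzero)
            for col in nonzero:
                col_rate = min(max_interactions, col_counts[col]) / col_counts[col]
                # Keep the entry with the stricter of the two rates.
                if rng.random() <= min(row_rate, col_rate):
                    kept[col] = 1  # binarize: only presence matters
        result.append(kept)
    return result
```

Rows and columns that are already within the limit have a sampling rate of 1.0 and are kept in full; only the anomalously dense ones are thinned out.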
Based on "Ted Dunning & Ellen Friedman: Practical Machine Learning, Innovations in Recommendation", available at http://www.mapr.com/practical-machine-learning
See also "Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce, ACM Conference on Recommender Systems 2012"