public final class LogLikelihood extends Object
Modifier and Type | Class and Description |
---|---|
static class | LogLikelihood.ScoredItem<T> |
Modifier and Type | Method and Description |
---|---|
static <T> List<LogLikelihood.ScoredItem<T>> | compareFrequencies(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b, int maxReturn, double threshold): Compares two sets of counts to see which items are interestingly over-represented in the first set. |
static double | entropy(long... elements): Calculates the unnormalized Shannon entropy. |
static double | logLikelihoodRatio(long k11, long k12, long k21, long k22): Calculates the raw log-likelihood ratio for two events, call them A and B. |
static double | rootLogLikelihoodRatio(long k11, long k12, long k21, long k22): Calculates the root log-likelihood ratio for two events. |
public static double entropy(long... elements)
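As a rough illustration of what "unnormalized" means here, the sketch below computes N·ln(N) − Σ x·ln(x), where N is the sum of the counts. This equals N times the usual Shannon entropy of the normalized counts. The class name EntropySketch, the helper xLogX, and the choice of natural logarithm are assumptions made for this example; this page does not specify them, and the library's own implementation may differ.

```java
class EntropySketch {
    // x * ln(x), using the convention 0 * ln(0) = 0.
    // Hypothetical helper, not part of the documented API.
    public static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of non-negative counts:
    // N*ln(N) - sum(x*ln(x)), with N = sum(x).
    // Assumption: natural logarithm; the page does not state the base.
    public static double entropy(long... elements) {
        long sum = 0;
        double sumOfXLogX = 0.0;
        for (long element : elements) {
            if (element < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            sumOfXLogX += xLogX(element);
            sum += element;
        }
        return xLogX(sum) - sumOfXLogX;
    }

    public static void main(String[] args) {
        System.out.println(entropy(1, 1)); // 2 * ln(2), about 1.386
        System.out.println(entropy(5, 0)); // all mass in one cell
    }
}
```

Two equal counts give 2 ln 2, and a list whose total sits in a single cell gives 0, which matches the intuition that entropy measures how spread out the counts are.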
public static double logLikelihoodRatio(long k11, long k12, long k21, long k22)
| | Event A | Everything but A |
|---|---|---|
| Event B | A and B together (k_11) | B, but not A (k_12) |
| Everything but B | A without B (k_21) | Neither A nor B (k_22) |
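The contingency table above can be scored with the common entropy-based form of the log-likelihood ratio (G²): twice the sum of the row and column entropies minus the entropy of the four cells. The following is a minimal sketch under that assumption, not the library's confirmed implementation. The class name LlrSketch and the helper xLogX are hypothetical, natural logarithms are assumed, and the sign convention used for the root (negative when A is rarer with B than without B) is also an assumption, since this page does not state one.

```java
class LlrSketch {
    // x * ln(x), using the convention 0 * ln(0) = 0. Hypothetical helper.
    public static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy: N*ln(N) - sum(x*ln(x)), N = sum(x).
    public static double entropy(long... counts) {
        long sum = 0;
        double sumOfXLogX = 0.0;
        for (long c : counts) {
            sumOfXLogX += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - sumOfXLogX;
    }

    // Raw log-likelihood ratio for the 2x2 table (k11, k12 / k21, k22).
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp at zero so floating-point round-off never yields a negative score.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Square root of the raw ratio, signed by direction of association.
    // Assumption: negative when k11/(k11+k12) < k21/(k21+k22).
    public static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        boolean underRepresented =
            (double) k11 / (k11 + k12) < (double) k21 / (k21 + k22);
        return underRepresented ? -root : root;
    }

    public static void main(String[] args) {
        // Perfectly associated events: score is 4 * ln(2), about 2.7726.
        System.out.println(logLikelihoodRatio(1, 0, 0, 1));
        // Independent events: score is (numerically close to) zero.
        System.out.println(logLikelihoodRatio(10, 10, 10, 10));
        // Same strength of association, opposite direction: opposite signs.
        System.out.println(rootLogLikelihoodRatio(1, 0, 0, 1));
        System.out.println(rootLogLikelihoodRatio(0, 1, 1, 0));
    }
}
```

A signed root is convenient because it puts over- and under-representation on one scale, so a single threshold can separate the two directions.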
k11 - The number of times the two events occurred together
k12 - The number of times the second event occurred WITHOUT the first event
k21 - The number of times the first event occurred WITHOUT the second event
k22 - The number of times something else occurred (i.e. was neither of these events)
public static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22)
Calculates the root log-likelihood ratio for two events. See logLikelihoodRatio(long, long, long, long).
k11 - The number of times the two events occurred together
k12 - The number of times the second event occurred WITHOUT the first event
k21 - The number of times the first event occurred WITHOUT the second event
k22 - The number of times something else occurred (i.e. was neither of these events)
public static <T> List<LogLikelihood.ScoredItem<T>> compareFrequencies(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b, int maxReturn, double threshold)
a - The first counts.
b - The reference counts.
maxReturn - The maximum number of items to return. Use maxReturn >= a.elementSet().size() to return all scores above the threshold.
threshold - The minimum score for items to be returned. Use 0 to return all items more common in a than b. Use -Double.MAX_VALUE (not Double.MIN_VALUE!) to disable the threshold.
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.