Modifier and Type | Class and Description |
---|---|
static class | CollocReducer.Skipped |
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MIN_SUPPORT |
static String | MIN_SUPPORT |
Constructor and Description |
---|
CollocReducer() |
Modifier and Type | Method and Description |
---|---|
protected void | processSubgram(Iterator<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context) Sums frequencies for a subgram and its n-grams and delivers (n-gram, subgram) pairs to the collector. |
protected void | processUnigram(Iterator<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context) Sums frequencies for unigrams and delivers them to the collector. |
protected void | reduce(GramKey key, Iterable<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context) Collocation finder, pass 1 reduce phase: given input from the mapper, sums gram frequencies and outputs them for LLR calculation. |
protected void | setup(org.apache.hadoop.mapreduce.Reducer.Context context) |
public static final String MIN_SUPPORT
public static final int DEFAULT_MIN_SUPPORT
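This page lists only the names of `MIN_SUPPORT` and `DEFAULT_MIN_SUPPORT`. One plausible reading, which is an assumption and not something the page states, is that `MIN_SUPPORT` is a configuration key and `DEFAULT_MIN_SUPPORT` is the fallback threshold used when that key is absent. Under that reading, n-grams below the threshold are dropped. The sketch below models this with a plain `Map` standing in for Hadoop's `Configuration`. The key string `"minSupport"`, the default of `2`, and the rule "keep when frequency >= threshold" are all illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the MIN_SUPPORT / DEFAULT_MIN_SUPPORT pair. */
class MinSupportSketch {
    // Assumed values for illustration only; the real constant values are not shown on this page.
    static final String MIN_SUPPORT = "minSupport";
    static final int DEFAULT_MIN_SUPPORT = 2;

    /** Reads the threshold from a key/value config, falling back to the default when unset. */
    static int minSupport(Map<String, String> conf) {
        String raw = conf.get(MIN_SUPPORT);
        return raw == null ? DEFAULT_MIN_SUPPORT : Integer.parseInt(raw);
    }

    /** Keeps only n-grams whose total frequency reaches the threshold (assumed >= rule). */
    static Map<String, Integer> filter(Map<String, Integer> ngramFreqs, int minSupport) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : ngramFreqs.entrySet()) {
            if (e.getValue() >= minSupport) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("new york", 5);
        freqs.put("york city", 1);
        int threshold = minSupport(Map.of()); // key not set, so the default applies
        System.out.println(filter(freqs, threshold));
    }
}
```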
protected void reduce(GramKey key, Iterable<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException
Collocation finder, pass 1 reduce phase. Given input from the mapper:
k:head_subgram,ngram   v:ngram:partial freq
k:head_subgram         v:head_subgram:partial freq
k:tail_subgram,ngram   v:ngram:partial freq
k:tail_subgram         v:tail_subgram:partial freq
k:unigram              v:unigram:partial freq
sum the gram frequencies and output them for LLR calculation. The output is:
k:ngram:ngramfreq      v:head_subgram:head_subgramfreq
k:ngram:ngramfreq      v:tail_subgram:tail_subgramfreq
k:unigram:unigramfreq  v:unigram:unigramfreq
Each n-gram's frequency is effectively counted twice: once for the head and once for the tail. The two counts should be equal. Fix this to count only for the head and move the count into the value?
Overrides: reduce in class org.apache.hadoop.mapreduce.Reducer<GramKey,Gram,Gram,Gram>
Throws: IOException, InterruptedException
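The note about double counting can be checked with a small model. In this illustration (not the CollocReducer code), each partial count of a bigram is emitted twice: once keyed on its head subgram and once keyed on its tail subgram. The partial counts are then summed separately per side. `NgramRec`, `emit`, and `totals` are hypothetical names, and the sketch is restricted to bigrams, so each subgram is a single word. Both sides reach the same totals, which is why one of the two sums is redundant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration of the double-counting note: n-gram totals under head and tail keys agree. */
class DoubleCountSketch {
    /** Hypothetical mapper record: which side it was keyed on, the subgram, the n-gram, a partial count. */
    record NgramRec(String side, String subgram, String ngram, int partialFreq) {}

    /** Records emitted for one partial count of the bigram "head tail": one per side. */
    static List<NgramRec> emit(String head, String tail, int partialFreq) {
        String ngram = head + " " + tail;
        return List.of(
            new NgramRec("head", head, ngram, partialFreq),
            new NgramRec("tail", tail, ngram, partialFreq));
    }

    /** Sums partial n-gram counts for one side ("head" or "tail") only. */
    static Map<String, Integer> totals(List<NgramRec> recs, String side) {
        Map<String, Integer> out = new TreeMap<>();
        for (NgramRec r : recs) {
            if (r.side().equals(side)) {
                out.merge(r.ngram(), r.partialFreq(), Integer::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<NgramRec> recs = new ArrayList<>();
        recs.addAll(emit("new", "york", 2)); // partial count from one input split
        recs.addAll(emit("new", "york", 3)); // partial count from another split
        recs.addAll(emit("york", "city", 1));
        Map<String, Integer> head = totals(recs, "head");
        Map<String, Integer> tail = totals(recs, "tail");
        // Both sides arrive at identical n-gram frequencies.
        System.out.println(head + " " + tail + " equal=" + head.equals(tail));
    }
}
```

Counting only on the head side, as the note proposes, would produce the same `head` map while skipping the tail-side summation.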
protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException
Overrides: setup in class org.apache.hadoop.mapreduce.Reducer<GramKey,Gram,Gram,Gram>
Throws: IOException, InterruptedException
protected void processUnigram(Iterator<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException
Throws: IOException, InterruptedException
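`processUnigram` sums unigram frequencies and delivers them to the collector. The reduce documentation gives its output shape as `k:unigram:unigramfreq v:unigram:unigramfreq`. The minimal sketch below adds every partial frequency in one unigram group and emits the unigram with its total as both key and value. It uses a hypothetical `Freq` record in place of Mahout's `Gram` and a `BiConsumer` in place of the Hadoop context.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

class UnigramSketch {
    /** Hypothetical stand-in for a gram and its (partial or total) frequency. */
    record Freq(String gram, int freq) {}

    /** Sums partial frequencies for one unigram group and emits (unigram:total, unigram:total). */
    static void processUnigram(Iterator<Freq> values, BiConsumer<Freq, Freq> collector) {
        String unigram = null;
        int total = 0;
        while (values.hasNext()) {
            Freq v = values.next();
            unigram = v.gram(); // every value in the group names the same unigram
            total += v.freq();
        }
        if (unigram != null) {
            Freq out = new Freq(unigram, total);
            collector.accept(out, out); // key and value carry the same unigram and total
        }
    }

    public static void main(String[] args) {
        List<Freq> partials = List.of(new Freq("york", 2), new Freq("york", 4));
        processUnigram(partials.iterator(),
            (k, v) -> System.out.println(k.gram() + ":" + k.freq() + " -> " + v.gram() + ":" + v.freq()));
    }
}
```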
protected void processSubgram(Iterator<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException
Throws: InterruptedException, IOException
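`processSubgram` sums frequencies for a subgram and its n-grams, then delivers (n-gram, subgram) pairs in the shape `k:ngram:ngramfreq v:head_subgram:head_subgramfreq` (or the tail counterpart). The single-pass sketch below relies on a hypothetical ordering that this page does not state: within one group, all records for the subgram itself arrive first, and records for the same n-gram are adjacent. `Val`, `Freq`, the `isSubgram` flag, and the `BiConsumer` collector are illustrative stand-ins for `Gram` and the Hadoop context.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

class SubgramSketch {
    /** Hypothetical value: a gram, whether it is the subgram itself, and a partial frequency. */
    record Val(String gram, boolean isSubgram, int freq) {}

    /** Hypothetical output half: a gram with its summed frequency. */
    record Freq(String gram, int freq) {}

    /**
     * Single pass over one subgram group. Assumes subgram records come first and
     * records for the same n-gram are adjacent, so each n-gram can be flushed
     * as soon as a different one starts.
     */
    static void processSubgram(Iterator<Val> values, BiConsumer<Freq, Freq> collector) {
        String subgram = null;
        int subgramFreq = 0;
        String ngram = null;
        int ngramFreq = 0;
        while (values.hasNext()) {
            Val v = values.next();
            if (v.isSubgram()) {
                subgram = v.gram();
                subgramFreq += v.freq();
            } else if (v.gram().equals(ngram)) {
                ngramFreq += v.freq();
            } else {
                // A new n-gram starts: flush the previous one, paired with the subgram total.
                if (ngram != null) {
                    collector.accept(new Freq(ngram, ngramFreq), new Freq(subgram, subgramFreq));
                }
                ngram = v.gram();
                ngramFreq = v.freq();
            }
        }
        if (ngram != null) { // flush the last n-gram in the group
            collector.accept(new Freq(ngram, ngramFreq), new Freq(subgram, subgramFreq));
        }
    }

    public static void main(String[] args) {
        List<Val> group = List.of(
            new Val("new", true, 2), new Val("new", true, 3),
            new Val("new jersey", false, 1),
            new Val("new york", false, 1), new Val("new york", false, 2));
        processSubgram(group.iterator(), (k, v) ->
            System.out.println(k.gram() + ":" + k.freq() + " -> " + v.gram() + ":" + v.freq()));
    }
}
```

If a subgram record could arrive after an n-gram record, the earlier flushes would carry a partial subgram total. The single-pass design therefore depends on that assumed ordering.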
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.