public class CollocMapper extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,StringTuple,GramKey,Gram>
Modifier and Type | Class and Description |
---|---|
static class | CollocMapper.Count |
Modifier and Type | Field and Description |
---|---|
static String | MAX_SHINGLE_SIZE |
Constructor and Description |
---|
CollocMapper() |
Modifier and Type | Method and Description |
---|---|
protected void | map(org.apache.hadoop.io.Text key, StringTuple value, org.apache.hadoop.mapreduce.Mapper.Context context) Collocation finder: pass 1 map phase. |
protected void | setup(org.apache.hadoop.mapreduce.Mapper.Context context) |
public static final String MAX_SHINGLE_SIZE
protected void map(org.apache.hadoop.io.Text key, StringTuple value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
k:head_key, v:head_subgram
k:head_key,ngram_key, v:ngram
k:tail_key, v:tail_subgram
k:tail_key,ngram_key, v:ngram

The 'head' or 'tail' prefix is used to specify whether the subgram in question is the head or tail of the ngram. In this implementation the head of the ngram is a (n-1)gram, and the tail is a (1)gram. For example, given 'click and clack' and an ngram length of 3:
k: head_'click and' v:head_'click and'
k: head_'click and',ngram_'click and clack' v:ngram_'click and clack'
k: tail_'clack', v:tail_'clack'
k: tail_'clack',ngram_'click and clack' v:ngram_'click and clack'

Also counts the total number of ngrams encountered and adds it to the counter CollocMapper.Count.NGRAM_TOTAL.
Overrides: map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,StringTuple,GramKey,Gram>
Throws:
IOException - if there's a problem with the ShingleFilter reading data or the collector collecting output.
InterruptedException
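The head/tail decomposition described for map can be sketched in plain Java. This is only an illustration under stated assumptions: the class `HeadTailSketch` and its `decompose` method are hypothetical names, and keys and values are modelled as plain strings rather than the `GramKey` and `Gram` types the real mapper emits. It takes the tokens of a single ngram and produces the four key/value pairs listed in the method description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: uses plain strings, not Mahout's GramKey/Gram types. */
final class HeadTailSketch {

  /** Returns {key, value} pairs for one ngram, given as its tokens (at least two). */
  static List<String[]> decompose(List<String> tokens) {
    int n = tokens.size();
    if (n < 2) {
      throw new IllegalArgumentException("an ngram needs at least two tokens");
    }
    String ngram = String.join(" ", tokens);
    String head = String.join(" ", tokens.subList(0, n - 1)); // the (n-1)gram
    String tail = tokens.get(n - 1);                          // the (1)gram

    String headKey = "head_'" + head + "'";
    String tailKey = "tail_'" + tail + "'";
    String ngramValue = "ngram_'" + ngram + "'";

    List<String[]> pairs = new ArrayList<>();
    pairs.add(new String[] {headKey, headKey});                     // k:head_key, v:head_subgram
    pairs.add(new String[] {headKey + "," + ngramValue, ngramValue}); // k:head_key,ngram_key, v:ngram
    pairs.add(new String[] {tailKey, tailKey});                     // k:tail_key, v:tail_subgram
    pairs.add(new String[] {tailKey + "," + ngramValue, ngramValue}); // k:tail_key,ngram_key, v:ngram
    return pairs;
  }

  public static void main(String[] args) {
    // Prints the four pairs for the 'click and clack' example.
    for (String[] kv : decompose(Arrays.asList("click", "and", "clack"))) {
      System.out.println("k: " + kv[0] + "  v: " + kv[1]);
    }
  }
}
```

Emitting each subgram once under its own key and once under a composite key that also names the ngram presumably lets a later stage group every ngram with the subgram it shares; the sketch only shows the shape of the output, not how the real job sorts or partitions it.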
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
Overrides: setup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,StringTuple,GramKey,Gram>
Throws:
IOException
InterruptedException
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.