public class TextValueEncoder extends FeatureVectorEncoder
Direct Known Subclasses: LuceneTextValueEncoder
Fields inherited from class FeatureVectorEncoder: CONTINUOUS_VALUE_HASH_SEED, WORD_LIKE_VALUE_HASH_SEED
Constructor and Description |
---|
TextValueEncoder(String name) |
Modifier and Type | Method and Description |
---|---|
void | addText(byte[] originalForm): Adds text to the internal word counter, but delays converting it to vector form until flush is called. |
void | addText(CharSequence text): Adds text to the internal word counter, but delays converting it to vector form until flush is called. |
void | addToVector(byte[] originalForm, double weight, Vector data): Adds a value to a vector after tokenizing it by splitting on non-alphanumeric characters. |
String | asString(String originalForm): Converts a value into a form that would help a human understand the internals of how the value is being interpreted. |
void | flush(double weight, Vector data): Adds all of the tokens that we counted up to a vector. |
protected Iterable<Integer> | hashesForProbe(byte[] originalForm, int dataSize, String name, int probe): Returns all of the hashes for this probe. |
protected int | hashForProbe(byte[] originalForm, int dataSize, String name, int probe): Provides the unique hash for a particular probe. |
void | setWordEncoder(FeatureVectorEncoder wordEncoder) |
protected Iterable<String> | tokenize(CharSequence originalForm): Tokenizes a string using the simplest method. |
Methods inherited from class FeatureVectorEncoder: addToVector, addToVector, addToVector, bytesForString, getName, getProbes, getWeight, hash, hash, hash, hash, hash, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
public TextValueEncoder(String name)
public void addToVector(byte[] originalForm, double weight, Vector data)
addToVector in class FeatureVectorEncoder
Parameters:
originalForm - The original form of the value as a string.
data - The vector to which the value should be added.
public void addText(byte[] originalForm)
Parameters:
originalForm - The original text encoded as UTF-8.
public void addText(CharSequence text)
Parameters:
text - The original text.
public void flush(double weight, Vector data)
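The accumulate-then-flush pattern described for addText and flush can be sketched without any Mahout dependency. The class below is a hypothetical stand-in, not the library's implementation: the name DeferredTextEncoderSketch, the use of a plain double[] in place of Vector, the String.hashCode-based index, and the weight-times-count scaling are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "count now, encode on flush"; not Mahout's code.
final class DeferredTextEncoderSketch {
    // Token counts accumulated by addText and consumed by flush.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Counts tokens but does not touch any vector yet.
    void addText(CharSequence text) {
        // Assumption: tokens are runs of letters and digits.
        for (String token : text.toString().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    // Adds every counted token to the vector, then resets the counter.
    void flush(double weight, double[] data) {
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Assumption: String.hashCode stands in for the encoder's hash function.
            int index = Math.floorMod(entry.getKey().hashCode(), data.length);
            // Assumption: each token contributes weight times its count.
            data[index] += weight * entry.getValue();
        }
        counts.clear();
    }

    // Number of distinct tokens waiting to be flushed.
    int pendingTokens() {
        return counts.size();
    }

    public static void main(String[] args) {
        DeferredTextEncoderSketch encoder = new DeferredTextEncoderSketch();
        double[] data = new double[16];
        encoder.addText("to be or not");
        encoder.addText("to be");
        System.out.println("pending before flush: " + encoder.pendingTokens());
        encoder.flush(1.0, data);
        System.out.println("pending after flush: " + encoder.pendingTokens());
    }
}
```

Deferring the vector update lets several addText calls be merged into a single pass over the distinct tokens, so a caller can feed text in pieces and pay the hashing cost once per token at flush time.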
protected int hashForProbe(byte[] originalForm, int dataSize, String name, int probe)
Description copied from class: FeatureVectorEncoder
Provides the unique hash for a particular probe.
hashForProbe in class FeatureVectorEncoder
Parameters:
originalForm - The original byte array value.
dataSize - The length of the vector being encoded.
name - The name of the variable being encoded.
probe - The probe number.
protected Iterable<Integer> hashesForProbe(byte[] originalForm, int dataSize, String name, int probe)
Description copied from class: FeatureVectorEncoder
Returns all of the hashes for this probe.
hashesForProbe in class FeatureVectorEncoder
Parameters:
originalForm - The original byte array value.
dataSize - The length of the vector being encoded.
name - The name of the variable being encoded.
probe - The probe number.
protected Iterable<String> tokenize(CharSequence originalForm)
See Also: LuceneTextValueEncoder
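As a rough illustration of what tokenizing "using the simplest method" can look like, the sketch below splits on runs of non-alphanumeric characters, as the addToVector summary describes. It is a standalone approximation rather than the library's code: the class name TokenizeSketch and the exact regular expression are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of splitting text on non-alphanumeric characters.
final class TokenizeSketch {
    // Assumption: "non-alphanum" means anything other than Unicode letters and digits.
    private static final Pattern NON_ALPHANUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Returns the non-empty pieces between runs of separator characters.
    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : NON_ALPHANUM.split(text)) {
            // A leading separator produces an empty first piece; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, hashed-feature world!"));
    }
}
```

A splitter like this does no stemming, case folding, or stop-word removal, which is the kind of processing an analyzer-based subclass such as LuceneTextValueEncoder would be expected to supply.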
public String asString(String originalForm)
asString in class FeatureVectorEncoder
Parameters:
originalForm - The original form of the value as a string.
public final void setWordEncoder(FeatureVectorEncoder wordEncoder)
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.