public class BigramStopFilter
extends TokenFilter
| Modifier and Type | Field and Description |
|---|---|
| private int | accumIncrement: Accumulates position increment of removed tokens |
| private boolean | firstTime: true before next() is called for the first time |
| private int | inputPos: Tracks the position of input tokens, for debugging |
| private int | MAX_POSITION: Limit on position values, for the extremely rare case of fields with > 2000 entries |
| private Token | nextToken: The next token to process |
| private int | outputPos: Tracks the position of output tokens, for debugging |
| private Token | outputQueue: Queue of output tokens, only required in some cases |
| private Set | stopSet: Set of stop-words |
| static Object | tester: Basic regression test |
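
The accumIncrement field is described as accumulating the position increment of removed tokens. As a rough illustration of that idea only, the sketch below drops stop words from a token list and carries their increments over to the next token that is kept. It does not reproduce the bigram logic of this class; the Tok class and the dropStopWords method are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IncrementSketch {
    /** Hypothetical minimal token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int posIncr;

        Tok(String term, int posIncr) {
            this.term = term;
            this.posIncr = posIncr;
        }

        @Override
        public String toString() {
            return term + "(+" + posIncr + ")";
        }
    }

    /** Removes stop words, adding their increments to the next kept token. */
    static List<Tok> dropStopWords(List<Tok> input, Set<String> stopSet) {
        List<Tok> output = new ArrayList<>();
        int accumIncrement = 0; // increments of tokens removed since the last kept token
        for (Tok tok : input) {
            if (stopSet.contains(tok.term)) {
                accumIncrement += tok.posIncr; // remember the gap left by the removed token
                continue;
            }
            output.add(new Tok(tok.term, tok.posIncr + accumIncrement));
            accumIncrement = 0;
        }
        return output;
    }

    public static void main(String[] args) {
        List<Tok> input = Arrays.asList(
            new Tok("man", 1), new Tok("of", 1), new Tok("war", 1));
        Set<String> stops = new HashSet<>(Arrays.asList("of"));
        System.out.println(dropStopWords(input, stops)); // prints [man(+1), war(+2)]
    }
}
```

Carrying the increment forward keeps the kept tokens at their original positions, so position-sensitive queries still see the gap where a stop word used to be.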
| Constructor and Description |
|---|
| BigramStopFilter(TokenStream input, Set stopSet): Construct a token stream to filter 'stopWords' out of 'input'. |
| Modifier and Type | Method and Description |
|---|---|
| private Token | glomToken(Token token1, Token token2, int increment): Constructs a new token, drawing the start position, position increment, and end position from the specified tokens. |
| protected boolean | isStopWord(String word): Tells whether the token is a stop-word. |
| static Set | makeStopSet(String stopWords): Make a stop set given a space, comma, or semicolon delimited list of stop words. |
| Token | next(): Retrieve the next token in the stream. |
| private Token | nextInput(): Retrieves the next token from the input stream, properly tracking the input position. |
| Token | nextInternal(): Retrieve the next token in the stream. |
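
The summary for makeStopSet says it accepts a space, comma, or semicolon delimited list, and isStopWord tests membership. A minimal standalone sketch of that behavior might look like the following. It is not the class's actual source: the StopSetSketch class name is hypothetical, isStopWord takes the set as an explicit argument here because the sketch has no instance field, and case handling is left out because this page does not say how the real method treats case.

```java
import java.util.HashSet;
import java.util.Set;

class StopSetSketch {
    /** Splits a space-, comma-, or semicolon-delimited list into a set of words. */
    static Set<String> makeStopSet(String stopWords) {
        Set<String> set = new HashSet<>();
        // One or more delimiters in a row count as a single separator.
        for (String word : stopWords.split("[ ,;]+")) {
            if (!word.isEmpty()) { // a leading delimiter yields an empty first element
                set.add(word);
            }
        }
        return set;
    }

    /** Assumption: a word is a stop word exactly when the set contains it. */
    static boolean isStopWord(Set<String> stopSet, String word) {
        return stopSet.contains(word);
    }

    public static void main(String[] args) {
        Set<String> stops = makeStopSet("a, an; the of");
        System.out.println(stops.size());               // prints 4
        System.out.println(isStopWord(stops, "the"));   // prints true
        System.out.println(isStopWord(stops, "whale")); // prints false
    }
}
```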
private Set stopSet
private boolean firstTime
private Token nextToken
private Token outputQueue
private int accumIncrement
private int outputPos
private int inputPos
private final int MAX_POSITION
public static final Object tester
public BigramStopFilter(TokenStream input, Set stopSet)
input - Input stream of tokens to process
stopSet - Set of stop words to filter out. This can be most easily made by calling makeStopSet().

public static Set makeStopSet(String stopWords)
stopWords - String of words to make into a set
BigramStopFilter.

public Token next() throws IOException
next in class TokenStream
IOException

public Token nextInternal() throws IOException
IOException

private Token nextInput() throws IOException
IOException

protected boolean isStopWord(String word)

private Token glomToken(Token token1, Token token2, int increment)
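
The summary for glomToken says the new token draws its start position, position increment, and end position from the specified tokens. One plausible reading is sketched below, under stated assumptions: the start comes from token1, the end from token2, and the position increment from the increment argument. The Tok class is a hypothetical stand-in for the real Token type, and the "~" separator used to join the two terms is chosen purely for illustration, since this page does not say how the combined text is formed.

```java
class GlomSketch {
    /** Hypothetical stand-in for a token: text, character offsets, position increment. */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        final int posIncr;

        Tok(String term, int start, int end, int posIncr) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posIncr = posIncr;
        }
    }

    /**
     * Combines two adjacent tokens into one.
     * Assumptions: start from token1, end from token2, increment as given,
     * and terms joined with "~" (illustrative separator only).
     */
    static Tok glomToken(Tok token1, Tok token2, int increment) {
        return new Tok(token1.term + "~" + token2.term,
                       token1.start, token2.end, increment);
    }

    public static void main(String[] args) {
        Tok man = new Tok("man", 0, 3, 1);
        Tok of = new Tok("of", 4, 6, 1);
        Tok glommed = glomToken(man, of, 0);
        // prints man~of [0,6] +0
        System.out.println(glommed.term + " [" + glommed.start + "," + glommed.end
            + "] +" + glommed.posIncr);
    }
}
```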