Interface | Description |
---|---|
CompoundWordTokenizer | Interface for components that take compound words and split them into their parts. |
SentenceTokenizer | Tokenizes text into sentences. |
Tokenizer | Interface for classes that tokenize text into smaller units. |
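
To illustrate the kind of contract these interfaces describe, here is a minimal sketch. The names `TextTokenizer` and `WhitespaceTokenizer` are hypothetical stand-ins invented for this example and are not the package's actual types or signatures; the sketch only assumes that a tokenizer maps a string to a list of smaller string units.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract for illustration only; not the library's interface.
interface TextTokenizer {
    List<String> tokenize(String text);
}

// Assumed example implementation: splits on runs of whitespace and
// drops the whitespace itself.
final class WhitespaceTokenizer implements TextTokenizer {
    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            // An empty or all-whitespace input yields one empty string; skip it.
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

Keeping the contract to a single method means sentence-level, word-level, and compound-level splitters could all sit behind the same type in this sketch.
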
Class | Description |
---|---|
ArabicWordTokenizer | |
PersianWordTokenizer | |
SimpleSentenceTokenizer | A very simple sentence tokenizer that splits on sentence-ending punctuation such as `.`, `!`, and `?`. |
SRXSentenceTokenizer | Class to tokenize sentences using rules from an SRX file. |
WordTokenizer | Tokenizes a sentence into words. |
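
As a rough picture of what a punctuation-based sentence splitter does, the following sketch breaks text after `.`, `!`, or `?` when whitespace follows. The class `PunctuationSentenceSplitter` and its `split` method are hypothetical names made up for this example; the exact splitting rule is an assumption and may differ from what any class listed above implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the documented package.
final class PunctuationSentenceSplitter {
    // Matches the whitespace that directly follows '.', '!' or '?'.
    // The lookbehind keeps the punctuation attached to the preceding sentence.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[.!?])\\s+");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String sentence : BOUNDARY.split(text.trim())) {
            // Empty input produces a single empty string; skip it.
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

A splitter this naive also breaks after abbreviations such as "Dr.", which is one reason a rule-driven approach can be preferable for real text.
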