| Interface | Description |
|---|---|
| CompoundWordTokenizer | Interface for components that take compound words and split them into their parts. |
| SentenceTokenizer | Tokenizes text into sentences. |
| Tokenizer | Interface for classes that tokenize text into smaller units. |


| Class | Description |
|---|---|
| ArabicWordTokenizer | |
| PersianWordTokenizer | |
| SimpleSentenceTokenizer | A very simple sentence tokenizer that splits on {@code [.!? |
| SRXSentenceTokenizer | Class to tokenize sentences using rules from an SRX file. |
| WordTokenizer | Tokenizes a sentence into words. |
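
The sketch below illustrates the two roles the tables describe: one component splits text into sentences, another splits a sentence into words. It is self-contained and does not use the library's classes. The nested `Tokenizer` interface, the `SentenceSplitter` and `WordSplitter` classes, and both regular expressions are hypothetical stand-ins chosen for illustration, so the real implementations may behave differently (for example in how they treat whitespace or abbreviations).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerSketch {

  /** Hypothetical stand-in for a tokenizer contract: text in, smaller units out. */
  interface Tokenizer {
    List<String> tokenize(String text);
  }

  /** Assumed rule: a sentence ends at '.', '!' or '?' when followed by whitespace. */
  static class SentenceSplitter implements Tokenizer {
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[.!?])\\s+");

    @Override
    public List<String> tokenize(String text) {
      List<String> sentences = new ArrayList<>();
      for (String part : BOUNDARY.split(text.trim())) {
        if (!part.isEmpty()) {
          sentences.add(part);
        }
      }
      return sentences;
    }
  }

  /** Assumed rule: runs of letters/digits are words, any other non-space character is its own token. */
  static class WordSplitter implements Tokenizer {
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    @Override
    public List<String> tokenize(String text) {
      List<String> tokens = new ArrayList<>();
      Matcher m = TOKEN.matcher(text);
      while (m.find()) {
        tokens.add(m.group());
      }
      return tokens;
    }
  }

  public static void main(String[] args) {
    Tokenizer sentences = new SentenceSplitter();
    Tokenizer words = new WordSplitter();
    // Chain the two steps: first sentences, then the words of each sentence.
    for (String sentence : sentences.tokenize("Hi there. How are you? Fine!")) {
      System.out.println(sentence + " -> " + words.tokenize(sentence));
    }
  }
}
```

Keeping both splitters behind one small interface mirrors the layering in the tables, where the sentence-level and word-level classes share a common tokenizer contract and can be swapped per language.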