Modifier and Type | Method and Description |
---|---|
Tokenizer | Language.createDefaultWordTokenizer()<br>Creates a language-specific word tokenizer. |
Tokenizer | Language.getWordTokenizer()<br>Get this language's word tokenizer implementation. |
Modifier and Type | Method and Description |
---|---|
void | Language.setWordTokenizer(Tokenizer tokenizer)<br>Set this language's word tokenizer implementation. |
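
The three `Language` methods above describe a replaceable word tokenizer: a factory for the default, a getter, and a setter. The sketch below illustrates that pattern only. `Tokenizer` and `SketchLanguage` are hypothetical stand-ins written for this example, not the library's classes, and the lazy fallback to the default inside the getter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WordTokenizerSketch {

  /** Hypothetical stand-in for the Tokenizer type listed above. */
  interface Tokenizer {
    List<String> tokenize(String text);
  }

  /** Hypothetical stand-in for a Language with a replaceable word tokenizer. */
  static class SketchLanguage {
    private Tokenizer wordTokenizer;

    /** Builds the default tokenizer: splits on spaces and keeps them as tokens (assumed behavior). */
    Tokenizer createDefaultWordTokenizer() {
      return text -> {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " ", true);
        while (st.hasMoreTokens()) {
          tokens.add(st.nextToken());
        }
        return tokens;
      };
    }

    /** Returns the current tokenizer, creating the default on first use (assumed behavior). */
    Tokenizer getWordTokenizer() {
      if (wordTokenizer == null) {
        wordTokenizer = createDefaultWordTokenizer();
      }
      return wordTokenizer;
    }

    /** Replaces the tokenizer used by this language. */
    void setWordTokenizer(Tokenizer tokenizer) {
      this.wordTokenizer = tokenizer;
    }
  }

  public static void main(String[] args) {
    SketchLanguage lang = new SketchLanguage();
    System.out.println(lang.getWordTokenizer().tokenize("a b"));

    // Swap in a tokenizer that drops whitespace instead of keeping it.
    lang.setWordTokenizer(text -> List.of(text.split("\\s+")));
    System.out.println(lang.getWordTokenizer().tokenize("a b"));
  }
}
```

A setter of this kind lets callers change tokenization for one language without subclassing it.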
Modifier and Type | Class and Description |
---|---|
class | TagalogWordTokenizer |
Modifier and Type | Method and Description |
---|---|
Tokenizer | NoopLanguage.createDefaultWordTokenizer() |
Modifier and Type | Class and Description |
---|---|
class | GoogleStyleWordTokenizer<br>Tokenizes sentences into tokens the way Google does for its ngram index. |
Modifier and Type | Method and Description |
---|---|
protected Tokenizer | NgramProbabilityRule.getGoogleStyleWordTokenizer() |
Modifier and Type | Interface and Description |
---|---|
interface | CompoundWordTokenizer<br>Interface for components that take compound words and split them into their parts. |
interface | SentenceTokenizer<br>Tokenizes text into sentences. |
Modifier and Type | Class and Description |
---|---|
class | ArabicWordTokenizer |
class | PersianWordTokenizer |
class | SimpleSentenceTokenizer<br>A very simple sentence tokenizer that splits on {@code [.!? |
class | SRXSentenceTokenizer<br>Class to tokenize sentences using rules from an SRX file. |
class | WordTokenizer<br>Tokenizes a sentence into words. |
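
The description of `SimpleSentenceTokenizer` above is cut off after the characters it splits on, so its exact rule is not shown here. As a rough illustration of punctuation-based sentence splitting in general, the sketch below assumes a split point after `.`, `!` or `?` when whitespace follows. The class and method names are hypothetical and this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceSplitSketch {

  /**
   * Splits text after '.', '!' or '?' when the next character is whitespace.
   * The whitespace stays at the start of the following sentence, so joining
   * the parts reproduces the input unchanged.
   */
  static List<String> split(String text) {
    List<String> sentences = new ArrayList<>();
    // Zero-width split point: preceded by sentence punctuation, followed by whitespace.
    for (String part : text.split("(?<=[.!?])(?=\\s)")) {
      if (!part.isEmpty()) {
        sentences.add(part);
      }
    }
    return sentences;
  }

  public static void main(String[] args) {
    for (String sentence : split("Hi there. How are you? Fine!")) {
      System.out.println("[" + sentence + "]");
    }
  }
}
```

A rule this simple also splits after abbreviations such as "e.g." when a space follows, which is one reason a purely punctuation-based splitter is only a starting point.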
Modifier and Type | Class and Description |
---|---|
class | BelarusianWordTokenizer<br>Specific to Belarusian: apostrophes (', ’, ʼ) are part of the word. |
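
To illustrate what "apostrophes are part of the word" means for tokenization, here is a minimal regex-based sketch. It is not the `BelarusianWordTokenizer` implementation; the class name, the pattern, and the choice to emit every other character as its own token are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApostropheTokenizerSketch {

  // A run of letters and apostrophes (U+0027, U+2019, U+02BC) forms one token;
  // any other single character (space, punctuation, digit) is a token of its own.
  private static final Pattern TOKEN =
      Pattern.compile("[\\p{L}'\\u2019\\u02BC]+|[^\\p{L}'\\u2019\\u02BC]");

  /** Returns the tokens of the text in order; joining them reproduces the input. */
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    Matcher matcher = TOKEN.matcher(text);
    while (matcher.find()) {
      tokens.add(matcher.group());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Belarusian sample (written with Unicode escapes) with an apostrophe
    // inside the first word; that word stays a single token.
    for (String token : tokenize("\u0441\u044F\u043C'\u044F \u0442\u0443\u0442")) {
      System.out.println("[" + token + "]");
    }
  }
}
```

One limitation of this sketch: an apostrophe used as a quotation mark at the edge of a word is also attached to that word.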
Modifier and Type | Class and Description |
---|---|
class | BretonWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | CatalanWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | GermanCompoundTokenizer<br>Split German nouns using the jWordSplitter library. |
class | GermanWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | GreekWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | EnglishWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | EsperantoWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | SpanishWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | FrenchWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | GalicianWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | JapaneseWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | KhmerWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | MalayalamWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | DutchWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | PolishWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | PortugueseWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | RomanianWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | RussianWordTokenizer |
Modifier and Type | Class and Description |
---|---|
class | UkrainianWordTokenizer<br>Tokenizes a sentence into words. |
Modifier and Type | Class and Description |
---|---|
class | ChineseSentenceTokenizer |
class | ChineseWordTokenizer |