| Modifier and Type | Class and Description |
|---|---|
| class | TagalogWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | GoogleStyleWordTokenizer: Tokenizes sentences into tokens the way Google does for its ngram index. |

| Modifier and Type | Method and Description |
|---|---|
| protected WordTokenizer | EnglishNgramProbabilityRule.getGoogleStyleWordTokenizer() |

| Modifier and Type | Class and Description |
|---|---|
| class | ArabicWordTokenizer |
| class | PersianWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | BelarusianWordTokenizer: Specific to Belarusian: apostrophes (', ’, ʼ) are part of the word. |

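The apostrophe rule described for BelarusianWordTokenizer can be illustrated with a small standalone sketch. This is not LanguageTool's implementation: the class name `ApostropheAwareTokenizerSketch`, the regular expression, and the choice to return only word tokens (dropping whitespace and punctuation) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the LanguageTool class: keeps an apostrophe inside
// a word when it sits between letters or digits.
public class ApostropheAwareTokenizerSketch {

    // A word is a run of letters/digits, optionally continued by one of the
    // three apostrophes (U+0027, U+2019, U+02BC) followed by more letters/digits.
    private static final Pattern WORD =
        Pattern.compile("[\\p{L}\\p{N}]+(?:['\u2019\u02BC][\\p{L}\\p{N}]+)*");

    // Returns only the word tokens; whitespace and punctuation are skipped
    // (an assumption of this sketch).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }
}
```

Under these assumptions, `tokenize("сям'я і аб’ява")` yields three tokens, with each apostrophe kept inside its word, while a leading or trailing apostrophe (as in `'abc'`) is left out of the token.
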
| Modifier and Type | Class and Description |
|---|---|
| class | BretonWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | CatalanWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | GermanWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | GreekWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | EnglishWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | EsperantoWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | SpanishWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | FrenchWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | GalicianWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | KhmerWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | DutchWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | PolishWordTokenizer |

| Modifier and Type | Class and Description |
|---|---|
| class | PortugueseWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | RomanianWordTokenizer: Tokenizes a sentence into words. |

| Modifier and Type | Class and Description |
|---|---|
| class | RussianWordTokenizer |