All Implemented Interfaces:
Direct Known Subclasses:
ArabicWordTokenizer, BelarusianWordTokenizer, BretonWordTokenizer, CatalanWordTokenizer, DutchWordTokenizer, EnglishWordTokenizer, EsperantoWordTokenizer, FrenchWordTokenizer, GalicianWordTokenizer, GermanWordTokenizer, GoogleStyleWordTokenizer, GreekWordTokenizer, KhmerWordTokenizer, PersianWordTokenizer, PolishWordTokenizer, PortugueseWordTokenizer, RomanianWordTokenizer, RussianWordTokenizer, SpanishWordTokenizer, TagalogWordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace get their own tokens.
The tokenizer is a fairly simple character-based one, though it recognizes
URLs and keeps each one in a single token if it is fully specified, including
a protocol (like
Author: Daniel Naber
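The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not this class's actual implementation: the class name SimpleWordTokenizerSketch, the tokenize method, and the choice of http, https and ftp as the recognized protocols are all assumptions made for illustration. It only shows the general idea of character-based splitting in which punctuation and whitespace become their own tokens while a URL with a protocol stays in one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's WordTokenizer.
class SimpleWordTokenizerSketch {

    // Assumption: a URL counts as "fully specified" only if it starts with one
    // of these protocols. The match must end in a letter, digit or slash, so
    // sentence-final punctuation is not swallowed into the URL token.
    private static final Pattern URL =
            Pattern.compile("(?:https?|ftp)://\\S*[\\p{L}\\p{N}/]");

    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher url = URL.matcher(sentence);
        int pos = 0;
        while (url.find()) {
            // Split the text before the URL character by character.
            splitPlain(sentence.substring(pos, url.start()), tokens);
            // Keep the whole URL as a single token.
            tokens.add(url.group());
            pos = url.end();
        }
        splitPlain(sentence.substring(pos), tokens);
        return tokens;
    }

    // Runs of letters and digits form words; every other character
    // (punctuation, whitespace) becomes a token of its own.
    private static void splitPlain(String text, List<String> tokens) {
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
            } else {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                tokens.add(String.valueOf(c));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
    }

    public static void main(String[] args) {
        for (String token : tokenize("See http://example.com/a now.")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

In this sketch an address without a protocol, such as www.example.com, is split at each dot like any other text, which mirrors the "fully specified" condition in the description. The language-specific subclasses listed above may treat apostrophes, hyphens and similar characters differently; the sketch does not attempt to model that.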
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait