public class SentenceSourceIndexer extends DefaultHandler implements AutoCloseable
SentenceSource.
Performance examples (Dell XPS 13 9360):
German Wikipedia and Tatoeba with POS tags: 22,000 sentences per minute
German Wikipedia and Tatoeba without POS tags: 2.4 million sentences per minute

Modifier and Type | Field and Description |
---|---|
static String | MAX_DOC_COUNT_FIELD |
static String | MAX_DOC_COUNT_FIELD_VAL |
static String | MAX_DOC_COUNT_VALUE |

Modifier and Type | Method and Description |
---|---|
void | close() |
static void | main(String... args) |

Methods inherited from DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning

public static final String MAX_DOC_COUNT_VALUE
public static final String MAX_DOC_COUNT_FIELD
public static final String MAX_DOC_COUNT_FIELD_VAL
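
This page lists only signatures, so the following is a minimal sketch of the general pattern the class declaration implies: a SAX `DefaultHandler` that is also `AutoCloseable` and can therefore be used in a try-with-resources block. It is not the implementation of `SentenceSourceIndexer`. The names `SentenceCounterSketch`, `CountingHandler`, and `countSentences`, and the `<s>` element used to mark a sentence, are all hypothetical and chosen for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the real SentenceSourceIndexer.
class SentenceCounterSketch {

  /** Hypothetical handler: counts non-blank <s> elements (assumed markup). */
  static class CountingHandler extends DefaultHandler implements AutoCloseable {
    private final StringBuilder text = new StringBuilder();
    private boolean inSentence = false;
    private int sentenceCount = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
      if ("s".equals(qName)) {
        inSentence = true;
        text.setLength(0); // start collecting a new sentence
      }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      // SAX may deliver one text node in several chunks, so append each one.
      if (inSentence) {
        text.append(ch, start, length);
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      if ("s".equals(qName)) {
        inSentence = false;
        if (!text.toString().trim().isEmpty()) {
          sentenceCount++;
        }
      }
    }

    int getSentenceCount() {
      return sentenceCount;
    }

    @Override
    public void close() {
      // A real indexer would release its resources here; the sketch holds none.
      text.setLength(0);
    }
  }

  /** Parses the XML string and returns the number of non-blank sentences. */
  static int countSentences(String xml) throws Exception {
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    // try-with-resources guarantees close() runs even if parsing fails.
    try (CountingHandler handler = new CountingHandler()) {
      parser.parse(new InputSource(new StringReader(xml)), handler);
      return handler.getSentenceCount();
    }
  }

  public static void main(String... args) throws Exception {
    String xml = "<doc><s>Ein Satz.</s><s>Noch ein Satz.</s><s> </s></doc>";
    System.out.println(countSentences(xml)); // prints 2
  }
}
```

Implementing `AutoCloseable` lets callers scope the handler's lifetime to a single block, which matters when the handler owns resources that must be released after parsing.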