public class LuceneLanguageModel extends BaseLanguageModel
Like LuceneSingleIndexLanguageModel, but can merge the results of lookups in several independent indexes into one result.

Fields inherited: GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
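
The merging of lookups across several independent indexes can be illustrated with a minimal sketch. It assumes that "merging" means summing the per-index counts, which this page does not state explicitly. `MergedCountSketch`, `CountSource` and `mergedCount` are hypothetical names used only for illustration and are not part of the LanguageTool API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not LanguageTool source code.
final class MergedCountSketch {

    /** Stand-in for one independent index that can report an n-gram count. */
    interface CountSource {
        long getCount(List<String> tokens);
    }

    /** Assumption: the merged result is the sum of the counts of all sources. */
    static long mergedCount(List<CountSource> sources, List<String> tokens) {
        long total = 0;
        for (CountSource source : sources) {
            total += source.getCount(tokens);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> ngram = Arrays.asList("the", "cat");
        // Two fake indexes that know the same n-gram with different counts.
        CountSource first = tokens -> tokens.equals(ngram) ? 7L : 0L;
        CountSource second = tokens -> tokens.equals(ngram) ? 2L : 0L;
        System.out.println(mergedCount(Arrays.asList(first, second), ngram)); // prints 9
    }
}
```

Under this assumption a caller sees a single count per token sequence, regardless of how many indexes contributed to it.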
Constructor and Description |
---|
LuceneLanguageModel(File topIndexDir) |
Modifier and Type | Method and Description |
---|---|
void | close() |
long | getCount(List<String> tokens) Get the occurrence count for the given token sequence. |
long | getCount(String token) Get the occurrence count for token. |
long | getTotalTokenCount() |
String | toString() |
static void | validateDirectory(File topIndexDir) |

Methods inherited from class BaseLanguageModel: getPseudoProbability, getPseudoProbabilityStupidBackoff

public LuceneLanguageModel(File topIndexDir)
topIndexDir - a directory which contains either:
1) sub directories called 1grams, 2grams, 3grams, which are Lucene indexes with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator, or
2) sub directories index-1, index-2 etc. that contain the sub directories described under 1)

public static void validateDirectory(File topIndexDir)
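
The two directory layouts accepted for topIndexDir can be told apart with a simple file system check. The sketch below is hypothetical and uses only java.io; it is not the implementation of validateDirectory. It assumes that a sub directory named with digits followed by `grams` is enough to recognize layout 1, and `IndexLayoutSketch`, `Layout` and `detect` are illustrative names.

```java
import java.io.File;

// Hypothetical sketch, not LanguageTool source code.
final class IndexLayoutSketch {

    enum Layout { NGRAM_DIRS, INDEX_DIRS, UNKNOWN }

    /** Assumption: layout 1 means at least one sub directory such as 1grams or 2grams. */
    static boolean hasNgramDirs(File dir) {
        File[] subDirs = dir.listFiles(
                (File f) -> f.isDirectory() && f.getName().matches("\\d+grams"));
        return subDirs != null && subDirs.length > 0;
    }

    /** Reports which of the two documented layouts the directory appears to follow. */
    static Layout detect(File topIndexDir) {
        if (hasNgramDirs(topIndexDir)) {
            return Layout.NGRAM_DIRS;
        }
        File[] indexDirs = topIndexDir.listFiles(
                (File f) -> f.isDirectory() && f.getName().matches("index-\\d+"));
        if (indexDirs == null || indexDirs.length == 0) {
            return Layout.UNKNOWN;
        }
        // Layout 2: every index-N directory must itself follow layout 1.
        for (File indexDir : indexDirs) {
            if (!hasNgramDirs(indexDir)) {
                return Layout.UNKNOWN;
            }
        }
        return Layout.INDEX_DIRS;
    }

    public static void main(String[] args) {
        // Build a throwaway directory tree in the temp folder for demonstration.
        File root = new File(System.getProperty("java.io.tmpdir"),
                "layout-sketch-" + System.nanoTime());
        new File(root, "index-1/1grams").mkdirs();
        new File(root, "index-2/1grams").mkdirs();
        System.out.println(detect(root));
    }
}
```

A check like this only inspects directory names. It does not verify that the sub directories are readable Lucene indexes.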
public long getCount(List<String> tokens)
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
getCount in class BaseLanguageModel
public long getCount(String token)
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
getCount in class BaseLanguageModel
public long getTotalTokenCount()
getTotalTokenCount in class BaseLanguageModel
public void close()