public class LuceneLanguageModel extends BaseLanguageModel
Like LuceneSingleIndexLanguageModel, but can merge the results of
lookups in several independent indexes into one result.

Inherited fields: GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START

| Constructor and Description |
|---|
| LuceneLanguageModel(File topIndexDir) |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| long | getCount(List<String> tokens) Get the occurrence count for the given token sequence. |
| long | getCount(String token) Get the occurrence count for token. |
| long | getTotalTokenCount() |
| String | toString() |
| static void | validateDirectory(File topIndexDir) |

Inherited methods: getPseudoProbability, getPseudoProbabilityStupidBackoff

public LuceneLanguageModel(File topIndexDir)
topIndexDir - a directory which contains either:
1) sub directories called 1grams, 2grams, 3grams,
which are Lucene indexes with ngram occurrences as created by
org.languagetool.dev.FrequencyIndexCreator
or 2) sub directories index-1, index-2 etc. that contain
the sub directories described under 1)

public static void validateDirectory(File topIndexDir)
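
The two directory layouts accepted for topIndexDir can be illustrated with a small stand-alone check. This is only a sketch that uses plain java.io and is not the library's validateDirectory implementation: the class IndexLayoutSketch and its methods findIndexDirs and hasNgramDirs are hypothetical names, and requiring all three of 1grams, 2grams and 3grams to be present is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexLayoutSketch {

  // Hypothetical helper (not part of LanguageTool): returns every directory
  // that directly contains the sub directories 1grams, 2grams and 3grams.
  static List<File> findIndexDirs(File topIndexDir) {
    List<File> result = new ArrayList<>();
    if (hasNgramDirs(topIndexDir)) {
      // Layout 1: the ngram sub directories sit directly in topIndexDir.
      result.add(topIndexDir);
      return result;
    }
    // Layout 2: sub directories index-1, index-2 etc. each follow layout 1.
    File[] children = topIndexDir.listFiles();
    if (children != null) {
      Arrays.sort(children); // listFiles() guarantees no particular order
      for (File child : children) {
        if (child.isDirectory() && child.getName().matches("index-\\d+") && hasNgramDirs(child)) {
          result.add(child);
        }
      }
    }
    if (result.isEmpty()) {
      throw new IllegalArgumentException("No ngram index found in " + topIndexDir);
    }
    return result;
  }

  // Assumption: all three ngram sub directories must exist.
  static boolean hasNgramDirs(File dir) {
    return new File(dir, "1grams").isDirectory()
        && new File(dir, "2grams").isDirectory()
        && new File(dir, "3grams").isDirectory();
  }

  public static void main(String[] args) {
    File top = new File(args.length > 0 ? args[0] : ".");
    try {
      System.out.println("Index directories: " + findIndexDirs(top));
    } catch (IllegalArgumentException e) {
      System.out.println("Invalid layout: " + e.getMessage());
    }
  }
}
```

Run with a directory path as the only argument, the sketch lists the directories it would treat as separate indexes, or reports that neither layout was found.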
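
public long getCount(List<String> tokens)
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
getCount in class BaseLanguageModel

public long getCount(String token)
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
getCount in class BaseLanguageModel

public long getTotalTokenCount()
getTotalTokenCount in class BaseLanguageModel

public void close()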
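
The class description says that lookups in several independent indexes are merged into one result. How the merge is done is not stated on this page; the sketch below assumes the per-index counts are simply summed. CountSource, inMemory and mergedCount are hypothetical names used for illustration, with in-memory maps standing in for the Lucene indexes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergedCountSketch {

  // Hypothetical stand-in for a single index; the real class reads Lucene indexes.
  interface CountSource {
    long getCount(List<String> tokens);
  }

  // Assumption: merging means adding up the counts reported by each index.
  static long mergedCount(List<CountSource> sources, List<String> tokens) {
    long total = 0;
    for (CountSource source : sources) {
      total += source.getCount(tokens);
    }
    return total;
  }

  // Wraps a map as a CountSource; token sequences not in the map count as 0.
  static CountSource inMemory(Map<List<String>, Long> counts) {
    return tokens -> counts.getOrDefault(tokens, 0L);
  }

  public static void main(String[] args) {
    Map<List<String>, Long> first = new HashMap<>();
    first.put(Arrays.asList("the", "house"), 40L);
    Map<List<String>, Long> second = new HashMap<>();
    second.put(Arrays.asList("the", "house"), 2L);
    second.put(Arrays.asList("a", "house"), 7L);

    List<CountSource> sources = Arrays.asList(inMemory(first), inMemory(second));
    System.out.println(mergedCount(sources, Arrays.asList("the", "house"))); // prints 42
    System.out.println(mergedCount(sources, Arrays.asList("no", "match")));  // prints 0
  }
}
```

Under this assumption a token sequence that is missing from one index still gets a count from the others, and a sequence missing everywhere yields 0.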