public class LuceneSingleIndexLanguageModel extends BaseLanguageModel
| Modifier and Type | Class and Description |
|---|---|
| protected static class | LuceneSingleIndexLanguageModel.LuceneSearcher |

Inherited fields: GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START

| Constructor and Description |
|---|
| LuceneSingleIndexLanguageModel(File topIndexDir) |
| LuceneSingleIndexLanguageModel(int maxNgram) |

| Modifier and Type | Method and Description |
|---|---|
| static void | clearCaches(): Only used internally. |
| void | close() |
| protected void | doValidateDirectory(File topIndexDir) |
| long | getCount(List<String> tokens): Get the occurrence count for the given token sequence. |
| long | getCount(String token1): Get the occurrence count for token. |
| protected LuceneSingleIndexLanguageModel.LuceneSearcher | getLuceneSearcher(int ngramSize) |
| long | getTotalTokenCount() |
| String | toString() |
| static void | validateDirectory(File topIndexDir): Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc. |

Methods inherited from class BaseLanguageModel: getPseudoProbability, getPseudoProbabilityStupidBackoff

public LuceneSingleIndexLanguageModel(File topIndexDir)
topIndexDir - a directory which contains at least one sub directory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.

public LuceneSingleIndexLanguageModel(int maxNgram)

public static void validateDirectory(File topIndexDir)
Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.

@Experimental public static void clearCaches()

protected void doValidateDirectory(File topIndexDir)
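
The contract documented for validateDirectory (throw a RuntimeException when the directory does not look like an ngram top directory with sub directories such as 1grams) can be illustrated with a small standalone sketch. This is not LanguageTool's implementation: the class name NgramDirCheck, the helper missingSubDirs, and the extra maxNgram parameter are assumptions made for illustration, and the sketch only checks that the sub directories exist, not that they are valid Lucene indexes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the library's actual validation code.
class NgramDirCheck {

  // Hypothetical helper: names of the expected sub directories
  // ("1grams" up to "<maxNgram>grams") that are missing below topIndexDir.
  static List<String> missingSubDirs(File topIndexDir, int maxNgram) {
    List<String> missing = new ArrayList<>();
    for (int n = 1; n <= maxNgram; n++) {
      String name = n + "grams";
      if (!new File(topIndexDir, name).isDirectory()) {
        missing.add(name);
      }
    }
    return missing;
  }

  // Follows the documented contract: throw RuntimeException for a directory
  // that does not seem to be an ngram top directory. The maxNgram parameter
  // is an assumption; the documented method takes only the directory.
  static void validateDirectory(File topIndexDir, int maxNgram) {
    if (!topIndexDir.isDirectory()) {
      throw new RuntimeException("Not a directory: " + topIndexDir);
    }
    List<String> missing = missingSubDirs(topIndexDir, maxNgram);
    if (!missing.isEmpty()) {
      throw new RuntimeException(
          "Missing sub directories " + missing + " in " + topIndexDir);
    }
  }

  public static void main(String[] args) throws IOException {
    File top = Files.createTempDirectory("ngram-index").toFile();
    new File(top, "1grams").mkdir();
    new File(top, "2grams").mkdir();
    System.out.println(missingSubDirs(top, 3)); // prints [3grams]

    new File(top, "3grams").mkdir();
    validateDirectory(top, 3); // no exception: all three sub directories exist
    System.out.println("valid");
  }
}
```

Failing early with an unchecked exception, as the documented method does, keeps a misconfigured index path from surfacing later as a confusing lookup error.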
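
public long getCount(List<String> tokens)
Description copied from class BaseLanguageModel: Get the occurrence count for the given token sequence.
Corresponds to getCount in class BaseLanguageModel.

public long getCount(String token1)
Description copied from class BaseLanguageModel: Get the occurrence count for token.
Corresponds to getCount in class BaseLanguageModel.

public long getTotalTokenCount()
Corresponds to getTotalTokenCount in class BaseLanguageModel.

protected LuceneSingleIndexLanguageModel.LuceneSearcher getLuceneSearcher(int ngramSize)

public void close()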
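
To make the counting methods more concrete, here is a simplified in-memory model of the same three lookups: getCount for a token sequence, getCount for a single token, and getTotalTokenCount. It is only a sketch under stated assumptions. The class InMemoryNgramCounts and its addSentence method are hypothetical; the documented class reads its counts from Lucene indexes rather than from a map, and whether its single-token overload delegates to the list overload is not stated in this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an ngram count store.
class InMemoryNgramCounts {

  // Maps a space-joined ngram to its occurrence count.
  // Assumes tokens themselves contain no spaces.
  private final Map<String, Long> counts = new HashMap<>();
  private long totalTokenCount = 0;

  // Registers every ngram of length 1..maxNgram found in the token list.
  void addSentence(List<String> tokens, int maxNgram) {
    totalTokenCount += tokens.size();
    for (int n = 1; n <= maxNgram; n++) {
      for (int i = 0; i + n <= tokens.size(); i++) {
        String key = String.join(" ", tokens.subList(i, i + n));
        counts.merge(key, 1L, Long::sum);
      }
    }
  }

  // Occurrence count for the given token sequence, 0 if unseen.
  long getCount(List<String> tokens) {
    return counts.getOrDefault(String.join(" ", tokens), 0L);
  }

  // Occurrence count for a single token; here it simply delegates.
  long getCount(String token1) {
    return getCount(Arrays.asList(token1));
  }

  // Number of tokens seen so far, counting repeats.
  long getTotalTokenCount() {
    return totalTokenCount;
  }

  public static void main(String[] args) {
    InMemoryNgramCounts model = new InMemoryNgramCounts();
    model.addSentence(Arrays.asList("the", "cat", "sat", "on", "the", "mat"), 3);
    System.out.println(model.getCount("the"));                       // prints 2
    System.out.println(model.getCount(Arrays.asList("the", "cat"))); // prints 1
    System.out.println(model.getCount(Arrays.asList("cat", "the"))); // prints 0
    System.out.println(model.getTotalTokenCount());                  // prints 6
  }
}
```

Returning 0 for an unseen sequence is a choice of this sketch; the behavior of the documented class for unseen ngrams is not described here.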