public class LuceneSingleIndexLanguageModel extends BaseLanguageModel
Modifier and Type | Class and Description |
---|---|
protected static class | LuceneSingleIndexLanguageModel.LuceneSearcher |
GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
Constructor and Description |
---|
LuceneSingleIndexLanguageModel(File topIndexDir) |
LuceneSingleIndexLanguageModel(int maxNgram) |
Modifier and Type | Method and Description |
---|---|
static void | clearCaches() Only used internally. |
void | close() |
protected void | doValidateDirectory(File topIndexDir) |
long | getCount(List<String> tokens) Get the occurrence count for the given token sequence. |
long | getCount(String token1) Get the occurrence count for token. |
protected LuceneSingleIndexLanguageModel.LuceneSearcher | getLuceneSearcher(int ngramSize) |
long | getTotalTokenCount() |
String | toString() |
static void | validateDirectory(File topIndexDir) Throw RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc. |
getPseudoProbability, getPseudoProbabilityStupidBackoff
public LuceneSingleIndexLanguageModel(File topIndexDir)
topIndexDir - a directory which contains at least one sub directory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
public LuceneSingleIndexLanguageModel(int maxNgram)
public static void validateDirectory(File topIndexDir)
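Throw RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
@Experimental public static void clearCaches()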
protected void doValidateDirectory(File topIndexDir)
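The directory check described for validateDirectory can be sketched roughly as follows. This is not the LanguageTool implementation: NgramDirCheck and its validate method are hypothetical names, and the rule that every sub directory from 1grams up to the given maximum must exist is an assumption based only on the description above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of LanguageTool. */
public class NgramDirCheck {

    /**
     * Throws RuntimeException if topIndexDir is not a directory or lacks any of
     * the sub directories 1grams .. (maxNgram)grams. Requiring all of them is an
     * assumption; the real check may be stricter or more lenient.
     */
    public static void validate(File topIndexDir, int maxNgram) {
        if (!topIndexDir.isDirectory()) {
            throw new RuntimeException("Not a directory: " + topIndexDir);
        }
        for (int n = 1; n <= maxNgram; n++) {
            File sub = new File(topIndexDir, n + "grams");
            if (!sub.isDirectory()) {
                throw new RuntimeException(
                    "Missing sub directory '" + sub.getName() + "' in " + topIndexDir);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory with the expected layout.
        Path top = Files.createTempDirectory("ngram-index");
        Files.createDirectory(top.resolve("1grams"));
        Files.createDirectory(top.resolve("2grams"));
        Files.createDirectory(top.resolve("3grams"));

        validate(top.toFile(), 3); // returns normally for a complete layout
        System.out.println("Layout looks valid: " + top);
    }
}
```

Note that the sketch only tests for the presence of the sub directories; it does not verify that each one is a readable Lucene index.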
public long getCount(List<String> tokens)
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
getCount in class BaseLanguageModel
public long getCount(String token1)
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
getCount in class BaseLanguageModel
public long getTotalTokenCount()
getTotalTokenCount in class BaseLanguageModel
protected LuceneSingleIndexLanguageModel.LuceneSearcher getLuceneSearcher(int ngramSize)
public void close()
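To illustrate how a caller might use the count methods documented above, here is a minimal sketch built on a hypothetical in-memory stand-in, InMemoryNgramCounts, so that it runs without a Lucene index. The method names getCount, getTotalTokenCount and close mirror the signatures on this page; everything else is an assumption for illustration, including the add method, returning 0 for unseen sequences, deriving the total from unigram counts, and being usable in try-with-resources.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in; not the Lucene-backed class documented here. */
public class InMemoryNgramCounts implements AutoCloseable {

    // Maps a space-joined token sequence to its occurrence count.
    private final Map<String, Long> counts = new HashMap<>();
    private long totalTokenCount = 0;

    /** Hypothetical loader: records an ngram; unigram counts feed the total. */
    public void add(List<String> tokens, long count) {
        counts.merge(String.join(" ", tokens), count, Long::sum);
        if (tokens.size() == 1) {
            totalTokenCount += count;
        }
    }

    /** Occurrence count for the given token sequence (0 if unseen, by assumption). */
    public long getCount(List<String> tokens) {
        return counts.getOrDefault(String.join(" ", tokens), 0L);
    }

    /** Occurrence count for a single token, delegating to the list variant. */
    public long getCount(String token1) {
        return getCount(Arrays.asList(token1));
    }

    public long getTotalTokenCount() {
        return totalTokenCount;
    }

    @Override
    public void close() {
        counts.clear(); // the real class would release index resources here
    }

    public static void main(String[] args) {
        try (InMemoryNgramCounts model = new InMemoryNgramCounts()) {
            model.add(Arrays.asList("the"), 10);
            model.add(Arrays.asList("cat"), 2);
            model.add(Arrays.asList("the", "cat"), 1);

            System.out.println("the: " + model.getCount("the"));
            System.out.println("the cat: " + model.getCount(Arrays.asList("the", "cat")));
            System.out.println("total: " + model.getTotalTokenCount());
        }
    }
}
```

With the real class, the same calling pattern would apply after constructing it from a top index directory, subject to the assumptions stated above.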