public abstract class SpellingCheckRule extends Rule
Modifier and Type | Field and Description |
---|---|
protected static String | CUSTOM_SPELLING_FILE |
protected static String | GLOBAL_SPELLING_FILE |
static float | HIGH_CONFIDENCE: The confidence value for a suggestion with high confidence. |
protected int | ignoreWordsWithLength |
protected Language | language |
protected LanguageModel | languageModel |
static String | LANGUAGETOOL: The string LanguageTool. |
static String | LANGUAGETOOLER: The string LanguageTooler. |
static int | MAX_TOKEN_LENGTH |
protected CachingWordListLoader | wordListLoader |
protected Set<String> | wordsToBeIgnored |

Constructor and Description |
---|
SpellingCheckRule(ResourceBundle messages, Language language, UserConfig userConfig) |
SpellingCheckRule(ResourceBundle messages, Language language, UserConfig userConfig, List<Language> altLanguages) |
SpellingCheckRule(ResourceBundle messages, Language language, UserConfig userConfig, List<Language> altLanguages, LanguageModel languageModel) |

Modifier and Type | Method and Description |
---|---|
void | acceptPhrases(List<String> phrases): Accept (case-sensitively, unless at the start of a sentence) the given phrases even though they are not in the built-in dictionary. |
void | addIgnoreTokens(List<String> tokens): Add the given words to the list of words to be ignored during spell check. |
protected void | addIgnoreWords(String line) |
protected void | addProhibitedWords(List<String> words) |
protected static void | addSuggestionsToRuleMatch(String word, List<SuggestedReplacement> userCandidatesList, List<SuggestedReplacement> candidatesList, SuggestionsOrderer orderer, RuleMatch match) |
protected RuleMatch | createWrongSplitMatch(AnalyzedSentence sentence, List<RuleMatch> ruleMatchesSoFar, int pos, String coveredWord, String suggestion1, String suggestion2, int prevPos) |
protected List<String> | expandLine(String line): Expand suffixes in a line. |
protected static <T> List<T> | filterDupes(List<T> words) |
protected List<SuggestedReplacement> | filterNoSuggestWords(List<SuggestedReplacement> l) |
protected List<SuggestedReplacement> | filterSuggestions(List<SuggestedReplacement> suggestions): Remove prohibited words from suggestions. |
protected List<String> | getAdditionalProhibitFileNames(): Get the names of additional prohibit files, which list words not to be accepted, even when the spell checker would accept them. |
List<String> | getAdditionalSpellingFileNames(): Get the names of additional spelling files, which list words to be accepted and used for suggestions, even when the spell checker would not accept them. |
protected List<SuggestedReplacement> | getAdditionalSuggestions(List<SuggestedReplacement> suggestions, String word): Get additional suggestions added after other suggestions (note the rule may choose to re-order the suggestions anyway). |
protected List<SuggestedReplacement> | getAdditionalTopSuggestions(List<SuggestedReplacement> suggestions, String word): Get additional suggestions added before other suggestions (note the rule may choose to re-order the suggestions anyway). |
List<DisambiguationPatternRule> | getAntiPatterns(): Override this to avoid false alarms by ignoring these patterns. Note that your Rule.match(AnalyzedSentence) method needs to call Rule.getSentenceWithImmunization(org.languagetool.AnalyzedSentence) for this to be used, and you need to check AnalyzedTokenReadings.isImmunized(). |
abstract String | getDescription(): A short description of the error this rule can detect, usually in the language of the text that is checked. |
abstract String | getId(): A string used to identify the rule in e.g. configuration files. |
protected String | getIgnoreFileName(): Get the name of the ignore file, which lists words to be accepted, even when the spell checker would not accept them. |
String | getLanguageVariantSpellingFileName(): Get the name of the spelling file for a language variant (e.g., en-US or de-AT), which lists words to be accepted and used for suggestions, even when the spell checker would not accept them. |
protected List<SuggestedReplacement> | getOnlySuggestions(String word): Get suggestions that will replace all other suggestions. |
protected String | getProhibitFileName(): Get the name of the prohibit file, which lists words not to be accepted, even when the spell checker would accept them. |
String | getSpellingFileName(): Get the name of the spelling file, which lists words to be accepted and used for suggestions, even when the spell checker would not accept them. |
protected boolean | ignorePotentiallyMisspelledWord(String words): Like ignoreWord(String), but will only be called after the standard spell check has run and considered this word to be incorrect. |
protected boolean | ignoreToken(AnalyzedTokenReadings[] tokens, int idx): Returns true iff the token at the given position should be ignored by the spell checker. |
protected boolean | ignoreWord(List<String> words, int idx): Returns true iff the word at the given position should be ignored by the spell checker. |
protected boolean | ignoreWord(String word): Returns true iff the word should be ignored by the spell checker. |
protected void | init() |
boolean | isDictionaryBasedSpellingRule(): Whether this is a spelling rule that uses a dictionary. |
protected static boolean | isEMail(String token) |
protected boolean | isIgnoredNoCase(String word) |
protected boolean | isInIgnoredSet(String word) |
protected boolean | isLatinScript() |
abstract boolean | isMisspelled(String word) |
protected boolean | isProhibited(String word): Whether the word is prohibited, i.e. whether it should be marked as a spelling error even if the spell checker would accept it. |
protected static boolean | isUrl(String token) |
abstract RuleMatch[] | match(AnalyzedSentence sentence): Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule. |
void | setConsiderIgnoreWords(boolean considerIgnoreWords): Set whether the list of words to be explicitly ignored (set with addIgnoreTokens(List)) is considered at all. |
void | setConvertsCase(boolean convertsCase): Used to determine whether the dictionary will use case conversions for spell checking. |
protected int | startsWithIgnoredWord(String word, boolean caseSensitive): Checks whether a word starts with an ignored word. |
protected boolean | tokenizeNewWords() |

Methods inherited from class Rule: addExamplePair, addTags, addToneTags, cacheAntiPatterns, estimateContextForSureMatch, getCategory, getConfigureText, getCorrectExamples, getDefaultValue, getDistanceTokens, getErrorTriggeringExamples, getFullId, getIncorrectExamples, getLocQualityIssueType, getMaxConfigurableValue, getMinConfigurableValue, getMinPrevMatches, getSentenceWithImmunization, getSourceFile, getSubId, getTags, getToneTags, getUrl, hasConfigurableValue, hasTag, hasToneTag, isDefaultOff, isDefaultTempOff, isGoalSpecific, isOfficeDefaultOff, isOfficeDefaultOn, isPremium, makeAntiPatterns, setCategory, setCorrectExamples, setDefaultOff, setDefaultOn, setDefaultTempOff, setDistanceTokens, setErrorTriggeringExamples, setExamplePair, setGoalSpecific, setIncorrectExamples, setLocQualityIssueType, setMinPrevMatches, setOfficeDefaultOff, setOfficeDefaultOn, setPremium, setTags, setToneTags, setUrl, supportsLanguage, toRuleMatchArray, useInOffice

public static final float HIGH_CONFIDENCE
public static final String LANGUAGETOOL
The string LanguageTool.
public static final String LANGUAGETOOLER
The string LanguageTooler.
public static final int MAX_TOKEN_LENGTH
protected final Language language
@Nullable protected LanguageModel languageModel
protected final CachingWordListLoader wordListLoader
protected static final String CUSTOM_SPELLING_FILE
protected static final String GLOBAL_SPELLING_FILE
protected int ignoreWordsWithLength
public SpellingCheckRule(ResourceBundle messages, Language language, UserConfig userConfig)
public SpellingCheckRule(ResourceBundle messages, Language language, UserConfig userConfig, List<Language> altLanguages)
public SpellingCheckRule(ResourceBundle messages, Language language, UserConfig userConfig, List<Language> altLanguages, @Nullable LanguageModel languageModel)
protected static void addSuggestionsToRuleMatch(String word, List<SuggestedReplacement> userCandidatesList, List<SuggestedReplacement> candidatesList, @Nullable SuggestionsOrderer orderer, RuleMatch match)
Parameters:
word - misspelled word that suggestions should be generated for
userCandidatesList - candidates from personal dictionary
candidatesList - candidates from default dictionary
orderer - model to rank suggestions / extract features, or null
match - rule match to add suggestions to
protected RuleMatch createWrongSplitMatch(AnalyzedSentence sentence, List<RuleMatch> ruleMatchesSoFar, int pos, String coveredWord, String suggestion1, String suggestion2, int prevPos)
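public abstract String getId()
Description copied from class: Rule
A string used to identify the rule in e.g. configuration files. It should contain only the characters A-Z and the underscore.
public abstract String getDescription()
Description copied from class: Rule
A short description of the error this rule can detect, usually in the language of the text that is checked.
Specified by: getDescription in class Rule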
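public abstract RuleMatch[] match(AnalyzedSentence sentence) throws IOException
Description copied from class: Rule
Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule.
Specified by: match in class Rule
Parameters:
sentence - a pre-analyzed sentence
Returns:
an array of RuleMatch objects
Throws:
IOException
@Experimental public abstract boolean isMisspelled(String word) throws IOException
Throws:
IOException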
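public boolean isDictionaryBasedSpellingRule()
Description copied from class: Rule
Whether this is a spelling rule that uses a dictionary. Rules that return true here are basically rules that work like a simple hunspell-like spellchecker: they check words without considering the words' context.
Overrides: isDictionaryBasedSpellingRule in class Rule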
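public void addIgnoreTokens(List<String> tokens)
Add the given words to the list of words to be ignored during spell check. Consider using acceptPhrases(List) instead, as only that can also deal with phrases.
public void setConsiderIgnoreWords(boolean considerIgnoreWords)
Set whether the list of words to be explicitly ignored (set with addIgnoreTokens(List)) is considered at all.
protected List<SuggestedReplacement> getAdditionalTopSuggestions(List<SuggestedReplacement> suggestions, String word) throws IOException
Get additional suggestions added before other suggestions (note the rule may choose to re-order the suggestions anyway).
Throws:
IOException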
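protected List<SuggestedReplacement> getOnlySuggestions(String word)
Get suggestions that will replace all other suggestions.
protected List<SuggestedReplacement> getAdditionalSuggestions(List<SuggestedReplacement> suggestions, String word)
Get additional suggestions added after other suggestions (note the rule may choose to re-order the suggestions anyway).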
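The three suggestion hooks (getAdditionalTopSuggestions, getOnlySuggestions, getAdditionalSuggestions) differ mainly in where their results land relative to the spell checker's own suggestions. The following is a minimal, self-contained sketch of that ordering and not LanguageTool's implementation. The class SuggestionMergeSketch and its merge method are hypothetical, plain strings stand in for SuggestedReplacement, and both the duplicate removal and the "empty list means no override" convention are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models where each hook's output would land in the final list.
class SuggestionMergeSketch {

  static List<String> merge(List<String> only, List<String> top,
                            List<String> standard, List<String> additional) {
    // "Only" suggestions replace all others (assumption: an empty list means "no override").
    if (!only.isEmpty()) {
      return new ArrayList<>(only);
    }
    // Otherwise: top suggestions first, then the spell checker's own, then the additional ones.
    // A LinkedHashSet keeps insertion order and drops later duplicates (assumption).
    Set<String> merged = new LinkedHashSet<>();
    merged.addAll(top);
    merged.addAll(standard);
    merged.addAll(additional);
    return new ArrayList<>(merged);
  }
}
```

As the descriptions note, a concrete rule may still re-order the combined list afterwards, so the order produced here is only a starting point.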
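protected boolean ignoreToken(AnalyzedTokenReadings[] tokens, int idx) throws IOException
Returns true iff the token at the given position should be ignored by the spell checker. Consider using ignorePotentiallyMisspelledWord(String) if the check you want to implement is slightly computationally expensive.
Throws:
IOException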
protected boolean ignoreWord(String word) throws IOException
Returns true iff the word should be ignored by the spell checker. If possible, use ignoreToken(AnalyzedTokenReadings[], int) instead.
Throws:
IOException
protected boolean isInIgnoredSet(String word)
protected boolean isIgnoredNoCase(String word)
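protected boolean ignoreWord(List<String> words, int idx) throws IOException
Returns true iff the word at the given position should be ignored by the spell checker. If possible, use ignoreToken(AnalyzedTokenReadings[], int) instead.
Throws:
IOException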
protected boolean ignorePotentiallyMisspelledWord(String words) throws IOException
Like ignoreWord(String), but will only be called after the standard spell check has run and considered this word to be incorrect. This way, tests run here can be a bit more computationally expensive.
Throws:
IOException
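The split between ignoreWord(String) and ignorePotentiallyMisspelledWord(String) amounts to a two-stage filter: a cheap check that runs for every word, and a costlier one that runs only for words the dictionary has already rejected. The sketch below models that control flow in isolation. The class TwoStageIgnoreSketch, its in-memory dictionary, and the specific checks (digits only, all caps) are hypothetical stand-ins and not what LanguageTool does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the cheap-check / dictionary / expensive-check ordering.
class TwoStageIgnoreSketch {
  private final Set<String> dictionary = new HashSet<>();
  int expensiveCalls = 0; // counts how often the costly check actually runs

  TwoStageIgnoreSketch(Set<String> words) {
    dictionary.addAll(words);
  }

  // Cheap check, evaluated for every word (stand-in: ignore pure numbers).
  boolean ignoreWord(String word) {
    return !word.isEmpty() && word.chars().allMatch(Character::isDigit);
  }

  // Costly check, reached only after the dictionary lookup failed
  // (stand-in: treat all-caps words as acronyms).
  boolean ignorePotentiallyMisspelledWord(String word) {
    expensiveCalls++;
    return word.length() > 1 && word.equals(word.toUpperCase());
  }

  boolean isError(String word) {
    if (ignoreWord(word)) {
      return false; // skipped before the dictionary is consulted
    }
    if (dictionary.contains(word)) {
      return false; // known word, the costly check never runs
    }
    return !ignorePotentiallyMisspelledWord(word);
  }
}
```

With this ordering, correctly spelled text never pays for the second check, which is why more expensive tests are better placed there.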
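public void setConvertsCase(boolean convertsCase)
Used to determine whether the dictionary will use case conversions for spell checking.
Parameters:
convertsCase - if true, then conversions are used.
protected static boolean isUrl(String token)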
protected static boolean isEMail(String token)
protected void init() throws IOException
Throws:
IOException
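protected String getIgnoreFileName()
Get the name of the ignore file, which lists words to be accepted, even when the spell checker would not accept them. Unlike with getSpellingFileName(), the words in this file will not be used for creating suggestions for misspelled words.
public String getSpellingFileName()
Get the name of the spelling file, which lists words to be accepted and used for suggestions, even when the spell checker would not accept them.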
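public List<String> getAdditionalSpellingFileNames()
Get the names of additional spelling files, which list words to be accepted and used for suggestions, even when the spell checker would not accept them.
public String getLanguageVariantSpellingFileName()
Get the name of the spelling file for a language variant (e.g., en-US or de-AT), which lists words to be accepted and used for suggestions, even when the spell checker would not accept them.
protected String getProhibitFileName()
Get the name of the prohibit file, which lists words not to be accepted, even when the spell checker would accept them.
protected List<String> getAdditionalProhibitFileNames()
Get the names of additional prohibit files, which list words not to be accepted, even when the spell checker would accept them.
protected boolean isProhibited(String word)
Whether the word is prohibited, i.e. whether it should be marked as a spelling error even if the spell checker would accept it.
protected List<SuggestedReplacement> filterSuggestions(List<SuggestedReplacement> suggestions)
Remove prohibited words from suggestions.
protected List<SuggestedReplacement> filterNoSuggestWords(List<SuggestedReplacement> l)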
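The ignore, spelling, and prohibit files each play a different role: spelling-file words are accepted and may be suggested, ignore-file words are accepted but not suggested, and prohibit-file words are rejected even if the dictionary knows them. The sketch below models these roles with in-memory sets. The class WordListRolesSketch is hypothetical, and letting the prohibit list win over the spelling and ignore lists is an assumption that this page does not state.

```java
import java.util.Set;

// Hypothetical model of how the three word lists interact with the base dictionary.
class WordListRolesSketch {
  private final Set<String> dictionary;
  private final Set<String> spelling;
  private final Set<String> ignore;
  private final Set<String> prohibit;

  WordListRolesSketch(Set<String> dictionary, Set<String> spelling,
                      Set<String> ignore, Set<String> prohibit) {
    this.dictionary = dictionary;
    this.spelling = spelling;
    this.ignore = ignore;
    this.prohibit = prohibit;
  }

  // A word is accepted if any list vouches for it, unless it is prohibited.
  boolean isAccepted(String word) {
    if (prohibit.contains(word)) {
      return false; // flagged even though the dictionary might know it
    }
    return dictionary.contains(word) || spelling.contains(word) || ignore.contains(word);
  }

  // Ignore-file words are accepted but never offered as suggestions;
  // prohibited words are filtered out of suggestions as well.
  boolean mayBeSuggested(String word) {
    return !prohibit.contains(word)
        && (dictionary.contains(word) || spelling.contains(word));
  }
}
```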
protected void addIgnoreWords(String line)
Parameters:
line - the line as read from spelling.txt.
protected void addProhibitedWords(List<String> words)
Parameters:
words - list of words to be prohibited.
protected List<String> expandLine(String line)
Expand suffixes in a line. Implementations might, for example, turn bicycle/S into [bicycle, bicycles].
public void acceptPhrases(List<String> phrases)
Accept (case-sensitively, unless at the start of a sentence) the given phrases even though they are not in the built-in dictionary. Unlike addIgnoreTokens(List), this can deal with phrases. A way to call this is like this:
rule.acceptPhrases(Arrays.asList("duodenal atresia"))
This way, checking would not create an error for "duodenal atresia", but it would still create an error for "duodenal" or "atresia" if they appear on their own.
public List<DisambiguationPatternRule> getAntiPatterns()
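Description copied from class: Rule
Override this to avoid false alarms by ignoring these patterns. Note that your Rule.match(AnalyzedSentence) method needs to call Rule.getSentenceWithImmunization(org.languagetool.AnalyzedSentence) for this to be used, and you need to check AnalyzedTokenReadings.isImmunized().
Overrides: getAntiPatterns in class Rule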
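For expandLine(String), the bicycle/S example can be made concrete with a small sketch. It is an illustration and not the library's code: the class ExpandLineSketch is hypothetical, and it assumes a single flag S that means "append a plain s", with everything after the first slash treated as flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical suffix expansion for lines such as "bicycle/S".
class ExpandLineSketch {

  static List<String> expandLine(String line) {
    List<String> result = new ArrayList<>();
    int slash = line.indexOf('/');
    if (slash < 0) {
      result.add(line); // no flags: the line is returned unchanged
      return result;
    }
    String stem = line.substring(0, slash);
    String flags = line.substring(slash + 1);
    result.add(stem);
    if (flags.contains("S")) {
      result.add(stem + "s"); // assumption: S only ever adds a plain "s"
    }
    return result;
  }
}
```

A real implementation would need language-specific rules (for example, plurals that do not end in a plain s), which is presumably why the method is left for subclasses to override.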
protected int startsWithIgnoredWord(String word, boolean caseSensitive)
Checks whether a word starts with an ignored word. Note that a minimum word length of 4 characters is expected. (This is for better performance. Moreover, such short words are most likely contained in the dictionary.)
Parameters:
word - entire word
caseSensitive - determines whether the check is case-sensitive
protected boolean tokenizeNewWords()
protected boolean isLatinScript()
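The prefix check that startsWithIgnoredWord(String, boolean) describes can be sketched as follows. This page does not document what the int return value means, so the sketch assumes it is the length of the longest matching ignored prefix, with 0 for no match. The class IgnoredPrefixSketch and the extra ignored parameter are hypothetical. Applying the 4-character minimum to both the input word and the ignored candidates is also an assumption.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical prefix check against a set of ignored words.
class IgnoredPrefixSketch {
  static final int MIN_LENGTH = 4; // minimum length taken from the description above

  static int startsWithIgnoredWord(String word, boolean caseSensitive, Set<String> ignored) {
    if (word.length() < MIN_LENGTH) {
      return 0; // short words are skipped entirely
    }
    String probe = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    int longest = 0;
    for (String candidate : ignored) {
      if (candidate.length() < MIN_LENGTH) {
        continue; // assumption: short ignored words never count as prefixes
      }
      String prefix = caseSensitive ? candidate : candidate.toLowerCase(Locale.ROOT);
      if (probe.startsWith(prefix) && prefix.length() > longest) {
        longest = prefix.length();
      }
    }
    return longest; // assumption: 0 means "no ignored prefix found"
  }
}
```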