public abstract class Language extends Object
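Base class for any supported language (English, German, etc). Language classes are detected at runtime by searching the classpath for files named META-INF/org/languagetool/language-module.properties. Those file(s) need to contain a key languageClasses which specifies the fully qualified class name(s), e.g. org.languagetool.language.English. Use commas to specify more than one class.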
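As a rough illustration of the file format described above, the following sketch reads a languageClasses value and splits it on commas. It is not LanguageTool's own loader: the class LanguageModuleProperties, the method parseClassNames, and the second class name org.example.MyLanguage used in the note below are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper, not part of LanguageTool: reads the languageClasses key. */
final class LanguageModuleProperties {

  /** The key named in the class description. */
  static final String KEY = "languageClasses";

  /** Parses properties text and returns the comma-separated class names, trimmed. */
  static List<String> parseClassNames(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> result = new ArrayList<>();
    String value = props.getProperty(KEY);
    if (value == null) {
      return result; // key missing: nothing to register
    }
    for (String name : value.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }
}
```

A file containing the line languageClasses=org.languagetool.language.English, org.example.MyLanguage would yield two class names under this sketch.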
Subclasses should typically use lazy initialization for anything that's costly to set up. This improves startup time for the LanguageTool stand-alone version.
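One common way to defer costly setup in Java is the initialization-on-demand holder idiom, sketched below. This is only one possible approach and is not taken from LanguageTool's sources; CostlyDictionary and LazyLanguageSketch are hypothetical names, and the word list is placeholder data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for something costly to build, e.g. a large dictionary. */
final class CostlyDictionary {
  /** Counts how often the costly constructor has run. */
  static final AtomicInteger LOADS = new AtomicInteger();

  final List<String> words;

  CostlyDictionary() {
    LOADS.incrementAndGet();
    words = Arrays.asList("placeholder", "entries");
  }
}

/** Hypothetical language-like class that defers building its dictionary. */
final class LazyLanguageSketch {

  /** The JVM initializes this nested class only when INSTANCE is first read. */
  private static final class Holder {
    static final CostlyDictionary INSTANCE = new CostlyDictionary();
  }

  /** Constructing a LazyLanguageSketch is cheap; the cost is paid on first call here. */
  CostlyDictionary getDictionary() {
    return Holder.INSTANCE;
  }
}
```

The idiom relies on JVM class initialization, so it is thread-safe without explicit locking. A plain null check in the getter works as well when per-instance state is needed.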
Constructor and Description |
---|
Language() |
Modifier and Type | Method and Description |
---|---|
String | adaptSuggestion(String s) |
List<RuleMatch> | adaptSuggestions(List<RuleMatch> ruleMatches, Set<String> enabledRules) |
RuleMatch | adjustMatch(RuleMatch rm, List<String> features) |
Chunker | createDefaultChunker()<br>Creates a language-specific chunker. |
Disambiguator | createDefaultDisambiguator()<br>Creates a language-specific disambiguator. |
JLanguageTool | createDefaultJLanguageTool()<br>Create a shared instance of JLanguageTool to use in rules for further processing. Instances are shared by Language. |
Chunker | createDefaultPostDisambiguationChunker()<br>Creates a language-specific post-disambiguation chunker. |
SentenceTokenizer | createDefaultSentenceTokenizer()<br>Creates a language-specific sentence tokenizer. |
protected SpellingCheckRule | createDefaultSpellingRule(ResourceBundle messages) |
Synthesizer | createDefaultSynthesizer()<br>Creates a language-specific part-of-speech synthesizer. |
Tagger | createDefaultTagger()<br>Creates a language-specific part-of-speech tagger. |
Tokenizer | createDefaultWordTokenizer()<br>Creates a language-specific word tokenizer. |
boolean | equals(Object o)<br>Considers languages as equal if their language code, including the country and variant codes, are equal. |
boolean | equalsConsiderVariantsIfSpecified(Language otherLanguage)<br>Return true if this is the same language as the given one, considering country variants only if set for both languages. |
Chunker | getChunker()<br>Get this language's chunker implementation or null. |
String | getClosingDoubleQuote() |
String | getClosingSingleQuote() |
String | getCommonWordsPath()<br>A file with common words, either in the classpath or as a filename in the file system. |
String | getConsistencyRulePrefix() |
abstract String[] | getCountries()<br>Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL). |
List<String> | getDefaultDisabledRulesForVariant()<br>Get disabled rules different from the default ones for this language variant. |
List<String> | getDefaultEnabledRulesForVariant()<br>Get enabled rules different from the default ones for this language variant. |
Language | getDefaultLanguageVariant()<br>Languages that have country variants need to override this to select their most common variant. |
SpellingCheckRule | getDefaultSpellingRule()<br>Retrieve the default spelling rule for this language. Useful for rules to implement suppression of misspelled suggestions. |
SpellingCheckRule | getDefaultSpellingRule(ResourceBundle messages)<br>Deprecated. |
Unifier | getDisambiguationUnifier()<br>Get this language's feature unifier used for disambiguation. |
UnifierConfiguration | getDisambiguationUnifierConfiguration() |
Disambiguator | getDisambiguator()<br>Get this language's part-of-speech disambiguator implementation. |
Pattern | getIgnoredCharactersRegex() |
LanguageModel | getLanguageModel(File indexDir) |
Locale | getLocale()<br>Get this language's Java locale, not considering the country code. |
Locale | getLocaleWithCountryAndVariant()<br>Get this language's Java locale, considering language code and country code (if any). |
LanguageMaintainedState | getMaintainedState()<br>Information about whether the support for this language in LanguageTool is actively maintained. |
abstract Contributor[] | getMaintainers()<br>Get the name(s) of the maintainer(s) for this language or null. |
abstract String | getName()<br>Get this language's name in English, e.g. English or German (Germany). |
String | getOpeningDoubleQuote() |
String | getOpeningSingleQuote() |
protected List<AbstractPatternRule> | getPatternRules()<br>Get the pattern rules as defined in the files returned by getRuleFileNames(). |
Chunker | getPostDisambiguationChunker()<br>Get this language's post-disambiguation chunker implementation or null. |
protected int | getPriorityForId(String id)<br>Returns a priority for Rule or Category Id (default: 0). |
List<Rule> | getRelevantLanguageModelCapableRules(ResourceBundle messages, LanguageModel languageModel, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages)<br>Get a list of rules that can optionally use a LanguageModel. |
List<Rule> | getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel, UserConfig userConfig)<br>Get a list of rules that require a LanguageModel. |
List<Rule> | getRelevantRemoteRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging)<br>For rules that depend on a remote server. Rules based on RemoteRule will be executed asynchronously, with timeout, retries, etc. |
abstract List<Rule> | getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages)<br>Get the rule classes that should run for texts in this language. |
List<Rule> | getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages)<br>Get the rule classes that should run for texts in this language. |
Function<Rule,Rule> | getRemoteEnhancedRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging)<br>For rules whose results are extended using some remote service, e.g. BERTSuggestionRanking. |
List<String> | getRuleFileNames()<br>Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. |
int | getRulePriority(Rule rule)<br>Returns a priority for Rule (default: 0). |
SentenceTokenizer | getSentenceTokenizer()<br>Get this language's sentence tokenizer implementation. |
abstract String | getShortCode()<br>Get this language's character code, e.g. en for English. |
String | getShortCodeWithCountryAndVariant()<br>Get the short name of the language with country and variant (if any), if it is a single-country language. |
Synthesizer | getSynthesizer()<br>Get this language's part-of-speech synthesizer implementation or null. |
Tagger | getTagger()<br>Get this language's part-of-speech tagger implementation. |
String | getTranslatedName(ResourceBundle messages)<br>Get the name of the language translated to the current locale, if available. |
Unifier | getUnifier()<br>Get this language's feature unifier. |
UnifierConfiguration | getUnifierConfiguration() |
String | getVariant()<br>Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null. |
Tokenizer | getWordTokenizer()<br>Get this language's word tokenizer implementation. |
int | hashCode() |
boolean | hasMinMatchesRules() |
boolean | hasNGramFalseFriendRule(Language motherTongue)<br>Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(ResourceBundle, LanguageModel, GlobalConfig, UserConfig, Language, List<Language>). |
boolean | hasVariant()<br>Whether this class has at least one subclass that implements variants of this language. |
protected LanguageModel | initLanguageModel(File indexDir, LanguageModel languageModel) |
boolean | isAdvancedTypographyEnabled() |
boolean | isExternal()<br>For internal use only. |
boolean | isHiddenFromGui() |
boolean | isSpellcheckOnlyLanguage()<br>Whether this language supports spell checking only and no advanced grammar and style checking. |
boolean | isVariant() |
List<RuleMatch> | mergeSuggestions(List<RuleMatch> ruleMatches, AnnotatedText text, Set<String> enabledRules)<br>This function is called by JLanguageTool before CleanOverlappingFilter removes overlapping ruleMatches. |
void | setChunker(Chunker chunker)<br>Set this language's chunker implementation or null. |
void | setDisambiguator(Disambiguator disambiguator)<br>Set this language's part-of-speech disambiguator implementation. |
void | setPostDisambiguationChunker(Chunker chunker)<br>Set this language's post-disambiguation chunker implementation or null. |
void | setSentenceTokenizer(SentenceTokenizer tokenizer)<br>Set this language's sentence tokenizer implementation. |
void | setSynthesizer(Synthesizer synthesizer)<br>Set this language's part-of-speech synthesizer implementation or null. |
void | setTagger(Tagger tagger)<br>Set this language's part-of-speech tagger implementation. |
void | setWordTokenizer(Tokenizer tokenizer)<br>Set this language's word tokenizer implementation. |
String | toAdvancedTypography(String input) |
String | toString() |
public abstract String getShortCode()
Get this language's character code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for those languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country parameter (e.g. "US"), if any, is not returned.
public abstract String getName()
Get this language's name in English, e.g. English or German (Germany).
public abstract String[] getCountries()
Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
@Nullable public abstract Contributor[] getMaintainers()
Get the name(s) of the maintainer(s) for this language or null.
public abstract List<Rule> getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get the rule classes that should run for texts in this language.
Throws: IOException
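public String getCommonWordsPath()
A file with common words, either in the classpath or as a filename in the file system.
@Nullable public String getVariant()
Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null. Attention: not to be confused with the "country" option.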
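To make the difference between language code, country, and variant concrete, the sketch below splits a tag such as ca-ES-valencia with the JDK's java.util.Locale. It only illustrates the three parts; it is not a claim about how LanguageTool computes getShortCode(), getCountries(), or getVariant(). VariantCodeSketch and parts are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical helper that separates a language tag into its three parts. */
final class VariantCodeSketch {

  /**
   * Returns { language, country, variant } for a tag like "ca-ES-valencia",
   * using the JDK's BCP 47 parser. Missing parts come back as empty strings.
   */
  static String[] parts(String tag) {
    Locale locale = Locale.forLanguageTag(tag);
    return new String[] { locale.getLanguage(), locale.getCountry(), locale.getVariant() };
  }
}
```

For ca-ES-valencia the country is ES and the variant is valencia, whereas en-US has a country but no variant. Note that the JDK reports an absent variant as an empty string, while getVariant() above is documented to return null.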
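public List<String> getDefaultEnabledRulesForVariant()
Get enabled rules different from the default ones for this language variant.
public List<String> getDefaultDisabledRulesForVariant()
Get disabled rules different from the default ones for this language variant.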
@Nullable public LanguageModel getLanguageModel(File indexDir) throws IOException
Parameters: indexDir - directory with a '3grams' sub directory which contains a Lucene index with 3gram occurrence counts
Returns: null if this language doesn't support one
Throws: IOException
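A caller may want to verify the expected directory layout before passing indexDir to getLanguageModel(File). The check below is a minimal sketch of that idea; NgramIndexCheck and has3gramDir are hypothetical names, and the check does not validate the Lucene index itself.

```java
import java.io.File;

/** Hypothetical pre-check for the indexDir layout described for getLanguageModel(File). */
final class NgramIndexCheck {

  /** Returns true if indexDir contains a sub directory named "3grams". */
  static boolean has3gramDir(File indexDir) {
    return new File(indexDir, "3grams").isDirectory();
  }
}
```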
protected LanguageModel initLanguageModel(File indexDir, LanguageModel languageModel)
public List<Rule> getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel, UserConfig userConfig) throws IOException
Get a list of rules that require a LanguageModel. Returns an empty list for languages that don't have such rules.
Throws: IOException
public List<Rule> getRelevantLanguageModelCapableRules(ResourceBundle messages, @Nullable LanguageModel languageModel, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get a list of rules that can optionally use a LanguageModel. Returns an empty list for languages that don't have such rules.
Parameters: languageModel - null if no language model is available
Throws: IOException
public List<Rule> getRelevantRemoteRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging) throws IOException
For rules that depend on a remote server. Rules based on RemoteRule will be executed asynchronously, with timeout, retries, etc. as configured. Can return non-remote rules (e.g. if the configuration is missing, or for A/B tests), which will be executed normally.
Throws: IOException
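The general shape of "asynchronous execution with timeout and retries" can be sketched with the JDK's java.util.concurrent classes, as below. This is a generic illustration and not the RemoteRule implementation; RemoteCallSketch, callWithRetries, and all parameter names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a task off-thread with a per-attempt timeout and retries. */
final class RemoteCallSketch {

  /**
   * Tries the task at most 1 + maxRetries times, waiting up to timeoutMillis
   * for each attempt. Returns fallback if every attempt fails or times out.
   */
  static <T> T callWithRetries(Callable<T> task, long timeoutMillis, int maxRetries, T fallback) {
    ExecutorService executor = Executors.newCachedThreadPool();
    try {
      for (int attempt = 0; attempt <= maxRetries; attempt++) {
        Future<T> future = executor.submit(task);
        try {
          return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
          future.cancel(true); // abandon this attempt, then retry if attempts remain
        } catch (InterruptedException e) {
          future.cancel(true);
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return fallback;
        }
      }
      return fallback;
    } finally {
      executor.shutdownNow(); // interrupt any attempt that is still running
    }
  }
}
```

Returning a fallback instead of throwing keeps a failing remote service from blocking the remaining, local checks; whether that matches the configured behavior of a given RemoteRule is an assumption of this sketch.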
@Experimental public Function<Rule,Rule> getRemoteEnhancedRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging) throws IOException
For rules whose results are extended using some remote service, e.g. BERTSuggestionRanking.
Throws: IOException
public List<Rule> getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get the rule classes that should run for texts in this language.
Throws: IOException
@Nullable protected SpellingCheckRule createDefaultSpellingRule(ResourceBundle messages) throws IOException
Throws: IOException
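@Nullable public SpellingCheckRule getDefaultSpellingRule()
Retrieve the default spelling rule for this language. Useful for rules to implement suppression of misspelled suggestions.
@Deprecated public SpellingCheckRule getDefaultSpellingRule(ResourceBundle messages)
Deprecated. Use getDefaultSpellingRule().
Parameters: messages - unused
public Locale getLocale()
Get this language's Java locale, not considering the country code.
public Locale getLocaleWithCountryAndVariant()
Get this language's Java locale, considering language code and country code (if any).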
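public List<String> getRuleFileNames()
Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
@NotNull public Language getDefaultLanguageVariant()
Languages that have country variants need to override this to select their most common variant.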
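The existence rule stated for getRuleFileNames() (files must exist unless the name contains -test-) can be sketched as follows. The code is an illustration only, not LanguageTool's loader: RuleFileCheckSketch and requireRuleFiles are hypothetical, the exists predicate stands in for a classpath lookup, and the exception type is an assumption because the source does not name one.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical check mirroring the "-test-" exception described for getRuleFileNames(). */
final class RuleFileCheckSketch {

  /**
   * Throws if a rule file is missing, unless its name contains "-test-".
   * The exists predicate is a stand-in for a classpath resource lookup.
   */
  static void requireRuleFiles(List<String> fileNames, Predicate<String> exists) {
    for (String fileName : fileNames) {
      boolean optional = fileName.contains("-test-");
      if (!optional && !exists.test(fileName)) {
        // Exception type chosen for the sketch; the source only says "an exception".
        throw new IllegalStateException("Rule file not found: " + fileName);
      }
    }
  }
}
```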
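public Disambiguator createDefaultDisambiguator()
Creates a language-specific disambiguator. getDisambiguator() calls this function each time it is invoked while no disambiguator is set.
public Disambiguator getDisambiguator()
Get this language's part-of-speech disambiguator implementation.
public void setDisambiguator(Disambiguator disambiguator)
Set this language's part-of-speech disambiguator implementation.
@NotNull public Tagger createDefaultTagger()
Creates a language-specific part-of-speech tagger. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags. getTagger() calls this function each time it is invoked while no tagger is set.
@NotNull public Tagger getTagger()
Get this language's part-of-speech tagger implementation.
public void setTagger(Tagger tagger)
Set this language's part-of-speech tagger implementation.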
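The createDefaultX() / getX() / setX() trio used for the disambiguator and tagger follows a recognizable pattern, which the generic sketch below models. ComponentSlot is a hypothetical class, and the detail that the getter stores the factory result for later calls is an assumption of this sketch rather than something the descriptions above state.

```java
import java.util.function.Supplier;

/** Hypothetical model of the createDefaultX() / getX() / setX() trio. */
final class ComponentSlot<T> {
  private final Supplier<T> createDefault; // plays the role of createDefaultTagger()
  private T value;                         // plays the role of the tagger field

  ComponentSlot(Supplier<T> createDefault) {
    this.createDefault = createDefault;
  }

  /** Like getTagger(): falls back to the factory while nothing is set. */
  synchronized T get() {
    if (value == null) {
      value = createDefault.get(); // assumption: the result is kept for later calls
    }
    return value;
  }

  /** Like setTagger(): an explicitly set component takes precedence over the default. */
  synchronized void set(T newValue) {
    value = newValue;
  }
}
```

If the factory returns null, which the @Nullable chunker and synthesizer factories may do, this getter simply asks the factory again on the next call.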
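public SentenceTokenizer createDefaultSentenceTokenizer()
Creates a language-specific sentence tokenizer. getSentenceTokenizer() calls this function each time it is invoked while no sentence tokenizer is set.
public SentenceTokenizer getSentenceTokenizer()
Get this language's sentence tokenizer implementation.
public void setSentenceTokenizer(SentenceTokenizer tokenizer)
Set this language's sentence tokenizer implementation.
public Tokenizer createDefaultWordTokenizer()
Creates a language-specific word tokenizer. getWordTokenizer() calls this function each time it is invoked while no word tokenizer is set.
public Tokenizer getWordTokenizer()
Get this language's word tokenizer implementation.
public void setWordTokenizer(Tokenizer tokenizer)
Set this language's word tokenizer implementation.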
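@Nullable public Chunker createDefaultChunker()
Creates a language-specific chunker. getChunker() calls this function each time it is invoked while no chunker is set.
@Nullable public Chunker getChunker()
Get this language's chunker implementation or null.
public void setChunker(Chunker chunker)
Set this language's chunker implementation or null.
@Nullable public Chunker createDefaultPostDisambiguationChunker()
Creates a language-specific post-disambiguation chunker. getPostDisambiguationChunker() calls this function each time it is invoked while no chunker is set.
@Nullable public Chunker getPostDisambiguationChunker()
Get this language's post-disambiguation chunker implementation or null.
public void setPostDisambiguationChunker(Chunker chunker)
Set this language's post-disambiguation chunker implementation or null.
public JLanguageTool createDefaultJLanguageTool()
Create a shared instance of JLanguageTool to use in rules for further processing. Instances are shared by Language.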
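@Nullable public Synthesizer createDefaultSynthesizer()
Creates a language-specific part-of-speech synthesizer. getSynthesizer() calls this function each time it is invoked while no synthesizer is set.
@Nullable public Synthesizer getSynthesizer()
Get this language's part-of-speech synthesizer implementation or null.
public void setSynthesizer(Synthesizer synthesizer)
Set this language's part-of-speech synthesizer implementation or null.
public Unifier getUnifier()
Get this language's feature unifier.
public Unifier getDisambiguationUnifier()
Get this language's feature unifier used for disambiguation.
public UnifierConfiguration getUnifierConfiguration()
public UnifierConfiguration getDisambiguationUnifierConfiguration()
public final String getTranslatedName(ResourceBundle messages)
Get the name of the language translated to the current locale, if available.
public final String getShortCodeWithCountryAndVariant()
Get the short name of the language with country and variant (if any), if it is a single-country language.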
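protected List<AbstractPatternRule> getPatternRules() throws IOException
Get the pattern rules as defined in the files returned by getRuleFileNames().
Throws: IOException
public boolean isVariant()
Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
public final boolean hasVariant()
Whether this class has at least one subclass that implements variants of this language.
public boolean isExternal()
For internal use only. Returns true for languages that have been loaded from an external file after startup.
public boolean equalsConsiderVariantsIfSpecified(Language otherLanguage)
Return true if this is the same language as the given one, considering country variants only if set for both languages.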
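The comparison rule of equalsConsiderVariantsIfSpecified(Language) can be illustrated with a simplified model. LangId is a hypothetical class that reduces a language to a short code plus an optional country; the real method works on Language objects and may consider more than this sketch does.

```java
/** Hypothetical, simplified language identity: a short code plus an optional country. */
final class LangId {
  final String shortCode;
  final String country; // null when no country variant is specified

  LangId(String shortCode, String country) {
    this.shortCode = shortCode;
    this.country = country;
  }

  /** Same language; countries are compared only when both sides specify one. */
  boolean equalsConsiderVariantsIfSpecified(LangId other) {
    if (!shortCode.equals(other.shortCode)) {
      return false;
    }
    if (country == null || other.country == null) {
      return true; // at least one side leaves the country open
    }
    return country.equals(other.country);
  }
}
```

Under this model, plain "en" matches "en-US", while "en-GB" and "en-US" do not match each other.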
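public Pattern getIgnoredCharactersRegex()
public LanguageMaintainedState getMaintainedState()
Information about whether the support for this language in LanguageTool is actively maintained.
public boolean isHiddenFromGui()
protected int getPriorityForId(String id)
Returns a priority for Rule or Category Id (default: 0).
public int getRulePriority(Rule rule)
Returns a priority for Rule (default: 0).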
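The "default: 0" behavior of the priority lookups can be pictured as a map lookup with a fallback. The sketch below shows only that idea; PrioritySketch, setPriority, and the rule ids used in the note are hypothetical, and how LanguageTool combines rule and category priorities is not covered.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical priority table: ids without an entry get priority 0. */
final class PrioritySketch {
  private final Map<String, Integer> priorities = new HashMap<>();

  /** Registers a non-default priority for a rule or category id. */
  void setPriority(String id, int priority) {
    priorities.put(id, priority);
  }

  /** Returns the registered priority, or 0 when the id has no entry. */
  int getPriorityForId(String id) {
    return priorities.getOrDefault(id, 0);
  }
}
```

For example, after setPriority("EXAMPLE_RULE", 10), a lookup of "EXAMPLE_RULE" returns 10 and a lookup of any other id returns 0.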
public boolean isSpellcheckOnlyLanguage()
Whether this language supports spell checking only and no advanced grammar and style checking.
public boolean hasNGramFalseFriendRule(Language motherTongue)
Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(ResourceBundle, LanguageModel, GlobalConfig, UserConfig, Language, List<Language>).
public String getOpeningDoubleQuote()
public String getClosingDoubleQuote()
public String getOpeningSingleQuote()
public String getClosingSingleQuote()
public boolean isAdvancedTypographyEnabled()
public boolean equals(Object o)
Considers languages as equal if their language code, including the country and variant codes, are equal.
public boolean hasMinMatchesRules()
public List<RuleMatch> adaptSuggestions(List<RuleMatch> ruleMatches, Set<String> enabledRules)
public String getConsistencyRulePrefix()
public List<RuleMatch> mergeSuggestions(List<RuleMatch> ruleMatches, AnnotatedText text, Set<String> enabledRules)
This function is called by JLanguageTool before CleanOverlappingFilter removes overlapping ruleMatches.
Parameters: ruleMatches, text, enabledRules