BaseTagger (LanguageTool 6.4-SNAPSHOT API)

java.lang.Object
- org.languagetool.tagging.BaseTagger

All Implemented Interfaces:

Tagger

Direct Known Subclasses:

ArabicTagger, AsturianTagger, BretonTagger, CatalanTagger, DanishTagger, DutchTagger, EnglishTagger, FrenchTagger, GalicianTagger, GermanTagger, GreekTagger, IrishTagger, ItalianTagger, KhmerTagger, MalayalamTagger, PolishTagger, PortugueseTagger, RomanianTagger, RussianTagger, SlovakTagger, SpanishTagger, SwedishTagger, TagalogTagger, TamilTagger, UkrainianTagger
```
public abstract class BaseTagger
extends Object
implements Tagger
```
Base tagger using Morfologik binary dictionaries.

Author:

Marcin Milkowski

Field Summary

Fields
Modifier and Type	Field and Description
`protected Locale`	`locale`
`protected WordTagger`	`wordTagger`

Constructor Summary

Constructors
Constructor and Description
`BaseTagger(String filename, Locale locale)`
`BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase)`
`BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase, boolean internTags)`

Method Summary

Modifier and Type	Method and Description
`protected List<AnalyzedToken>`	`additionalTags(String word, WordTagger wordTagger)` Allows additional tagging in some language-dependent circumstances.
`protected AnalyzedToken`	`asAnalyzedToken(String word, morfologik.stemming.WordData wd)`
`protected List<AnalyzedToken>`	`asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)`
`protected List<AnalyzedToken>`	`asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)`
`AnalyzedTokenReadings`	`createNullToken(String token, int startPos)` Create the AnalyzedToken used for whitespace and other non-words.
`AnalyzedToken`	`createToken(String token, String posTag)` Create a token specific to the language of the implementing class.
`protected List<AnalyzedToken>`	`getAnalyzedTokens(String word)`
`protected morfologik.stemming.Dictionary`	`getDictionary()`
`String`	`getDictionaryPath()`
`List<String>`	`getManualAdditionsFileNames()` Get the filenames for manual additions, e.g., `/en/added.txt`.
`List<String>`	`getManualRemovalsFileNames()` Get the filenames for manual removals, e.g., `/en/removed.txt`.
`protected WordTagger`	`getWordTagger()`
`boolean`	`overwriteWithManualTagger()` If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
`List<AnalyzedTokenReadings>`	`tag(List<String> sentenceTokens)` Returns a list of `AnalyzedTokenReadings` that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

wordTagger
```
protected final WordTagger wordTagger
```

locale
```
protected final Locale locale
```

Constructor Detail

BaseTagger

public BaseTagger(String filename,
                  Locale locale)

Since:

2.9

BaseTagger

public BaseTagger(String filename,
                  Locale locale,
                  boolean tagLowercaseWithUppercase)

Since:

2.9

BaseTagger

public BaseTagger(String filename,
                  Locale locale,
                  boolean tagLowercaseWithUppercase,
                  boolean internTags)

Parameters:

internTags - true if string tags should be interned

Since:

4.9
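
The `internTags` parameter refers to string interning. As a rough illustration of what interning means in plain Java (this shows standard JDK behaviour only, not LanguageTool's internal code; the class `InternTagsSketch` and its helper `buildTag` are hypothetical names), two equal tag strings built at runtime are separate objects until they are interned:

```java
final class InternTagsSketch {

    // Hypothetical helper: builds a tag at runtime so it is not a compile-time constant.
    static String buildTag(String prefix, String suffix) {
        return new StringBuilder(prefix).append(suffix).toString();
    }

    public static void main(String[] args) {
        String a = buildTag("NN", "S");
        String b = buildTag("NN", "S");

        // Same characters, so equals() holds.
        System.out.println("equal content:      " + a.equals(b));
        // Two separate String objects, so reference comparison fails.
        System.out.println("same object:        " + (a == b));
        // intern() returns one canonical instance per distinct value.
        System.out.println("same after intern:  " + (a.intern() == b.intern()));
    }
}
```

A tag set is typically small compared with the number of dictionary entries, so sharing one instance per distinct tag is the kind of memory saving such a flag is usually meant to enable.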

Method Detail

getManualAdditionsFileNames
```
@NotNull
public List<String> getManualAdditionsFileNames()
```
Get the filenames for manual additions, e.g., /en/added.txt.

Since:

5.0

getManualRemovalsFileNames
```
@NotNull
public List<String> getManualRemovalsFileNames()
```
Get the filenames for manual removals, e.g., /en/removed.txt.

Since:

5.0

getDictionaryPath
```
public String getDictionaryPath()
```
Since:

2.9

overwriteWithManualTagger
```
public boolean overwriteWithManualTagger()
```
If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.

Since:

2.9
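
To make the difference between overwriting and combining concrete, here is a minimal, self-contained sketch. It assumes that "overwrite" means manual tags replace the dictionary tags for a word whenever a manual entry exists, and that otherwise both sources are merged. The class `ManualOverwriteSketch`, the method `lookup`, and the map-based "dictionaries" are hypothetical stand-ins, not LanguageTool API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ManualOverwriteSketch {

    // Hypothetical helper: resolves tags for a word from a "binary" and a "manual" source.
    static List<String> lookup(String word,
                               Map<String, List<String>> binaryTags,
                               Map<String, List<String>> manualTags,
                               boolean overwriteWithManual) {
        List<String> binary = binaryTags.getOrDefault(word, Collections.emptyList());
        List<String> manual = manualTags.getOrDefault(word, Collections.emptyList());

        // Assumed overwrite semantics: a manual entry hides the binary entry entirely.
        if (overwriteWithManual && !manual.isEmpty()) {
            return new ArrayList<>(manual);
        }

        // Otherwise merge both sources, keeping order and dropping duplicates.
        List<String> merged = new ArrayList<>(binary);
        for (String tag : manual) {
            if (!merged.contains(tag)) {
                merged.add(tag);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> binary = new HashMap<>();
        binary.put("run", Arrays.asList("VB"));
        Map<String, List<String>> manual = new HashMap<>();
        manual.put("run", Arrays.asList("NN"));

        System.out.println(lookup("run", binary, manual, true));
        System.out.println(lookup("run", binary, manual, false));
    }
}
```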

getWordTagger
```
protected WordTagger getWordTagger()
```

getDictionary

protected morfologik.stemming.Dictionary getDictionary()

tag
```
public List<AnalyzedTokenReadings> tag(List<String> sentenceTokens)
                                throws IOException
```
Description copied from interface: Tagger

Returns a list of AnalyzedTokenReadings that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
Note that this method takes exactly one sentence. Implementations may special-case the first word of a sentence, which is usually written with an uppercase letter.

Specified by:

tag in interface Tagger

Parameters:

sentenceTokens - the text as returned by a WordTokenizer

Throws:

IOException
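
The special handling of a capitalized sentence-initial word mentioned above can be pictured as a lowercase fallback during lookup. The following is a simplified sketch under that assumption, not the actual `BaseTagger` logic; `SentenceStartSketch`, `lookupWithLowercaseFallback`, and the map-based dictionary are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SentenceStartSketch {

    // Hypothetical helper: tries the token as written, then its lowercase form.
    static List<String> lookupWithLowercaseFallback(String token,
                                                    Map<String, List<String>> dictionary,
                                                    Locale locale) {
        List<String> tags = dictionary.get(token);
        if (tags != null) {
            return tags;
        }
        // Lowercasing is locale-sensitive, so the tagger's locale is passed explicitly.
        String lower = token.toLowerCase(locale);
        if (!lower.equals(token)) {
            List<String> lowerTags = dictionary.get(lower);
            if (lowerTags != null) {
                return lowerTags;
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> dictionary = new HashMap<>();
        dictionary.put("the", Arrays.asList("DT"));

        // "The" is not in the dictionary as written, but "the" is.
        System.out.println(lookupWithLowercaseFallback("The", dictionary, Locale.ENGLISH));
        System.out.println(lookupWithLowercaseFallback("Xyz", dictionary, Locale.ENGLISH));
    }
}
```

Passing the locale rather than relying on the platform default keeps the lookup stable across machines, which is one plausible reason the class stores a `locale` field.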

getAnalyzedTokens

protected List<AnalyzedToken> getAnalyzedTokens(String word)

asAnalyzedTokenList

protected List<AnalyzedToken> asAnalyzedTokenList(String word,
                                                  List<morfologik.stemming.WordData> wdList)

asAnalyzedTokenListForTaggedWords

protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word,
                                                                List<TaggedWord> taggedWords)

asAnalyzedToken

protected AnalyzedToken asAnalyzedToken(String word,
                                        morfologik.stemming.WordData wd)

createNullToken
```
public final AnalyzedTokenReadings createNullToken(String token,
                                                   int startPos)
```
Description copied from interface: Tagger

Create the AnalyzedToken used for whitespace and other non-words. Use null as the POS tag for this token.

Specified by:

createNullToken in interface Tagger

createToken
```
public AnalyzedToken createToken(String token,
                                 String posTag)
```
Description copied from interface: Tagger

Create a token specific to the language of the implementing class.

Specified by:

createToken in interface Tagger

additionalTags

@Nullable
protected List<AnalyzedToken> additionalTags(String word,
                                             WordTagger wordTagger)

Allows additional tagging in some language-dependent circumstances.

Parameters:

word - The word to tag

Returns:

a list of analyzed tokens with additional tags, or null
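
A nullable hook like this is usually consumed with a null check in the base class and overridden in language-specific subclasses. The sketch below illustrates that pattern with self-contained stand-in types. `AdditionalTagsSketch`, `SketchTagger`, `HyphenAwareTagger`, and the `COMPOUND` tag are all hypothetical, and the point at which the real `BaseTagger` invokes the hook is not shown by this page, so the call site here is an assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AdditionalTagsSketch {

    // Hypothetical stand-in for a base tagger with an overridable hook.
    static abstract class SketchTagger {
        private final Map<String, List<String>> dictionary;

        SketchTagger(Map<String, List<String>> dictionary) {
            this.dictionary = dictionary;
        }

        // Hook: returns extra tags, or null when there is nothing to add.
        protected List<String> additionalTags(String word) {
            return null;
        }

        final List<String> tagWord(String word) {
            List<String> result =
                new ArrayList<>(dictionary.getOrDefault(word, Collections.emptyList()));
            List<String> extra = additionalTags(word);
            if (extra != null) { // null is a valid "no additions" answer
                result.addAll(extra);
            }
            return result;
        }
    }

    // Hypothetical language-specific subclass that tags hyphenated words.
    static final class HyphenAwareTagger extends SketchTagger {
        HyphenAwareTagger(Map<String, List<String>> dictionary) {
            super(dictionary);
        }

        @Override
        protected List<String> additionalTags(String word) {
            return word.contains("-") ? Collections.singletonList("COMPOUND") : null;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> dictionary = new HashMap<>();
        dictionary.put("cat", Arrays.asList("NN"));
        SketchTagger tagger = new HyphenAwareTagger(dictionary);

        System.out.println(tagger.tagWord("cat"));
        System.out.println(tagger.tagWord("well-known"));
    }
}
```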
