public class MultiWordChunker2 extends AbstractDisambiguator
Constructor and Description |
---|
MultiWordChunker2(String filename) |
MultiWordChunker2(String filename, boolean allowFirstCapitalized) |
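The constructors only say that `filename` is a text file with multiwords and tags. To make that concrete, here is a minimal sketch that assumes one `multiword<TAB>tag` entry per line with `#` comment lines. This format is an assumption and is not confirmed by this page. The sketch also shows one possible reading of `allowFirstCapitalized`: it registers an extra variant whose first letter is upper-case. `MultiwordFileSketch` and `load` are hypothetical names and are not part of LanguageTool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical loader, not LanguageTool's implementation.
// Assumed format: one "multiword<TAB>tag" entry per line; lines starting with '#' are comments.
class MultiwordFileSketch {

  static Map<String, String> load(String fileText, boolean allowFirstCapitalized) {
    Map<String, String> entries = new LinkedHashMap<>();
    for (String rawLine : fileText.split("\\R")) {
      String line = rawLine.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      String[] parts = line.split("\t");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Expected 'multiword<TAB>tag' but got: " + line);
      }
      String multiword = parts[0];
      String tag = parts[1];
      entries.put(multiword, tag);
      if (allowFirstCapitalized) {
        // Also accept the sentence-initial form, e.g. "a priori" -> "A priori".
        String capitalized = Character.toUpperCase(multiword.charAt(0)) + multiword.substring(1);
        entries.putIfAbsent(capitalized, tag);
      }
    }
    return entries;
  }
}
```

For example, `load("a priori\tADV\n", true)` returns entries for both `a priori` and `A priori`. With `false`, only the lower-case form is stored.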
Modifier and Type | Method and Description |
---|---|
AnalyzedSentence | disambiguate(AnalyzedSentence input): Implements multiword POS tags, e.g., `<ELLIPSIS>` for an ellipsis (`...`). |
protected String | formatPosTag(String posTag, int position, int multiwordLength): Override this method if you want to format the POS tag differently. |
protected boolean | matches(String matchText, AnalyzedTokenReadings inputTokens) |
protected AnalyzedTokenReadings | prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag) |
void | setRemoveOtherReadings(boolean removeOtherReadings) |
void | setWrapTag(boolean wrapTag) |
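Methods inherited from class AbstractDisambiguator: preDisambiguate
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Disambiguator: disambiguate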
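The method summary says that `disambiguate` implements multiword POS tags. It also says that `setWrapTag` and `setRemoveOtherReadings` control how the new reading is added. The sketch below models that behaviour on plain Java objects. It rests on several assumptions:

- `ChunkerSketch` and `Token` are hypothetical stand-ins for `AnalyzedSentence` and `AnalyzedTokenReadings`.
- Tokens are words only, with no whitespace tokens.
- Matching is greedy longest-match. This strategy is assumed and may differ from the real class.
- Every token in a match receives the same tag. The real class formats the tag per position via `formatPosTag`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of multiword chunking; not LanguageTool's classes.
class ChunkerSketch {

  // A token with its surface text and a mutable list of candidate POS readings.
  static final class Token {
    final String text;
    final List<String> tags;

    Token(String text, String... tags) {
      this.text = text;
      this.tags = new ArrayList<>(Arrays.asList(tags));
    }
  }

  private final Map<String, String> multiwords; // space-joined words -> tag, e.g. "a priori" -> "ADV"
  private final int maxWords;                   // length of the longest multiword, in tokens
  private boolean removeOtherReadings = false;  // assumed default
  private boolean wrapTag = true;               // assumed default

  ChunkerSketch(Map<String, String> multiwords) {
    this.multiwords = multiwords;
    int max = 1;
    for (String mw : multiwords.keySet()) {
      max = Math.max(max, mw.split(" ").length);
    }
    this.maxWords = max;
  }

  void setRemoveOtherReadings(boolean removeOtherReadings) {
    this.removeOtherReadings = removeOtherReadings;
  }

  void setWrapTag(boolean wrapTag) {
    this.wrapTag = wrapTag;
  }

  // Greedy longest match: at each position, try the longest candidate span first.
  List<Token> disambiguate(List<Token> tokens) {
    int i = 0;
    while (i < tokens.size()) {
      int matchedLength = 0;
      String tag = null;
      for (int len = Math.min(maxWords, tokens.size() - i); len >= 1; len--) {
        StringBuilder span = new StringBuilder();
        for (int k = i; k < i + len; k++) {
          if (k > i) {
            span.append(' ');
          }
          span.append(tokens.get(k).text);
        }
        tag = multiwords.get(span.toString());
        if (tag != null) {
          matchedLength = len;
          break;
        }
      }
      if (matchedLength == 0) {
        i++; // no multiword starts at this token
        continue;
      }
      String reading = wrapTag ? "<" + tag + ">" : tag;
      for (int k = i; k < i + matchedLength; k++) {
        Token token = tokens.get(k);
        if (removeOtherReadings) {
          token.tags.clear(); // keep only the multiword reading
        }
        token.tags.add(reading);
      }
      i += matchedLength; // resume after the matched multiword
    }
    return tokens;
  }
}
```

Trying the longest span first means that an entry such as `a priori reasoning` wins over a shorter entry `a priori` that starts at the same token.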
public MultiWordChunker2(String filename)
Parameters:
filename - text file with the multiwords and their tags

public MultiWordChunker2(String filename, boolean allowFirstCapitalized)
Parameters:
filename - text file with the multiwords and their tags
allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized

public void setRemoveOtherReadings(boolean removeOtherReadings)
Parameters:
removeOtherReadings - if true, the other readings are removed when a multiword matches

public void setWrapTag(boolean wrapTag)
Parameters:
wrapTag - if true, the tag is wrapped with < and >

protected String formatPosTag(String posTag, int position, int multiwordLength)
Parameters:
posTag - POS tag for the multiword
position - position of the token in the multiword

public AnalyzedSentence disambiguate(AnalyzedSentence input)
Parameters:
input - the tokens to be chunked

protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
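`formatPosTag` is documented above as the hook to override when the POS tag should be formatted differently. It receives the tag, the token's position in the multiword, and the multiword length. The following sketch shows how such an override could look. It rests on several assumptions:

- `TagFormatterSketch` and `OpenCloseTagFormatter` are hypothetical classes.
- The default behaviour (wrap with `<` and `>` when `wrapTag` is true) only mirrors the `setWrapTag` description.
- Positions are assumed to be 0-based.
- The override marks the first token with an opening tag and the last token with a closing tag. This is an illustrative choice, not LanguageTool's documented behaviour.

```java
// Hypothetical stand-in for a chunker's tag formatting; not LanguageTool's code.
class TagFormatterSketch {
  private boolean wrapTag = true; // assumed default

  void setWrapTag(boolean wrapTag) {
    this.wrapTag = wrapTag;
  }

  // Assumed default: the same tag for every position, optionally wrapped in '<' and '>'.
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    return wrapTag ? "<" + posTag + ">" : posTag;
  }
}

// Hypothetical override: opening tag on the first token, closing tag on the last one.
class OpenCloseTagFormatter extends TagFormatterSketch {
  @Override
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    if (position == 0) {
      return "<" + posTag + ">";   // first token of the multiword (0-based position assumed)
    }
    if (position == multiwordLength - 1) {
      return "</" + posTag + ">";  // last token of the multiword
    }
    return posTag;                 // inner tokens keep the plain tag
  }
}
```

Because the position and length are passed in, a subclass can encode where each token sits inside the multiword without changing the matching logic.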