public final class StringTools extends Object
Modifier and Type | Class and Description |
---|---|
static class | StringTools.ApiPrintMode Constants for printing XML rule matches. |

Modifier and Type | Field and Description |
---|---|
static Set<String> | LOWERCASE_GREEK_LETTERS |
static Set<String> | UPPERCASE_GREEK_LETTERS |

Modifier and Type | Method and Description |
---|---|
static String | addSpace(String word, Language language) Adds spaces before words that are not punctuation. |
static String | asString(CharSequence s) |
static void | assureSet(String s, String varName) Throws an exception if the given string is null, empty, or only whitespace. |
static String | escapeForXmlAttribute(String s) |
static String | escapeForXmlContent(String s) |
static String | escapeHTML(String s) Escapes these characters: less than, greater than, quote, ampersand. |
static String | escapeXML(String s) Calls escapeHTML(String). |
static String | filterXML(String str) Simple XML filtering for XML tags. |
static List<String> | getDifference(String s1, String s2) Difference between two strings (only one difference). |
static boolean | hasDiacritics(String str) |
static boolean | isAllUppercase(List<String> strList) Returns true if the given list of strings is made up of all-uppercase words. |
static boolean | isAllUppercase(String str) Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists). |
static boolean | isCamelCase(String token) Whether the string is camelCase. |
static boolean | isCapitalizedWord(String str) |
static boolean | isEmpty(String str) Helper method to replace calls to "".equals(). |
static boolean | isMixedCase(String str) Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase). |
static boolean | isNonBreakingWhitespace(String str) Checks if a string is the non-breaking whitespace (U+00A0). |
static boolean | isNotAllLowercase(String str) Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists). |
static boolean | isNotWordCharacter(String input) Whether the string is a punctuation mark. |
static boolean | isNotWordString(String input) |
static boolean | isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara) |
static boolean | isPositiveNumber(char ch) |
static boolean | isPunctuationMark(String input) Whether the string is a punctuation mark. |
static boolean | isWhitespace(String str) Checks if a string contains a whitespace, including: all Unicode whitespace; the non-breaking space (U+00A0); the narrow non-breaking space (U+202F); the zero width space (U+200B), used in Khmer. |
static List<String> | loadLines(String path) Deprecated. Use DataBroker#getFromResourceDirAsLines(java.lang.String) instead (NOTE: it won't handle comments). |
static String | lowercaseFirstChar(String str) Return str modified so that its first character is now a lowercase character. |
static String | lowercaseFirstCharIfCapitalized(String str) Return str unchanged if it is not capitalized (see isCapitalizedWord(String)); otherwise return str modified so that its first character is now a lowercase character. |
static String | makeWrong(String s) |
static String | normalizeNFC(String str) |
static String | normalizeNFKC(String str) |
static String | preserveCase(String inputString, String modelString) Applies the casing of modelString to inputString. |
static String | readerToString(Reader reader) |
static String | readStream(InputStream stream, String encoding) Read the text stream using the given encoding. |
static String | removeDiacritics(String str) |
static String | removeTashkeel(String str) Return str without tashkeel characters. |
static boolean | startsWithLowercase(String str) Whether the first character of str is a lowercase character. |
static boolean | startsWithUppercase(String str) Whether the first character of str is an uppercase character. |
static String | streamToString(InputStream is, String charsetName) |
static String | toId(String input, Language language) Will turn a string into a typical rule ID, i.e. uppercase and "_" instead of spaces. |
static String | trimSpecialCharacters(String s) Eliminates special (Unicode) characters, e.g. soft hyphens. |
static String | trimWhitespace(String s) Filters any whitespace characters. |
static String | uppercaseFirstChar(String str) Return str modified so that its first character is now an uppercase character. |
static String | uppercaseFirstChar(String str, Language language) Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer"). |
public static void assureSet(String s, String varName)
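The method summary describes assureSet as throwing an exception when the string is null, empty, or only whitespace. The following is a minimal sketch of such a guard, not LanguageTool's actual source: the class name AssureSetSketch, the exception types, and the messages are assumptions made for illustration.

```java
// Hypothetical stand-in for StringTools.assureSet; exception types and
// messages are assumptions, not taken from the library.
final class AssureSetSketch {

  static void assureSet(String s, String varName) {
    if (s == null) {
      throw new NullPointerException(varName + " must not be null");
    }
    // trim() strips characters <= U+0020, so this rejects empty strings
    // and strings made up only of ASCII whitespace.
    if (s.trim().isEmpty()) {
      throw new IllegalArgumentException(varName + " must not be empty or whitespace only");
    }
  }

  public static void main(String[] args) {
    assureSet("en-US", "languageCode"); // passes silently
    try {
      assureSet("   ", "languageCode");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A guard like this is typically called at the top of a constructor or setter, with varName set to the parameter name so that the error message points at the offending argument.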
public static String readStream(InputStream stream, String encoding) throws IOException
Read the text stream using the given encoding.
stream - the InputStream to be read
encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
Returns the stream's content with lines separated by \n (note that \n will be added to the last line even if it is not in the stream).
Throws: IOException
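The line-ending behaviour noted for readStream can be illustrated with a small sketch. This is not LanguageTool's implementation: the class name ReadStreamSketch is hypothetical, and mapping a null encoding to Charset.defaultCharset() is an assumption about what "system encoding" means here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for StringTools.readStream; not the library's code.
final class ReadStreamSketch {

  static String readStream(InputStream stream, String encoding) throws IOException {
    // Assumption: a null encoding falls back to the platform default charset.
    Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // readLine() drops the line terminator, so "\n" is re-added after
        // every line, including a final line that had none in the stream.
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("first\nsecond".getBytes(StandardCharsets.UTF_8));
    System.out.print(readStream(in, "utf-8"));
  }
}
```

One consequence worth keeping in mind: the returned string can be one character longer than the decoded input, so callers that compare lengths or offsets against the original stream need to account for the trailing \n.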
public static boolean isAllUppercase(String str)
public static boolean isAllUppercase(List<String> strList)
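The two isAllUppercase overloads are summarised as ignoring characters that have no upper-/lowercase distinction. One way to read that rule is "no lowercase character occurs", which the sketch below implements. It is an illustration under that assumption, with a hypothetical class name, and may differ from the library in edge cases such as empty input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two StringTools.isAllUppercase overloads.
final class AllUppercaseSketch {

  // True if no code point in str is lowercase; digits, punctuation and
  // caseless scripts are ignored because they are never lowercase.
  static boolean isAllUppercase(String str) {
    return str.codePoints().noneMatch(cp -> Character.isLowerCase(cp));
  }

  // True if every word in the list passes the single-string check.
  static boolean isAllUppercase(List<String> strList) {
    for (String s : strList) {
      if (!isAllUppercase(s)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isAllUppercase("USA 2024!"));                  // true
    System.out.println(isAllUppercase("Usa"));                        // false
    System.out.println(isAllUppercase(Arrays.asList("NASA", "Eu")));  // false
  }
}
```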
public static boolean isMixedCase(String str)
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
str - input string
public static boolean isNotAllLowercase(String str)
Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
@Contract(value="null -> false") public static boolean isCapitalizedWord(@Nullable String str)
str - input string
public static boolean startsWithUppercase(String str)
Whether the first character of str is an uppercase character.
public static boolean startsWithLowercase(String str)
Whether the first character of str is a lowercase character.
@Contract(value="!null -> !null") @Nullable public static String uppercaseFirstChar(@Nullable String str)
Return str modified so that its first character is now an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
@Contract(value="!null, _ -> !null") @Nullable public static String uppercaseFirstChar(@Nullable String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
language - the language, will be ignored if it's null
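The two uppercaseFirstChar variants can be sketched as follows. This is a simplified illustration, not the library's code: the class name is hypothetical, a plain language code string ("nl") stands in for the Language parameter, and the Dutch branch only looks at strings that begin directly with "ij".

```java
// Hypothetical stand-in for StringTools.uppercaseFirstChar. A language code
// string replaces LanguageTool's Language type (an assumption of this sketch).
final class UppercaseFirstCharSketch {

  // Uppercases the first alphabetic character, skipping leading quotes,
  // parentheses and other non-letters.
  static String uppercaseFirstChar(String str) {
    if (str == null) {
      return null;
    }
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c)) {
        return str.substring(0, i) + Character.toUpperCase(c) + str.substring(i + 1);
      }
    }
    return str; // no letter found, nothing to change
  }

  // Same as above, but treats a leading "ij" as one letter for Dutch.
  // A null language code is ignored, as the parameter description states.
  static String uppercaseFirstChar(String str, String languageCode) {
    if (str != null && "nl".equals(languageCode) && str.regionMatches(true, 0, "ij", 0, 2)) {
      return "IJ" + str.substring(2);
    }
    return uppercaseFirstChar(str);
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("ijsselmeer", "nl")); // IJsselmeer
    System.out.println(uppercaseFirstChar("ijsselmeer", null)); // Ijsselmeer
    System.out.println(uppercaseFirstChar("(hello)"));          // (Hello)
  }
}
```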
@Contract(value="!null -> !null") @Nullable public static String lowercaseFirstChar(@Nullable String str)
Return str modified so that its first character is now a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
@Contract(value="!null, -> !null") @Nullable public static String lowercaseFirstCharIfCapitalized(@Nullable String str)
Return str unchanged if it is not capitalized (see isCapitalizedWord(String)); otherwise return str modified so that its first character is now a lowercase character.
public static String readerToString(Reader reader) throws IOException
IOException
public static String streamToString(InputStream is, String charsetName) throws IOException
IOException
public static String escapeXML(String s)
Calls escapeHTML(String).
public static String escapeHTML(String s)
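The method summary lists the characters escapeHTML handles: less than, greater than, quote, and ampersand. A minimal sketch of that kind of escaping is shown below. The class name is hypothetical, and treating "quote" as the double quote mapped to &quot; is an assumption; the library's exact entity choices and null handling are not stated on this page.

```java
// Hypothetical stand-in for StringTools.escapeHTML and escapeXML.
final class EscapeSketch {

  static String escapeHTML(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '&': sb.append("&amp;"); break;
        case '"': sb.append("&quot;"); break; // assumption: "quote" means the double quote
        default:  sb.append(c);
      }
    }
    return sb.toString();
  }

  // Mirrors the documented relationship: escapeXML simply delegates.
  static String escapeXML(String s) {
    return escapeHTML(s);
  }

  public static void main(String[] args) {
    System.out.println(escapeXML("<a href=\"x\">R&D</a>"));
  }
}
```

Because each character is examined exactly once, an ampersand that is already part of an entity in the input is escaped again; callers should pass raw text rather than pre-escaped markup.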
public static String trimWhitespace(String s)
s - String to be filtered.
public static String trimSpecialCharacters(String s)
s - String to filter
public static String addSpace(String word, Language language)
word - Word to add the preceding space.
language - Language of the word (to check typography conventions). Currently French convention of not adding spaces only before '.' and ',' is implemented; other languages assume that before ,.;:!? no spaces should be added.
public static boolean isWhitespace(String str)
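str - String to check
public static boolean isNonBreakingWhitespace(String str)
Checks if a string is the non-breaking whitespace (U+00A0).
public static boolean isPositiveNumber(char ch)
ch - Character to check
public static boolean isEmpty(@Nullable String str)
Helper method to replace calls to "".equals().
str - String to check
Returns true if the string is empty or null.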
public static String filterXML(String str)
str - XML string to be filtered.
public static boolean hasDiacritics(String str)
public static String preserveCase(String inputString, String modelString)
Apply to inputString the casing of modelString.
@Nullable public static String asString(CharSequence s)
public static boolean isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara)
public static List<String> loadLines(String path)
Loads the lines of a file, ignoring comments (lines starting with #).
path - path in resource dir
public static String toId(String input, Language language)
language - LT language object, used to apply language-specific normalisation rules.
public static boolean isCamelCase(String token)
public static boolean isPunctuationMark(String input)
public static boolean isNotWordCharacter(String input)
public static List<String> getDifference(String s1, String s2)
public static String removeTashkeel(String str)
Return str without tashkeel characters.
str - input string
public static boolean isNotWordString(String input)