Regular expressions can be interpreted differently in different locales.
Because locale definitions can specify different collating orders for
a character set, regular expressions that contain collating elements or
ranges may evaluate differently from one locale to another.
If letters are defined as equivalent in collating order,
this can also change which characters a range matches.
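The effect of collating order on ranges can be sketched in Python. The model is a deliberate simplification and an assumption: a hypothetical locale's collating order is a list of groups, each group holding characters that are equivalent in collating order, and a range [lo-hi] matches every character whose group lies between those of lo and hi. The orders CODE_POINT_ORDER and LOCALE_ORDER are invented for illustration; real locale definitions use multi-level weights and multi-character collating elements.

```python
# Hypothetical collating orders: each list entry is a group of characters
# that share one collating weight (i.e. are equivalent in collating order).
CODE_POINT_ORDER = list("ABCDEabcde") + ["é"]  # é (0xE9) sorts after all ASCII letters
LOCALE_ORDER = ["A", "a", "B", "b", "C", "c", "D", "d", "E", "eé"]  # interleaved; e and é equivalent


def collation_weights(order):
    """Map each character to its weight: the index of its group in `order`."""
    return {ch: weight for weight, group in enumerate(order) for ch in group}


def range_members(lo, hi, order):
    """Return the characters a bracket range [lo-hi] matches under `order`."""
    weights = collation_weights(order)
    return "".join(
        ch
        for group in order
        for ch in group
        if weights[lo] <= weights[ch] <= weights[hi]
    )


print(range_members("a", "c", CODE_POINT_ORDER))  # abc
print(range_members("a", "c", LOCALE_ORDER))      # aBbCc
```

Under LOCALE_ORDER the range [a-c] picks up the uppercase letters B and C, and [a-e] picks up é because it is defined as equivalent to e.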
Character classes also vary between locales.
For example, the extended regular expression
[A-z]
is intended to recognize all upper- and lowercase letters in English.
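However, it fails to recognize the accented letters of
the ISO8859-1 character set, most of which have values from 0xC0 to 0xFF
hexadecimal. In the POSIX locale, where ranges follow ASCII order, it also
matches the six punctuation characters that lie between Z and a: [ \ ] ^ _ `.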
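Both problems can be seen with Python's re module, which compares range endpoints by code point. That behaves like ASCII ordering in the POSIX locale; it is an analogy, not the system regular expression library described in regexp(M).

```python
import re

# Python's re compares bracket-range endpoints by code point, which here
# stands in for the ASCII collating order of the POSIX locale.
A_TO_Z = re.compile(r"[A-z]")


def members(pattern, chars):
    """Return the characters in chars that the single-character pattern matches."""
    return "".join(ch for ch in chars if pattern.fullmatch(ch))


# The six characters with codes 0x5B-0x60 sit between 'Z' (0x5A) and 'a' (0x61).
between_z_and_a = "".join(chr(c) for c in range(0x5B, 0x61))

# ISO8859-1 bytes 0xC0-0xFF; latin-1 decoding maps each byte to the same code point.
latin1_high = bytes(range(0xC0, 0x100)).decode("latin-1")

print(members(A_TO_Z, between_z_and_a))  # [\]^_`
print(members(A_TO_Z, latin1_high))      # empty line: no accented letter matches
```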
To recognize all upper- or lowercase characters, use:
[[:alpha:]]
This expression matches every character that belongs to the alpha class defined by the current locale. In the POSIX locale, alpha consists of the characters in the upper and lower classes. In other locales, it should include all the letters of the alphabet.
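The sketch below models what [[:alpha:]] does: it looks the character up in the current locale's alpha class instead of comparing code points. The class tables are hypothetical; the latin1_example table approximates an ISO8859-1 locale by asking Python's str.isalpha() about code points 0x00 to 0xFF, which is an assumption rather than a real locale definition.

```python
import string

# Hypothetical, simplified class tables: locale name -> class name -> characters.
POSIX_UPPER = set(string.ascii_uppercase)
POSIX_LOWER = set(string.ascii_lowercase)

# Assumption: approximate ISO8859-1 letters with Unicode's notion of
# alphabetic characters for code points 0x00-0xFF.
LATIN1 = [chr(c) for c in range(0x100)]

LOCALES = {
    "POSIX": {"alpha": POSIX_UPPER | POSIX_LOWER},
    "latin1_example": {"alpha": {c for c in LATIN1 if c.isalpha()}},
}


def matches_class(ch, class_name, locale_name):
    """Model whether [[:class_name:]] matches ch in the named locale."""
    return ch in LOCALES[locale_name][class_name]


print(matches_class("é", "alpha", "POSIX"))           # False
print(matches_class("é", "alpha", "latin1_example"))  # True
```

The same bracket expression gives different answers because only the class table changes, not the pattern.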
Because the interpretation of regular expressions depends on the locale, take care when using regular expressions in shell scripts that might run in more than one locale. Also, when constructing a new locale definition, make sure that the character classes you define correspond to the regular expressions you want them to match.
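When building a locale, a few consistency checks on the resulting class tables can catch definitions that would make regular expressions behave unexpectedly. The checks below follow POSIX LC_CTYPE rules: every upper and lower character also belongs to alpha, and alpha shares no characters with digit, punct, space, or cntrl. The dictionary-of-sets representation and the check_ctype function are hypothetical; they are not the input format of any localedef tool.

```python
def check_ctype(classes):
    """Return a list of problems in a hypothetical {class_name: set_of_chars} table."""
    problems = []
    alpha = classes.get("alpha", set())
    # POSIX: characters classified as upper or lower also belong to alpha.
    for name in ("upper", "lower"):
        missing = classes.get(name, set()) - alpha
        if missing:
            problems.append(f"{name} characters missing from alpha: {''.join(sorted(missing))}")
    # POSIX: alpha must not contain digit, punct, space, or cntrl characters.
    for name in ("digit", "punct", "space", "cntrl"):
        overlap = alpha & classes.get(name, set())
        if overlap:
            problems.append(f"alpha overlaps {name}: {''.join(sorted(overlap))}")
    return problems


good = {"upper": set("ABÉ"), "lower": set("abé"), "alpha": set("ABÉabé"), "punct": set("_")}
bad = {"upper": set("ABÉ"), "lower": set("ab"), "alpha": set("ABab_"), "punct": set("_")}
print(check_ctype(good))  # []
print(check_ctype(bad))
```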
See the regexp(M) manual page for rules on constructing regular expressions.
If you use a locale definition that recognizes characters outside the standard US ASCII character set, you might have difficulty sending mail to a user on a system that uses a different locale (or one that does not recognize locales). Characters outside the alphanumeric core that ISO8859-1 shares with US ASCII might be ignored or mistranslated under other locales.
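One way a 7-bit path can mistranslate text is by clearing the eighth bit of each byte. The sketch below simulates that; it is an illustrative assumption, not the documented behavior of any particular mail system. The ISO8859-1 letter é (0xE9) arrives as the ASCII letter i (0x69).

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel that clears bit 7 (0x80) of every byte."""
    return bytes(b & 0x7F for b in data)


sent = "café".encode("latin-1")      # b'caf\xe9' in ISO8859-1
received = strip_high_bit(sent).decode("ascii")
print(received)                      # cafi
```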
More seriously, if your user name or machine name contains an 8-bit character, a user on a 7-bit system cannot send any messages to you, because they cannot enter that character in the address. For this reason, do not create user names or machine names that contain 8-bit characters.
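A simple check, run before creating an account or naming a machine, can reject names that a 7-bit system could not address. The functions is_7bit_safe and unsafe_names are hypothetical helpers; they only test that each character falls in the 7-bit US ASCII range and do not enforce any other naming rules.

```python
def is_7bit_safe(name: str) -> bool:
    """Return True if every character in name is 7-bit US ASCII (code point < 0x80)."""
    return all(ord(ch) < 0x80 for ch in name)


def unsafe_names(names):
    """Return the names that contain at least one 8-bit (non-ASCII) character."""
    return [n for n in names if not is_7bit_safe(n)]


print(unsafe_names(["jsmith", "josé", "hostA", "münchen"]))  # ['josé', 'münchen']
```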