Specifying the locale

About locales

The settings you can configure within a locale are:

collating sequence: The order in which a local character set is sorted. This is used by the sort(C) command and by programs that use regular expressions. See ``Regular expressions and locales''.
currency format: The character used to denote a unit of currency and the format used for printing monetary values.
character classification table: The table used to determine whether a given character is an upper- or lowercase letter, a number, space, or some other class of symbol.
time/date format: The format in which the time and date are presented.
number format: The format in which numbers are printed (whether groups of digits are separated by a delimiter, and the type of delimiter to use for decimals).
response strings: The standard strings to print in place of the English words ``yes'' and ``no''.
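
As a rough illustration, the settings above correspond to the standard POSIX locale categories. The following Python sketch maps each setting to its category constant and then reads the number and currency conventions of the portable "C" locale. Python is not used elsewhere in this text, and the name SETTING_TO_CATEGORY is purely illustrative. The sketch assumes a UNIX-like system, where the LC_MESSAGES constant is available.

```python
import locale

# Illustrative mapping from the settings listed above to POSIX locale categories.
SETTING_TO_CATEGORY = {
    "collating sequence": locale.LC_COLLATE,
    "currency format": locale.LC_MONETARY,
    "character classification table": locale.LC_CTYPE,
    "time/date format": locale.LC_TIME,
    "number format": locale.LC_NUMERIC,
    "response strings": locale.LC_MESSAGES,
}

# Select the portable "C" locale for every category so the result is predictable.
locale.setlocale(locale.LC_ALL, "C")

# localeconv() reports the number and currency conventions of the active locale.
conventions = locale.localeconv()
print("decimal delimiter:", repr(conventions["decimal_point"]))
print("digit group delimiter:", repr(conventions["thousands_sep"]))
print("currency symbol:", repr(conventions["currency_symbol"]))
```

Replacing "C" with the name of another locale installed on the system would show that locale's conventions instead.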

Because the locale in use governs the interpretation of data rather than its representation, the same data might appear differently when presented under a different locale. In particular, electronic mail might be affected when it is sent from one locale to another; see ``How mail translates between locales''.
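
The effect can be sketched in Python by interpreting one stored byte sequence under two different 8-bit character sets. The choice of ISO8859-1 and ISO8859-5 here is an assumption made for illustration only; any two character sets that differ above 0x7F would show the same thing.

```python
# One byte sequence, as it might be stored in a file or a mail message.
data = bytes([0x63, 0x61, 0x66, 0xE9])

# Interpret the same bytes under two different 8-bit character sets.
as_latin1 = data.decode("iso8859-1")    # Western European
as_cyrillic = data.decode("iso8859-5")  # Cyrillic

# ascii() keeps the output printable even on a 7-bit terminal.
print(ascii(as_latin1))
print(ascii(as_cyrillic))
print("same text under both interpretations:", as_latin1 == as_cyrillic)
```

The first three bytes are plain US ASCII and read the same either way; only the final 8-bit value is interpreted differently.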

Regular expressions and locales

Regular expressions are interpreted differently in different locales. Locale definitions of the collating order of a character set may differ, so regular expressions containing collating elements or ranges can evaluate differently. If letters are defined as being equivalent in collating order, this might change the order of evaluation. Character classes also vary between locales. For example, the extended regular expression:

[A-Za-z]

is intended to recognize all upper- and lowercase letters in English. However, it fails to recognize the accented letters in the ISO8859-1 character set (most of the values from 0xC0 to 0xFF in hexadecimal).
To recognize all upper- and lowercase letters, use:

[[:alpha:]]

This expression matches every character that belongs to the class alpha as defined in the current locale. In the POSIX locale, alpha comprises the classes upper and lower. In other locales, it should include all the letters of the alphabet.
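
The difference between an ASCII-only range and a letter class can be sketched in Python. Note that Python's re module does not support the [[:alpha:]] notation, so this sketch substitutes the pattern [^\W\d_] (a word character that is neither a digit nor an underscore) as an approximate stand-in. That substitution is an assumption of the example, and it follows Unicode rules rather than the current locale.

```python
import re

# The word "cafe" with an acute accent on the final letter, as ISO8859-1 bytes.
word = bytes([0x63, 0x61, 0x66, 0xE9]).decode("iso8859-1")

# ASCII-only range: unaccented English letters.
ascii_letters = re.compile(r"[A-Za-z]+")

# Stand-in for [[:alpha:]]: word characters that are neither digits nor underscore.
any_letters = re.compile(r"[^\W\d_]+")

# The range rejects the word because of the accented letter (value 0xE9).
print("range matches whole word:", bool(ascii_letters.fullmatch(word)))

# The letter class accepts it.
print("class matches whole word:", bool(any_letters.fullmatch(word)))
```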

Because the interpretation of regular expressions depends on the locale, take care when using regular expressions in shell scripts that might be used in more than one locale. Also, when constructing a new locale definition, ensure that the character classes you define correspond to the desired regular expressions.
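
One common precaution, which this text does not itself prescribe, is to pin the locale explicitly when invoking a tool whose result must not vary between systems. The following Python sketch runs grep with LC_ALL set to C so that a range such as [a-z] always means the unaccented lowercase letters. The function name grep_in_c_locale is hypothetical, and the sketch assumes that a grep command supporting the -E and -e options is on the search path.

```python
import os
import subprocess

def grep_in_c_locale(pattern, lines):
    """Filter lines through grep -E with the locale pinned to C."""
    # Copy the caller's environment and override every locale category.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["grep", "-E", "-e", pattern],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
    )
    # grep exits with 0 on a match, 1 on no match, and more than 1 on an error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout.splitlines()

print(grep_in_c_locale("^[a-z]+$", ["apple", "Zebra", "b2"]))
```

Pinning the locale trades away local conventions for predictability, so it suits scripts that process machine-generated data better than scripts that process text typed by users.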

See the regexp(M) manual page for rules on constructing regular expressions.

How mail translates between locales

If you are using a locale definition that recognizes characters that are not in the standard US ASCII character set, you might have difficulty sending mail to a user on a system that is using a different locale (or one that does not recognize locales). Characters outside the core set of alphanumeric characters that ISO8859-1 shares with US ASCII might be ignored or mistranslated under other locales.

More problematically, if your user name or machine name contains an 8-bit character, a user on a 7-bit system cannot send any messages to you because they cannot input the 8-bit character in the address. Therefore, it is important not to create user names containing 8-bit characters.
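
A check of this kind is simple to automate. The following Python sketch tests whether a proposed user name consists only of 7-bit characters; the function name is_seven_bit and the sample names are hypothetical.

```python
def is_seven_bit(name):
    """Return True if every character of name fits in 7-bit US ASCII."""
    return all(ord(ch) < 128 for ch in name)

# The second sample name contains an o with a diaeresis (value 0xF6).
for candidate in ["jsmith", "j\u00f6rg"]:
    verdict = "ok" if is_seven_bit(candidate) else "contains 8-bit characters"
    # ascii() keeps the output printable even on a 7-bit terminal.
    print(ascii(candidate), verdict)
```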