The type of expressions used in case statements and for matching filenames by the shells and by find(C), and by the commands used for accessing files (:e and :r) in vi, are known as shell regular expressions or case and file patterns. These differ from regular expressions; refer to ``Shell regular expressions'' for more information.
Traditionally, UNIX used a grammar that defined Simple Regular Expressions (SREs). Expressions that use this grammar are still supported. The ISO POSIX-2 DIS standard defines Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs). Both of these regular expression types are internationalized versions of SREs:
EREs have a different and richer grammar than BREs. However, EREs are not a superset of BREs; some BREs will not work as EREs without modification.
The awk(C) and egrep(C) utilities use EREs; all other utilities that handle regular expressions, such as ed(C), ex(C), expr(C), grep(C), sed(C), and vi use BREs.
The following sections describe the elements that are used to construct regular expressions. These sections are marked to indicate which apply to BREs only or to EREs only.
A complication arises because the operators are themselves composed of characters; these are termed special characters. If you wish to search for one of the special characters as itself, see ``Searching for special characters''.
A BRE or ERE matches a string of zero or greater length where the characters in the string correspond to the pattern. The search begins at the start of a string and ends when the first matching sequence is found. If the expression allows there to be a variable number of characters in the matched string, the longest leftmost matching string is found.
Caret is treated as the literal caret character elsewhere in BREs, and always as an anchor in EREs. Note that caret is also used to begin non-matching lists in bracket expressions (see ``Matching one from a set of characters'').
For example, the regular expression ``^début'' matches the first string ``début'' on a line that begins with ``débutant''; it would not match on the line that begins as ``au début''.
Dollar is treated as the literal dollar character elsewhere in BREs, and always as an anchor in EREs.
For example, the regular expression ``fin$'' matches ``fin'' on a line that ends with the string ``le fin''; it would not match on the line that ends in ``finale''.
For example, the regular expression ``^begin middle end$'' matches the complete line ``begin middle end''; it would not match the line ``begin middling end''.
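The anchoring examples above can be tried out with a short script. This is only a sketch: it uses Python's re module as a stand-in, which is not a POSIX BRE or ERE engine but treats ``^'' and ``$'' the same way for these single-line strings. The helper name matches is an assumption made for illustration.

```python
import re

def matches(pattern, line):
    """Return True if the pattern matches anywhere in the line."""
    return re.search(pattern, line) is not None

# "^" anchors the expression to the start of the line.
print(matches(r"^début", "débutant"))    # True
print(matches(r"^début", "au début"))    # False

# "$" anchors the expression to the end of the line.
print(matches(r"fin$", "le fin"))        # True
print(matches(r"fin$", "finale"))        # False

# Both anchors together force the whole line to match.
print(matches(r"^begin middle end$", "begin middle end"))    # True
print(matches(r"^begin middle end$", "begin middling end"))  # False
```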
An ERE group consists of a regular expression enclosed by the grouping operators ``('' and ``)''.
Subexpressions and groups match whatever the enclosed expression on its own would match. They are used to establish the expressions on which lower precedence operators should act (in the same way that parentheses are used in arithmetic). Any number of subexpressions or groups may be used, and they may be nested to any depth.
For example, the BRE ``\(dog\)matic'' matches the string ``dogmatic''. The equivalent ERE is ``(dog)matic''.
For example, the expression ``\(more\) and \1'' matches the string ``more and more''.
For example, the expression ``((auto)|(dog))matic'' matches the strings ``automatic'' and ``dogmatic''.
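The grouping, back-reference, and alternation examples above can be sketched as follows. Python's re module is used here as an approximation: it writes groups unescaped, as EREs do, yet also accepts ``\1'' back-references, which this document describes for BREs. The variable names are illustrative only.

```python
import re

# A group matches whatever the enclosed expression would match on its own.
group = re.compile(r"(dog)matic")

# \1 must repeat exactly the text matched by the first group.
backref = re.compile(r"(more) and \1")

# Alternation inside a group: either "auto" or "dog", followed by "matic".
alt = re.compile(r"((auto)|(dog))matic")

print(group.search("dogmatic").group())        # dogmatic
print(backref.search("more and more").group()) # more and more
print(backref.search("more and less"))         # None
print([w for w in ("automatic", "dogmatic", "dramatic") if alt.search(w)])
```

The last line keeps ``automatic'' and ``dogmatic'' and drops ``dramatic'', because neither alternative precedes ``matic'' there.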
For example, the expression ``explicit'' can only match itself.
For example, the expression ``..plicit'' can match ``explicit'' or ``implicit''.
A matching list is used to specify a set of alternative characters that may be matched against a single character (or a sequence of several characters that are treated as a single character by the current collation sequence, see ``Matching multi-character collating elements'').
A non-matching list specifies a set of characters that may not be matched against a single character; that is, it will match any character except those specified. A non-matching list is indicated by a leading caret ``^''.
If a matching list includes a right bracket ``]'', it must be the first character in the list.
If a non-matching list includes a right bracket ``]'', it must be the first character following the initial ``^''.
As an example of a bracket expression, ``[Qq]werty'' matches ``Qwerty'' or ``qwerty''.
The bracket expression in the regular expression ``str[^ae]ng'' uses a non-matching list to eliminate some possible matches; ``string'', ``strong'', or ``strung'' matches, but ``strang'' and ``streng'' do not.
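A minimal sketch of the two bracket expressions above, again using Python's re module as a stand-in (matching and non-matching lists of plain characters behave the same way there). The word list is made up for the illustration.

```python
import re

# Matching list: the first character may be "Q" or "q".
qwerty = re.compile(r"[Qq]werty")

# Non-matching list: any character except "a" or "e" between "str" and "ng".
strng = re.compile(r"str[^ae]ng")

words = ["string", "strong", "strung", "strang", "streng"]
kept = [w for w in words if strng.search(w)]

print(bool(qwerty.search("Qwerty")), bool(qwerty.search("qwerty")))  # True True
print(kept)  # ['string', 'strong', 'strung']
```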
An asterisk ``*'' matches zero or more occurrences of the previous single character (including bracket expressions), BRE subexpression, ERE grouping, or BRE back-reference.
An asterisk is treated as itself if it occurs inside a bracket expression, as the first character of the regular expression (after any initial ``^'' anchor), as the first character of a BRE subexpression (after any initial anchor), as the first character of an ERE group, or if it is preceded by a single backslash ``\''.
For example, the expression ``a*'' matches ``aaaa'' in the string ``aaaaba'', and it matches the null string in ``bbbbcb''.
The pattern ``[ab]*'' matches ``aaab'' in ``daaabcbbb'' once the search reaches the first ``a''; searched from the leading ``d'', it matches the null string, just as ``a*'' does in ``bbbbcb''.
Note that ``.*'' matches any string of characters, so ``sub.*'' would match ``subway'', ``submarine'', ``subjunctive'', and ``subsidy''.
The BRE ``\(a.*\)*cad\1'' matches ``abracadabra''; it also matches ``cad'' in the string ``academic'' since the first subexpression may be matched by a null string.
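The asterisk examples can be checked with the sketch below. It assumes Python's re module, which agrees with the behaviour described here for these particular patterns; the ``academic'' case is left out because Python treats a back-reference to an unmatched group differently.

```python
import re

# Zero or more "a": the longest run at the leftmost position is returned.
print(repr(re.search(r"a*", "aaaaba").group()))   # 'aaaa'

# No "a" at the start of the string, so the null string is matched there.
print(repr(re.search(r"a*", "bbbbcb").group()))   # ''

# ".*" matches any string of characters, so "sub.*" covers the whole word.
print(re.search(r"sub.*", "submarine").group())   # submarine

# Group "abra", literal "cad", then \1 repeats "abra".
print(re.search(r"(a.*)*cad\1", "abracadabra").group())  # abracadabra
```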
A plus sign is treated as itself if it occurs inside a bracket expression, or if it is preceded by a single backslash ``\''.
For example, the expression ``a+'' matches ``aaaa'' in the string ``aaaaba'', but it does not match ``bbbbcb''.
The pattern ``e+p'' matches ``eep'' in ``sleep'', and ``ep'' in ``step''.
A question mark is treated as itself if it occurs inside a bracket expression, or if it is preceded by a single backslash ``\''.
For example, ``l?i'' matches ``li'' in ``slip'', and ``i'' in ``sip''.
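The plus sign and question mark examples above can be sketched as follows, using Python's re module as an approximation of ERE behaviour for these operators. The helper name found is an assumption for illustration.

```python
import re

def found(pattern, text):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, text)
    return m.group() if m else None

# "+" requires at least one occurrence.
print(found(r"a+", "aaaaba"))  # aaaa
print(found(r"a+", "bbbbcb"))  # None
print(found(r"e+p", "sleep"))  # eep
print(found(r"e+p", "step"))   # ep

# "?" allows zero or one occurrence.
print(found(r"l?i", "slip"))   # li
print(found(r"l?i", "sip"))    # i
```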
An ERE interval expression has the syntax ``{l}'', ``{l,}'', or ``{l,u}''.
An interval expression matches at least l, and at most u occurrences of the previous single character, subexpression, or back-reference. The lower limit, l, may not be less than 0. If l is specified without a trailing comma, exactly l occurrences are matched. The upper limit, u, must be less than or equal to 255 if it is specified. If u is omitted, but the comma is not, the upper limit on the number of occurrences is effectively infinite.
For example, the BRE ``e\{2\}'' matches ``ee'' in ``eleemosynary'', but finds no match in ``elementary''. The equivalent ERE is ``e{2}''.
The ERE ``(is{2}){2}'' matches ``ississ'' in ``Mississippi''.
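A short sketch of the interval examples, assuming Python's re module, which uses the unescaped ERE-style braces; the BRE form ``\{2\}'' has no direct equivalent there.

```python
import re

# Exactly two consecutive "e" characters.
double_e = re.compile(r"e{2}")
print(double_e.search("eleemosynary").group())  # ee
print(double_e.search("elementary"))            # None

# The inner interval applies to "s", the outer one to the whole group "iss".
print(re.search(r"(is{2}){2}", "Mississippi").group())  # ississ
```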
Single-character elements are treated as themselves if specified as collating symbols; this is useful for representing characters such as dash ``-'' that have special meaning inside bracket expressions (see ``Specifying ranges of characters'').
For example, the expression ``[[.ij.]]'' matches only the
collating element ``ij'' corresponding to the collating
symbol <ij> in the current collation sequence; it is not the same
as the bracket expression ``[ij]'' that matches ``i'' or
``j''.
Note that only primary equivalence classes are recognized.
For example, if the characters ``e'', ``è'',
``é'', ``ê'', and ``ë''
are equivalent, then
the bracket expression ``[d[=e=]f]'' is the same as
``[deèéêëf]''.
The following character classes are supported:
For example, in the POSIX locale, the bracket expression
``[^[:alnum:]]'' matches on a non-alphanumeric character.
The starting point must occur before the ending point in the collation sequence.
Only collating elements or collating symbols may be used for starting and ending points. For example, ``[[=a=]-z]'' is invalid, but ``[[.sz.]-z]'' is valid if <sz> occurs earlier than ``z'' in the current collation sequence.
The ending point of one range cannot be used as the starting point of another range. For instance, in the POSIX locale, ``[a-mn-z]'' is allowed, but ``[a-m-z]'' is interpreted as ``[a-m[.-.]z]''.
A range must specify both a starting and an ending point; otherwise, the dash ``-'' character is treated as itself. ``[-dot]'' and ``[dot-]'' match any character from ``d'', ``o'', ``t'', and ``-''; ``[^-dot]'' and ``[^dot-]'' match any character but these.
To specify a dash character as the start of a range, specify it
as the first character in a matching list, after the initial caret
``^'' in a non-matching list, or enclose it in collation
symbol operators:
[.-.]
For example, in the POSIX locale, the expression ``[^[:digit:][.-.]-/]'' matches any character but the numeric digits and the symbols ``-'', ``.'', and ``/''. The expression ``[[.-.]-a]'' or ``[--a]'' matches characters in the range ``-'' to ``a''; ``[!--]'' or ``[!-[.-.]]'' matches characters from ``!'' to ``-''.
Note that using range expressions within applications may make them non-portable; the collation sequence may differ for locales in a way that will influence the order of execution or cause errors.
For example, in the POSIX locale, the expression ``[0-9]'' is identical to ``[0123456789]'' (and to ``[[:digit:]]'').
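The range rules above can be illustrated with the sketch below. It assumes Python's re module, which orders ranges by character code rather than by a locale's collation sequence, so it corresponds only to the POSIX-locale behaviour described here. The sample strings are made up.

```python
import re

sample = "a1-b22.c/3"

# A range and the equivalent explicit list select the same characters.
by_range = re.findall(r"[0-9]", sample)
by_list = re.findall(r"[0123456789]", sample)
print(by_range == by_list, by_range)  # True ['1', '2', '2', '3']

# A dash at the start or end of the list has no range to form,
# so it is treated as itself.
print(re.findall(r"[-dot]", "do-not"))   # ['d', 'o', '-', 'o', 't']
print(re.findall(r"[dot-]", "do-not"))   # ['d', 'o', '-', 'o', 't']

# The same holds after the caret of a non-matching list.
print(re.findall(r"[^-dot]", "do-not"))  # ['n']
```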
The BRE special characters are:
. [ \ * ^ $
These characters are interpreted as themselves if they are used outside the context in which they are operators. You can force any of them to be interpreted as the character itself by preceding it with a backslash ``\''.
Note that the characters ( ) { } and the digits ``1'' through ``9'' have special meaning in BREs if they are preceded by a backslash.
The ERE special characters are:
. [ \ ( )
* + ? { | ^ $
These characters are interpreted as themselves if they are used outside the context in which they are operators. You can force any of them to be interpreted as the character itself by preceding it with a backslash ``\''.
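A minimal sketch of escaping special characters with a backslash, assuming Python's re module (whose special characters largely coincide with the ERE set). re.escape is a Python convenience, not part of the regular expression grammar described here, and the sample strings are hypothetical.

```python
import re

# Escaped "$" and "." are matched as literal characters.
price = re.compile(r"\$5\.00")
print(price.search("total: $5.00").group())  # $5.00
print(price.search("total: $5x00"))          # None

# Unescaped, "$" is an end anchor, so nothing can follow it.
print(re.search(r"$5.00", "total: $5.00"))   # None

# re.escape adds the backslashes for a string that is to be matched literally.
literal = re.escape("a+b")
print(re.search(literal, "a+b=c").group())   # a+b
```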
| Expression type | BRE operators |
|---|---|
| equivalence class, character class, collation symbol | [==] [::] [..] |
| escaped special characters | \character |
| bracket expressions | [] |
| subexpressions, back-references | \(\) \# |
| zero or more occurrences, interval expression | * \{l,u\} |
| expression concatenation | |
| start, end anchoring | ^ $ |

| Expression type | ERE operators |
|---|---|
| equivalence class, character class, collation symbol | [==] [::] [..] |
| escaped special characters | \character |
| bracket expressions | [] |
| grouping | () |
| zero or more, one or more, zero or one occurrences, interval expression | * + ? {l,u} |
| expression concatenation | |
| start, end anchoring | ^ $ |
| alternation | \| |
The following case and file pattern operators are all available in ksh, and the vi editors. csh, find, and sh allow the use of a more restricted set. (The utilities that you can use the patterns with are shown in parentheses; vi represents the vi family of editors.)
? ( | & ) [
Only the special character ``]'' must be escaped inside a bracket expression. (Available in csh, find, ksh, sh, and vi.)
lists all files that begin
with a digit from ``0'' to ``9''. If a range is given
in an order that does not correspond to the current collation
sequence, the entire bracket expression is treated as a literal
string to be searched for.
(Available in csh, find,
ksh, sh, and vi.)
lists all files that do
not begin with an upper or lower case alphabetic character.
Note that the shell suppresses the listing of hidden files
(that have filenames starting with ``.'')
even though they are non-matching.
(Available in csh, find,
ksh, sh, and vi.)

The asterisk ``*'' stands for a string of any characters. The shells prevent matching with filenames that start with a dot ``.'' (to conceal hidden files). For example, ls *.c lists all files in the current directory that have the extension ``.c'', but it would not list a file named .foo.c. (Available in csh, find, ksh, sh, and vi.)
*(pattern [|pattern] ...)
*(xyz) matches zero or more occurrences of the string ``xyz''. (Available in ksh and vi.)
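The basic file patterns described above (``*'', bracket expressions, and non-matching lists introduced by ``!'') can be approximated with Python's fnmatch module. This is a sketch under stated assumptions: fnmatch does not hide dot files on its own, so the hypothetical helper shell_match adds that rule, and the ksh-only forms such as *(xyz) are not supported by fnmatch at all. The file names are invented for the illustration.

```python
from fnmatch import fnmatchcase

def shell_match(names, pattern):
    """Filter names by a file pattern, hiding dot files as the shells do."""
    wants_hidden = pattern.startswith(".")
    return [
        name for name in names
        if fnmatchcase(name, pattern)
        and (wants_hidden or not name.startswith("."))
    ]

# Hypothetical directory listing, used only for illustration.
names = ["main.c", ".foo.c", "9lives", "README", "_tmp"]

print(shell_match(names, "*.c"))         # ['main.c']
print(shell_match(names, "[0-9]*"))      # ['9lives']
print(shell_match(names, "[!a-zA-Z]*"))  # ['9lives', '_tmp']
```

Note that .foo.c is non-alphabetic in its first character and ends in ``.c'', yet it is omitted from both listings because hidden files are suppressed.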
Case and file patterns do not recognize:
No more than the first 9 sub-expressions may be back-referenced within a regular expression.
Interval expressions may not specify an upper limit greater than 255; if not specified, the limit is effectively infinite.
Note that a collation sequence is not necessarily equivalent to a collation order. See localedef(F) for more details.
``Regular expressions'' in the Operating System User's Guide
X/Open CAE Specification, Commands and Utilities, Issue 4, 1992.