awk provides regular expressions for pattern matching; the syntax of UNIX system expressions is described in ``Regular expressions''.
The simplest regular expression is a string of characters matching only itself: that is, the string is a literal. In awk, a regular expression is typically enclosed within slashes in order to label it as a regular expression as opposed to an awk command, as follows:
   /Asia/
This program prints all input records that contain the substring ``Asia''; if a record contains ``Asia'' as part of a larger string like ``Asian'' or ``Pan-Asiatic'', it is also printed.
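The substring behavior can be checked from the shell. The following is a minimal sketch; the four sample records are invented for illustration and do not come from any data file used in this topic.

```shell
# Hypothetical sample input, one record per line (not from the original text).
sample='Asia
Asian markets
Pan-Asiatic league
Europe'

# A pattern with no action prints every record that contains "Asia" anywhere.
matched=$(printf '%s\n' "$sample" | awk '/Asia/')
printf '%s\n' "$matched"
```

Only the ``Europe'' record is left out, because it is the only one that does not contain the substring.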
awk provides the full range of UNIX system regular expression metacharacters; see ``Regular expressions'' for a detailed explanation. (In addition, awk recognizes the escape sequences listed in ``The echo command''.) awk also provides the regular expression operators shown in ``awk regular expression operators''.
awk regular expression operators
| Operator | Meaning | 
|---|---|
| ~ | matches | 
| !~ | does not match | 
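For example, this program prints the first field of all lines in which the
fourth field matches ``Asia'':
   $4 ~ /Asia/ { print $1 }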
This program prints the first field of all lines in which the
fourth field does not match ``Asia'':
   $4 !~ /Asia/ { print $1 }
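The two operators can be compared side by side on the same input. This is a sketch only; the records and their filler values in fields two and three are hypothetical.

```shell
# Hypothetical four-field records; fields 2 and 3 are filler values.
records='India 10 20 Asia
Canada 30 40 America
Japan 50 60 Asia'

# First field of records whose fourth field matches /Asia/.
in_asia=$(printf '%s\n' "$records" | awk '$4 ~ /Asia/ { print $1 }')

# First field of records whose fourth field does not match /Asia/.
not_asia=$(printf '%s\n' "$records" | awk '$4 !~ /Asia/ { print $1 }')

printf 'matches:\n%s\n' "$in_asia"
printf 'does not match:\n%s\n' "$not_asia"
```

Every record is selected by exactly one of the two programs, since !~ is the negation of ~.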
awk interprets any string or variable on the right side of
a ~ or !~ as a regular expression. For example:
   $2 !~ /^[0-9]+$/
This sample program can be rewritten as follows:
   BEGIN     { digits = "^[0-9]+$" }
   $2 !~ digits
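As a quick check that the slash form and the variable form select the same records, both can be run on the same input. The input lines here are hypothetical.

```shell
# Hypothetical input: the second field is a string of digits on some lines only.
rows='a 42
b 4x2
c 007
d -5'

# Regular expression written between slashes.
literal_form=$(printf '%s\n' "$rows" | awk '$2 !~ /^[0-9]+$/')

# The same regular expression held in a string variable.
variable_form=$(printf '%s\n' "$rows" | awk '
BEGIN { digits = "^[0-9]+$" }
$2 !~ digits')

printf '%s\n' "$literal_form"
printf '%s\n' "$variable_form"
```

Both programs print the ``b'' and ``d'' records, whose second fields are not made up entirely of digits.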
Suppose you wanted to search for a string of characters such as
^[0-9]+$. When a literal quoted string like
"^[0-9]+$" is used as a regular expression, one
extra level of backslashes is needed to protect regular expression
metacharacters. This is because one level of backslashes is removed
when a string is originally parsed. If a backslash is needed in
front of a character to turn off its special meaning in a regular
expression, then that backslash needs a preceding backslash to
protect it in a string.
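The rule can be tried from the shell. The sketch below uses a literal period as the protected metacharacter; the two input lines are hypothetical.

```shell
# Hypothetical input lines.
lines='a.b
axb'

# Slash form: \. matches a literal period.
slash_form=$(printf '%s\n' "$lines" | awk '$0 ~ /a\.b/')

# String form: "a\\.b" is reduced to a\.b when the string is parsed,
# and a\.b is then used as the regular expression.
string_form=$(printf '%s\n' "$lines" | awk '$0 ~ "a\\.b"')

# With no backslash at all, . matches any character, so both lines match.
unescaped=$(printf '%s\n' "$lines" | awk '$0 ~ "a.b"')

printf 'slash:     %s\n' "$slash_form"
printf 'string:    %s\n' "$string_form"
printf 'unescaped:\n%s\n' "$unescaped"
```

The awk programs are given in single quotes so that the shell passes the backslashes through to awk unchanged.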
For example, suppose we want to match strings containing ``b'' followed by a dollar sign. The regular expression for this pattern is b\$. To create a string to represent this regular expression, add one more backslash, as follows:
   "b\\$"
The two regular expressions on each of the following lines are equivalent:
   x ~ "b\\$"     x ~ /b\$/
   x ~ "b\$"      x ~ /b$/
   x ~ "b$"       x ~ /b$/
   x ~ "\\t"      x ~ /\t/
A summary of the regular expressions and the substrings they match is given in ``awk regular expressions''. The unary operators
*, +, and ?
have the highest precedence, with concatenation next, and then
alternation (|). All operators are left-associative. In the table,
r stands for any regular expression.
awk regular expressions
| Expression | Matches | 
|---|---|
| char | any non-metacharacter char | 
| \char | character char literally | 
| ^ | beginning of string | 
| $ | end of string | 
| . | any character but newline | 
| [s] | any character in set s | 
| [^s] | any character not in set s | 
| r* | zero or more rs | 
| r+ | one or more rs | 
| r? | zero or one r | 
| (r) | r | 
| r1 r2 | r1 then r2 (concatenation) | 
| r1\|r2 | r1 or r2 (alternation) |
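The effect of operator precedence and of parentheses can be illustrated with a few short patterns. This is a sketch under the precedence rules stated above; the input words are hypothetical.

```shell
# Hypothetical input words, one per line.
words='ab
abbb
abab
abx
xcd
cdx'

# + binds tighter than concatenation: a followed by one or more b.
plus_on_b=$(printf '%s\n' "$words" | awk '/^ab+$/')

# Parentheses make + apply to the whole group ab.
plus_on_group=$(printf '%s\n' "$words" | awk '/^(ab)+$/')

# Alternation binds loosest: this reads as (^ab) or (cd$).
alt_ungrouped=$(printf '%s\n' "$words" | awk '/^ab|cd$/')

# Grouping the alternation anchors both alternatives at both ends.
alt_grouped=$(printf '%s\n' "$words" | awk '/^(ab|cd)$/')

printf '/^ab+$/      selects: %s\n' "$(echo $plus_on_b)"
printf '/^(ab)+$/    selects: %s\n' "$(echo $plus_on_group)"
printf '/^ab|cd$/    selects: %s\n' "$(echo $alt_ungrouped)"
printf '/^(ab|cd)$/  selects: %s\n' "$(echo $alt_grouped)"
```

The ungrouped alternation selects every word except ``cdx'', while the grouped form selects only ``ab''.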