Regular expressions

Regular expression grouping

Terms in an editor regular expression can be grouped together using \( and \). Any regular expression so constructed is treated as an identifiable unit in a larger regular expression, and can be referred to later in a search/replace expression by the editor.

This is a particularly useful mechanism. Each regular expression enclosed between escaped parentheses is treated as a positional parameter. For example, in the regular expression \([Tt]he\).*\(fox\) the first grouped expression matches the word ``The'' or ``the''. It is followed by .*, which matches an indeterminate string of any characters, and then by a second grouped expression matching only the word ``fox''.

The first grouped expression may be referred to in the editor expression as \1, the second expression as \2, and so on. For an illustration of how this can be used to swap the order of regular expressions during a search and replace operation, see ``Manipulating text with sed''.
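As a rough illustration of positional references, the sketch below swaps two grouped expressions in Python. It is not the editor's own syntax: Python's re module writes groups as ( and ) rather than \( and \), and the pattern uses .* between the groups to stand for any run of characters. The function name swap_ends and the sample sentences are made up for this example.

```python
import re

# Python spells groups as ( ... ); the editor form would be \([Tt]he\).*\(fox\).
PATTERN = re.compile(r"([Tt]he).*(fox)")


def swap_ends(line):
    """Replace the matched text with group 2 followed by group 1."""
    # \1 refers to the text matched by ([Tt]he), \2 to the text matched by (fox).
    # The characters matched by .* are not in any group, so they are dropped.
    return PATTERN.sub(r"\2 \1", line)


if __name__ == "__main__":
    print(swap_ends("The quick brown fox"))   # fox The
    print(swap_ends("see the red fox run"))   # see fox the run
```

Text outside the match (``see'' and ``run'' in the second call) is left untouched; only the matched span is rewritten from the numbered groups.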

Grouping can be used to search for words separated by white space (tabs or spaces). For example, suppose you want to search for the expression above, where the words are separated by white space. You could construct a pattern like this:

\([Tt]he\)\([<Tab><Enter><Space>]\{1,100\}\)\(fox\)

The middle group, \([<Tab><Enter><Space>]\{1,100\}\), is a group consisting of the set of space, tab and newline characters, matched from one to one hundred times. Thus, it will match from one to one hundred white space characters as a group separating ``The'' and ``fox''.
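The white space pattern can be tried out with a short Python sketch. This is an approximation under stated assumptions: Python writes the group as ([ \t\n]{1,100}) instead of \([<Tab><Enter><Space>]\{1,100\}\), and the helper name find_pair is hypothetical.

```python
import re

# Three groups: the article, one to one hundred white space characters, the noun.
# [ \t\n] is the set of space, tab and newline; {1,100} is the repeat count.
PATTERN = re.compile(r"([Tt]he)([ \t\n]{1,100})(fox)")


def find_pair(text):
    """Return the three matched groups as a tuple, or None if there is no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    print(find_pair("The \t fox"))      # ('The', ' \t ', 'fox')
    print(find_pair("the\nfox"))        # ('the', '\n', 'fox')
    print(find_pair("Thefox"))          # None: at least one separator is required
    print(find_pair("The quick fox"))   # None: ``quick'' is not white space
```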

When a program that uses regular expressions tries to find a match, it searches for a string that matches the first group. If it finds a match, it then tries to match the second group, then the third, and so on. A complete match is only confirmed when all the groups in the expression are matched, in order, to a string of consecutive characters in the target file.
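The requirement that the groups match consecutive characters can be observed by printing where each group starts and ends. The following is only a sketch in Python, whose matching engine is assumed here to behave like the editor's for this simple pattern; group_spans is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"([Tt]he)([ \t\n]{1,100})(fox)")


def group_spans(text):
    """Return the (start, end) offsets of groups 1 to 3, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return [match.span(i) for i in (1, 2, 3)]


if __name__ == "__main__":
    # Each group begins exactly where the previous one ended.
    print(group_spans("See the  fox"))   # [(4, 7), (7, 9), (9, 12)]
    # The first two groups can match here, but the third cannot, so the
    # match as a whole fails.
    print(group_spans("the fog"))        # None
```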