Manipulating text with sed

Context addresses

Context addresses are regular expressions enclosed in slashes ``/''. If you specify a context address for a command, sed only applies the editing function to those lines which match the regular expression. By using context addresses and a print function, you can improvise a grep-like behavior; for example, the shell script mygrep:

sed -n -e "/$1/p" <"$2"

This script uses the shell parameter $1 as a context address for sed to use in searching the file specified by the parameter $2. Whenever sed finds a line matching the address given in $1, it executes the p function (print) and outputs that line.

Note the -n argument to sed in this script; sed normally echoes every line it reads to its standard output. While the -n option is in effect, sed only prints when you tell it to with the p or P functions. Note also that if you want to use sed within a shell script and pass parameters to it, the sed instructions must be in double quotes, not single quotes, because the shell substitutes parameters such as $1 inside double quotes but leaves them untouched inside single quotes (see ``How the shell works'' for an explanation of shell quoting and its meaning).

For example:

   $ mygrep charlie /etc/passwd
   charlie::8:5:Charles Stross:/usr/charlie:/usr/bin/ksh
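
The effect of the -n option and of the quoting style can be checked with a small experiment. The following is only a sketch: the two-line sample file, the function names and the use of mktemp (widely available, but not required by POSIX) are assumptions made for illustration, not part of the mygrep script itself.

```shell
# Hypothetical two-line stand-in for /etc/passwd.
sample=$(mktemp)
printf '%s\n' 'root:x:0:0:Superuser:/:' \
    'charlie:x:8:5:Charles Stross:/usr/charlie:/bin/ksh' >"$sample"

# mygrep as a shell function: "$1" is the pattern, "$2" the file to search.
mygrep() {
    sed -n -e "/$1/p" <"$2"
}

# Same command in single quotes: the shell leaves $1 alone,
# so sed never sees the word the caller passed in.
mygrep_single_quoted() {
    sed -n -e '/$1/p' <"$2"
}

with_n=$(mygrep charlie "$sample")                  # only the matching line
without_n=$(sed -e "/charlie/p" <"$sample")         # every line, the match twice
single=$(mygrep_single_quoted charlie "$sample")    # nothing is printed

printf 'with -n:\n%s\n' "$with_n"
printf 'without -n:\n%s\n' "$without_n"
printf 'single quotes:\n%s\n' "$single"
rm -f "$sample"
```

Because the pattern is pasted into the address as it stands, a pattern that itself contains a slash would end the address early; such a slash has to be escaped with a backslash.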

Context addresses are enclosed in slashes (/). They include all the regular expressions common to both ed and sed:

An ordinary character is a regular expression and matches itself.
A caret (^) at the beginning of a regular expression matches the empty string at the beginning of a line.
A dollar sign ($) at the end of a regular expression matches the empty string at the end of a line.
The characters (\n) match an embedded newline character, but not the newline at the end of a pattern space.
A period (.) matches any character except the terminal newline of the pattern space.
A regular expression followed by a star (*) matches any number, including 0, of adjacent strings matching the regular expression.
A string of characters in square brackets ([ ]) matches any character in the string, and no others. If, however, the first character of the string is a caret (^), the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.
A concatenation of regular expressions is one that matches a particular concatenation of strings.
A regular expression between the sequences ``\('' and ``\)'' is grouped, and can be referred to as a unit by the s function. (Note the following specification.)
The expression (\d) means the same string of characters matched by an expression enclosed in ``\('' and ``\)'' earlier in the same pattern. Here d is a single digit; the string specified is that beginning with the dth occurrence of ``\('', counting from the left. For example, the expression ^\(.*\)\1 matches a line beginning with two repeated occurrences of the same string.
The null regular expression standing alone is equivalent to the last regular expression compiled.
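
A few of the rules above can be tried out directly. This is a minimal sketch using made-up input lines; the variable names are illustrative only. The back-reference example uses ..* rather than .* inside the group so that the repeated string must contain at least one character (a group that matches the empty string would let every line match).

```shell
input='abcabc tail
hello
xyzxyz
[tag]'

# Grouping and back-reference: a non-empty string at the start of the
# line, immediately followed by a second copy of itself.
repeated=$(printf '%s\n' "$input" | sed -n '/^\(..*\)\1/p')

# Bracket expression with a leading caret: the first character is
# anything other than a, h or x.
negated=$(printf '%s\n' "$input" | sed -n '/^[^ahx]/p')

# Star: "l*" matches any number of adjacent l characters, including none.
starred=$(printf '%s\n' "$input" | sed -n '/^hel*o$/p')

printf 'repeated:\n%s\n' "$repeated"   # abcabc tail, xyzxyz
printf 'negated:\n%s\n' "$negated"     # [tag]
printf 'starred:\n%s\n' "$starred"     # hello
```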

For a context address to ``match'' the input, the whole pattern within the address must match some portion of the pattern space. If you want to use one of the special characters literally, that is, to match an occurrence of itself in the input file, precede the character with a backslash (\) in the command.
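
As an illustration of escaping, the sketch below searches some invented input for a literal period, a literal dollar sign and a literal slash. The input text and variable names are assumptions chosen for the example.

```shell
input='price: $5
version 1.2
version 1x2
path /usr/bin'

# Unescaped, the period matches any character, so both version lines match.
any_char=$(printf '%s\n' "$input" | sed -n '/1.2/p')

# Escaped, the period matches only itself.
literal_dot=$(printf '%s\n' "$input" | sed -n '/1\.2/p')

# A backslash also makes the dollar sign and the slash ordinary characters;
# the slash needs it because it otherwise ends the context address.
literal_dollar=$(printf '%s\n' "$input" | sed -n '/\$5/p')
literal_slash=$(printf '%s\n' "$input" | sed -n '/\/usr\/bin/p')

printf '%s\n' "$any_char" "$literal_dot" "$literal_dollar" "$literal_slash"
```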

Each sed command can have 0, 1, or 2 addresses.

A command with no addresses specified is applied to every line in the input. For example:

s/red/green/
This command replaces the first instance of ``red'' with ``green'' on every line.
A command with one address is applied to all lines that match that address. For example:

/mike/s/fred/john/
This replaces the first instance of ``fred'' with ``john'', but only on those lines containing ``mike''.
A command with two addresses is applied to the first line that matches the first address, then to all subsequent lines until a match for the second address has been processed. An attempt is made to match the first address on subsequent lines, and the process is repeated.
Two addresses are separated by a comma. For example:

50,100s/fred/john/
This replaces the first instance of ``fred'' with ``john'' on each line from line 50 to line 100 inclusive. (Note that there should be no space between the second address and the s command.)
If an address is followed by an exclamation mark (!), the command is applied only to lines that do not match the address. For example:

50,100!s/fred/john/
This replaces the first instance of ``fred'' with ``john'' on every line except lines 50 to 100 inclusive.
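
The four addressing forms can be compared side by side on a short input. The sketch below uses a made-up five-line input (so the line range is 2,4 rather than 50,100) and a second made-up input with BEGIN and END markers to show a pair of context addresses matching more than once.

```shell
input='1 fred
2 mike fred
3 fred
4 fred
5 fred'

none=$(printf '%s\n' "$input" | sed 's/fred/john/')         # no address: every line
one=$(printf '%s\n' "$input" | sed '/mike/s/fred/john/')    # one address: line 2 only
two=$(printf '%s\n' "$input" | sed '2,4s/fred/john/')       # two addresses: lines 2 to 4
negated=$(printf '%s\n' "$input" | sed '2,4!s/fred/john/')  # negated: lines 1 and 5

# Two context addresses: once the second address has matched, sed looks
# for the first address again, so both BEGIN...END blocks are selected.
blocks='a
BEGIN
b
END
c
BEGIN
d
END
e'
ranges=$(printf '%s\n' "$blocks" | sed -n '/BEGIN/,/END/p')

printf '%s\n\n' "$none" "$one" "$two" "$negated" "$ranges"
```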

Here are some examples based on the following configuration file (a piece of an /etc/passwd file):

   root:x:0:0:Superuser:/:
   remacc:x:::Remote access::
   daemon:No login:1:1:Spooler:/usr/spool:
   sys:No login:2:2:System information::
   bin:x:3:3:System administrator:/usr/src:
   xmail:x:4:4:Secret Mail:/usr/spool/pubkey:
   msgs:No login:7:7:System messages:/usr/msgs:
   charlie:x:8:5:Charles Stross:/usr/charlie:/bin/ksh

/oo/: matches lines 1, 3, 6 in our sample file
/o*o/: matches all lines (every line in the sample contains at least one ``o'')
/[Cc]h./: matches line 8
/^o/: matches no lines
/./: matches all lines
/o$/: matches no lines
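
One way to check which lines an address selects is sed's = function, which prints the number of each line it is applied to. The sketch below is only illustrative: it holds the sample file in a shell variable, and the helper function name matches is hypothetical.

```shell
sample='root:x:0:0:Superuser:/:
remacc:x:::Remote access::
daemon:No login:1:1:Spooler:/usr/spool:
sys:No login:2:2:System information::
bin:x:3:3:System administrator:/usr/src:
xmail:x:4:4:Secret Mail:/usr/spool/pubkey:
msgs:No login:7:7:System messages:/usr/msgs:
charlie:x:8:5:Charles Stross:/usr/charlie:/bin/ksh'

# Print the numbers of the lines matching the expression in "$1",
# joined onto a single row.
matches() {
    nums=$(printf '%s\n' "$sample" | sed -n "/$1/=" | tr '\n' ' ')
    printf '%s\n' "${nums% }"
}

matches 'oo'        # 1 3 6
matches 'o*o'       # 1 2 3 4 5 6 7 8
matches '[Cc]h.'    # 8
matches '^o'        # (empty: no line begins with o)
matches '.'         # 1 2 3 4 5 6 7 8
matches 'o$'        # (empty: no line ends with o)
```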

You can use a single address to control the application of a group of commands by grouping the commands with curly braces ({ }). For example:

/red/ {
s/red/green/
s/blue/yellow/
}

This short sed script searches for lines matching the regular expression ``red'' and then carries out the grouped commands, which replace the first occurrence of ``red'' with ``green'' and the first occurrence of ``blue'' with ``yellow'' on each matching line. You might use this script by placing it in a file called subst.red and invoking it from a shell as follows:

$ sed -f subst.red <input_file >output_file
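
To see what the grouped commands do, the script can be run against a few test lines. The following sketch writes subst.red and an invented three-line input_file into a scratch directory; the directory (created with mktemp -d, which is an assumption about the system) and the input text are for illustration only.

```shell
workdir=$(mktemp -d)

# The script file, exactly as shown above.
cat >"$workdir/subst.red" <<'EOF'
/red/ {
s/red/green/
s/blue/yellow/
}
EOF

# Made-up input: the second line has no "red", so the group skips it.
printf '%s\n' 'red and blue' 'blue only' 'red red blue blue' >"$workdir/input_file"

sed -f "$workdir/subst.red" <"$workdir/input_file" >"$workdir/output_file"
result=$(cat "$workdir/output_file")
printf '%s\n' "$result"
# green and yellow
# blue only
# green red yellow blue

rm -rf "$workdir"
```

The second line is left alone even though it contains ``blue'', because the group is applied only to lines that match /red/; on the third line only the first ``red'' and the first ``blue'' change.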

For more on substitution in sed, including how to apply a change to all instances of a matching string, see ``Substitute functions''.