(gawk.info.gz) Field Separators

Info Catalog (gawk.info.gz) Changing Fields (gawk.info.gz) Reading Files (gawk.info.gz) Constant Size
 
 Specifying How Fields Are Separated
 ===================================
 

Menu

 
* Regexp Field Splitting       Using regexps as the field separator.
* Single Character Fields      Making each character a separate field.
* Command Line Field Separator Setting `FS' from the command-line.
* Field Splitting Summary      Some final points and a summary table.
 
    The "field separator", which is either a single character or a
 regular expression, controls the way `awk' splits an input record into
 fields.  `awk' scans the input record for character sequences that
 match the separator; the fields themselves are the text between the
 matches.
 
    In the examples that follow, we use an asterisk (*) to
 represent spaces in the output.  If the field separator is `oo', then
 the following line:
 
      moo goo gai pan
 
 is split into three fields: `m', `*g', and `*gai*pan'.  Note the
 leading spaces in the values of the second and third fields.
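
    The split described above can be checked from the shell.  This is a
 minimal sketch: the variable name `split_oo' and the bracket notation
 (used here in place of the space marker so that field boundaries are
 visible) are illustrative choices, not part of `awk' itself.

```shell
# Split the sample line on the two-character separator "oo" and
# print every field in brackets so the leading spaces show up.
split_oo=$(echo 'moo goo gai pan' |
    awk 'BEGIN { FS = "oo" }
         { for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }')
echo "$split_oo"    # [m][ g][ gai pan]
```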
 
    The field separator is represented by the built-in variable `FS'.
 Shell programmers take note:  `awk' does _not_ use the name `IFS' that
 is used by the POSIX-compliant shells (such as the Unix Bourne shell,
 `sh', or `bash').
 
    The value of `FS' can be changed in the `awk' program with the
 assignment operator, `=' (see Assignment Expressions).  Often the right
 time to do this is at the beginning of execution
 before any input has been processed, so that the very first record is
 read with the proper separator.  To do this, use the special `BEGIN'
 pattern (see The `BEGIN' and `END' Special Patterns).
 For example, here we set the value of `FS' to the string `","':
 
      awk 'BEGIN { FS = "," } ; { print $2 }'
 
 Given the input line:
 
      John Q. Smith, 29 Oak St., Walamazoo, MI 42139
 
 this `awk' program extracts and prints the string `*29*Oak*St.'.
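
    To try this without a data file, the input line can be piped into
 the program.  In this sketch the variable name `addr' and the added
 brackets are illustrative only; the brackets make the leading space in
 the field visible.

```shell
# Feed the sample record to the program and print the second
# comma-separated field, wrapped in brackets.
addr=$(echo 'John Q. Smith, 29 Oak St., Walamazoo, MI 42139' |
    awk 'BEGIN { FS = "," }
         { print "[" $2 "]" }')
echo "$addr"    # [ 29 Oak St.]
```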
 
    Sometimes the input data contains separator characters that don't
 separate fields the way you thought they would.  For instance, the
 person's name in the example we just used might have a title or suffix
 attached, such as:
 
      John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
 
 The same program would extract `*LXIX', instead of `*29*Oak*St.'.  If
 you were expecting the program to print the address, you would be
 surprised.  The moral is to choose your data layout and separator
 characters carefully to prevent such problems.  (If the data is not in
 a form that is easy to process, perhaps you can massage it first with a
 separate `awk' program.)
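
    The shift in field numbers can be seen by running both input lines
 through the same program.  The sketch below also prints the field that
 is third from the end, `$(NF - 2)', as one possible way to reach the
 address; this rests on the assumption (not stated above) that the
 address is always followed by exactly two more fields.  The function
 name `show_fields' and the variables `plain' and `suffix' are
 hypothetical.

```shell
# Hypothetical helper: print field 2 and the third-from-last field,
# each in brackets, for every comma-separated input record.
show_fields() {
    awk 'BEGIN { FS = "," }
         { print "[" $2 "] [" $(NF - 2) "]" }'
}

plain=$(echo 'John Q. Smith, 29 Oak St., Walamazoo, MI 42139' | show_fields)
suffix=$(echo 'John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139' | show_fields)

echo "$plain"     # [ 29 Oak St.] [ 29 Oak St.]
echo "$suffix"    # [ LXIX] [ 29 Oak St.]
```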
 
    Fields are normally separated by whitespace sequences (spaces, tabs,
 and newlines), not by single spaces.  Two spaces in a row do not
 delimit an empty field.  The default value of the field separator `FS'
 is a string containing a single space, `" "'.  If `awk' interpreted
 this value in the usual way, each space character would separate
 fields, so two spaces in a row would make an empty field between them.
 The reason this does not happen is that a single space as the value of
 `FS' is a special case--it is taken to specify the default manner of
 delimiting fields.
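
    The special case can be observed by splitting a line that contains
 two spaces in a row.  As a contrast, this sketch also sets `FS' to the
 regular expression `"[ ]"', which matches exactly one space and so
 behaves the way a literal reading of `" "' would.  The variable names
 and the `count:[field]...' output format are illustrative only.

```shell
# Default FS: the run of two spaces acts as a single separator.
default_split=$(echo 'a  b' |
    awk '{ printf "%d:", NF
           for (i = 1; i <= NF; i++) printf "[%s]", $i
           print "" }')

# FS = "[ ]": every single space separates fields, so the two
# adjacent spaces enclose an empty field.
literal_split=$(echo 'a  b' |
    awk 'BEGIN { FS = "[ ]" }
         { printf "%d:", NF
           for (i = 1; i <= NF; i++) printf "[%s]", $i
           print "" }')

echo "$default_split"    # 2:[a][b]
echo "$literal_split"    # 3:[a][][b]
```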
 
    If `FS' is any other single character, such as `","', then each
 occurrence of that character separates two fields.  Two consecutive
 occurrences delimit an empty field.  If the character occurs at the
 beginning or the end of the line, that too delimits an empty field.  The
 space character is the only single character that does not follow these
 rules.
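
    These rules can be confirmed with a line that begins and ends with
 the separator and contains two separators in a row.  The sample input
 `,a,,b,' and the variable name `comma_split' are made up for this
 sketch.

```shell
# With FS = ",", the leading comma, the doubled comma, and the
# trailing comma each delimit an empty field: five fields in all.
comma_split=$(echo ',a,,b,' |
    awk 'BEGIN { FS = "," }
         { printf "%d:", NF
           for (i = 1; i <= NF; i++) printf "[%s]", $i
           print "" }')
echo "$comma_split"    # 5:[][a][][b][]
```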
 