(gawk.info.gz) Egrep Program

 Searching for Regular Expressions in Files
 ------------------------------------------
 
    The `egrep' utility searches files for patterns.  It uses regular
 expressions that are almost identical to those available in `awk'
 (*note Regular Expressions: Regexp.).  It is used in the following
 manner:
 
      egrep [ OPTIONS ] 'PATTERN' FILES ...
 
    The PATTERN is a regular expression.  In typical usage, the regular
 expression is quoted to prevent the shell from expanding any of the
 special characters as file name wildcards.  Normally, `egrep' prints
 the lines that matched.  If multiple file names are provided on the
 command line, each output line is preceded by the name of the file and
 a colon.
 
    The options to `egrep' are as follows:
 
 `-c'
      Print out a count of the lines that matched the pattern, instead
      of the lines themselves.
 
 `-s'
      Be silent.  No output is produced and the exit value indicates
      whether the pattern was matched.
 
 `-v'
      Invert the sense of the test. `egrep' prints the lines that do
      _not_ match the pattern and exits successfully if the pattern is
      not matched.
 
 `-i'
      Ignore case distinctions in both the pattern and the input data.
 
 `-l'
      Only print (list) the names of the files that matched, not the
      lines that matched.
 
 `-e PATTERN'
      Use PATTERN as the regexp to match.  The purpose of the `-e'
      option is to allow patterns that start with a `-'.
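 
    As a quick illustration of several of these options, the following
 shell sketch runs the system's `grep -E' on two scratch files.  Using
 `grep -E' in place of `egrep', and the file names `a.txt' and `b.txt',
 are assumptions made for this example only:

```shell
# Sketch only: relies on the system grep and two made-up scratch files.
dir=$(mktemp -d)
printf 'apple\n-banana\ncherry\n' > "$dir/a.txt"
printf 'banana\n' > "$dir/b.txt"
cd "$dir" || exit 1

count=$(grep -E -c 'an' a.txt)             # number of lines containing "an"
inverted=$(grep -E -v 'an' a.txt)          # lines that do NOT contain "an"
dashed=$(grep -E -e '-b' a.txt)            # -e protects the leading "-"
listed=$(grep -E -l 'cherry' a.txt b.txt)  # names of matching files only
prefixed=$(grep -E 'banana' a.txt b.txt)   # two files, so "name:line"

printf '%s\n' "$count" "$inverted" "$dashed" "$listed" "$prefixed"
```

    With more than one file on the command line, each matching line in
 `prefixed' carries its file name and a colon, as described earlier.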
 
    This version uses the `getopt' library function (*note Processing
 Command-Line Options: Getopt Function.) and the file transition
 library program (*note Noting Data File Boundaries: Filetrans
 Function.).
 
    The program begins with a descriptive comment and then a `BEGIN' rule
 that processes the command-line arguments with `getopt'.  The `-i'
 (ignore case) option is particularly easy with `gawk'; we just use the
 `IGNORECASE' built-in variable (*note Built-in Variables::):
 
      # egrep.awk --- simulate egrep in awk
      # Options:
      #    -c    count of lines
      #    -s    silent - use exit value
      #    -v    invert test, success if no match
      #    -i    ignore case
      #    -l    print filenames only
      #    -e    argument is pattern
      #
      # Requires getopt and file transition library functions
      
      BEGIN {
          while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
              if (c == "c")
                  count_only++
              else if (c == "s")
                  no_print++
              else if (c == "v")
                  invert++
              else if (c == "i")
                  IGNORECASE = 1
              else if (c == "l")
                  filenames_only++
              else if (c == "e")
                  pattern = Optarg
              else
                  usage()
          }
 
    Next comes the code that handles the `egrep'-specific behavior. If no
 pattern is supplied with `-e', the first non-option argument on the
 command line is used.  The `awk' command-line arguments before
 `ARGV[Optind]' are set to the null string, so that `awk' won't try to
 process them as files.  If no files are specified, the standard input
 is used, and if multiple files are specified, we make sure to note this
 so that the file names can precede the matched lines in the output:
 
          if (pattern == "")
              pattern = ARGV[Optind++]
          if (pattern == "")    # no pattern given at all
              usage()
      
          for (i = 1; i < Optind; i++)
              ARGV[i] = ""
          if (Optind >= ARGC) {
              ARGV[1] = "-"
              ARGC = 2
          } else if (ARGC - Optind > 1)
              do_filenames++
      
      #    if (IGNORECASE)
      #        pattern = tolower(pattern)
      }
 
    The last two lines are commented out, since they are not needed in
 `gawk'.  They should be uncommented if you have to use another version
 of `awk'.
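 
    The effect of clearing `ARGV' entries can be seen in isolation.  In
 the sketch below (the scratch file and the pattern `tw' are made up for
 the example), the first argument is not a file at all.  Because the
 `BEGIN' rule replaces it with the null string, `awk' skips it and reads
 only the real file:

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

out=$(awk 'BEGIN {
    pattern = ARGV[1]     # treat the first argument as the pattern
    ARGV[1] = ""          # awk skips arguments that are the null string
}
$0 ~ pattern { print "hit: " $0 }' 'tw' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

    Without the assignment to `ARGV[1]', `awk' would try to open a file
 named `tw'.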
 
    The next set of lines should be uncommented if you are not using
 `gawk'.  This rule translates all the characters in the input line into
 lowercase if the `-i' option is specified.(1) The rule is commented out
 since it is not necessary with `gawk':
 
      #{
      #    if (IGNORECASE)
      #        $0 = tolower($0)
      #}
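 
    One possible way to ignore case without changing what gets printed
 is to lowercase copies of the record and the pattern for the test and
 leave `$0' untouched.  This is only a sketch and not part of the
 program; the variable `icase' is invented here so that the example
 does not depend on `IGNORECASE':

```shell
out=$(printf 'Hello World\nsomething else\n' |
awk -v pattern='WORLD' -v icase=1 '
{
    line = $0
    pat = pattern
    if (icase) {              # compare lowercase copies only
        line = tolower(line)
        pat = tolower(pat)
    }
    if (line ~ pat)
        print                 # prints the original, untranslated record
}')
printf '%s\n' "$out"
```

    Lowercasing the pattern is itself a simplification: a pattern such
 as `[A-Z]' changes meaning when it is translated.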
 
    The `beginfile' function is called by the rule in `ftrans.awk' when
 each new file is processed.  In this case, it is very simple; all it
 does is initialize a variable `fcount' to zero. `fcount' tracks how
 many lines in the current file matched the pattern (naming the
 parameter `junk' shows we know that `beginfile' is called with a
 parameter, but that we're not interested in its value):
 
      function beginfile(junk)
      {
          fcount = 0
      }
 
    The `endfile' function is called after each file has been processed.
 It affects the output only when the user wants a count of the number of
 lines that matched.  `no_print' is true only if the exit status is
 desired.  `count_only' is true if line counts are desired.  `egrep'
 therefore only prints line counts if printing and counting are enabled.
 The output format must be adjusted depending upon the number of files to
 process.  Finally, `fcount' is added to `total', so that we know the
 total number of lines that matched the pattern:
 
      function endfile(file)
      {
          if (! no_print && count_only)
              if (do_filenames)
                  print file ":" fcount
              else
                  print fcount
      
          total += fcount
      }
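 
    To see how `beginfile' and `endfile' fit together, here is a minimal
 stand-in for the file transition rule.  It is a sketch modeled on the
 description above, not a copy of `ftrans.awk'; the variable name
 `_oldfilename', the fixed pattern `foo', and the sample files are
 assumptions made for the example:

```shell
dir=$(mktemp -d)
printf 'foo\nbar\nfoo\n' > "$dir/a"
printf 'bar\n' > "$dir/b"
cd "$dir" || exit 1

out=$(awk '
# Stand-in for the library rule: detect a change of input file.
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END { endfile(FILENAME) }          # finish off the last file

function beginfile(junk) { fcount = 0 }
function endfile(file)   { print file ":" fcount; total += fcount }

/foo/ { fcount++ }
' a b)
printf '%s\n' "$out"
```

    Each file's count is printed when the next file starts, or in the
 `END' rule for the last file.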
 
    The following rule does most of the work of matching lines. The
 variable `matches' is true if the line matched the pattern. If the user
 wants lines that did not match, the sense of `matches' is inverted
 using the `!' operator. `fcount' is incremented with the value of
 `matches', which is either one or zero, depending upon a successful or
 unsuccessful match.  If the line does not match, the `next' statement
 just moves on to the next record.
 
    A number of additional tests are made, but they are only done if we
 are not counting lines.  First, if the user only wants exit status
 (`no_print' is true), then it is enough to know that _one_ line in this
 file matched, and we can skip on to the next file with `nextfile'.
 Similarly, if we are only printing file names, we can print the file
 name, and then skip to the next file with `nextfile'.  Finally, each
 line is printed, with a leading file name and colon if necessary:
 
      {
          matches = ($0 ~ pattern)
          if (invert)
              matches = ! matches
      
          fcount += matches    # 1 or 0
      
          if (! matches)
              next
      
          if (! count_only) {
              if (no_print)
                  nextfile
      
              if (filenames_only) {
                  print FILENAME
                  nextfile
              }
      
              if (do_filenames)
                  print FILENAME ":" $0
              else
                  print
          }
      }
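 
    The arithmetic on `matches' relies on the fact that a match
 expression in `awk' has the numeric value one or zero.  The following
 sketch (the sample input and the `-v' assignments are made up) counts
 inverted matches the same way the rule above does:

```shell
out=$(printf 'alpha\nbeta\ngamma\n' |
awk -v pattern='^b' -v invert=1 '
{
    matches = ($0 ~ pattern)   # 1 if the record matches, else 0
    if (invert)
        matches = ! matches    # flips 1 to 0 and 0 to 1
    fcount += matches
}
END { print fcount }')
printf '%s\n' "$out"
```

    Two of the three input lines do not begin with `b', so the count
 printed is 2.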
 
    The `END' rule takes care of producing the correct exit status. If
 there are no matches, the exit status is one; otherwise it is zero:
 
      END    \
      {
          if (total == 0)
              exit 1
          exit 0
      }
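 
    From the shell, the exit status is available in `$?'.  The sketch
 below reduces the program to its counting and `END' logic (the helper
 name `has_match' is hypothetical) and shows both outcomes:

```shell
# has_match PATTERN: read standard input; exit 0 on a match, 1 otherwise.
has_match() {
    awk -v pattern="$1" '
    $0 ~ pattern { total++ }
    END {
        if (total == 0)
            exit 1
        exit 0
    }'
}

printf 'red\ngreen\n' | has_match 'gre+n'; found=$?
printf 'red\ngreen\n' | has_match 'blue';  missing=$?
echo "found=$found missing=$missing"
```

    Here `found' is 0 and `missing' is 1, which is the convention a
 calling shell script would test with `if'.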
 
    The `usage' function prints a usage message in case of invalid
 options, and then exits:
 
      function usage(    e)
      {
          e = "Usage: egrep [-csvil] [-e pat] [files ...]"
          e = e "\n\tegrep [-csvil] pat [files ...]"
          print e > "/dev/stderr"
          exit 1
      }
 
    The variable `e' is used so that the function fits nicely on the
 printed page.
 
    Just a note on programming style: you may have noticed that the `END'
 rule uses backslash continuation, with the open brace on a line by
 itself.  This is so that it more closely resembles the way functions
 are written.  Many of the examples in this major node use this style.
 You can decide for yourself if you like writing your `BEGIN' and `END'
 rules this way or not.
 
    ---------- Footnotes ----------
 
    (1) It also introduces a subtle bug; if a match happens, we output
 the translated line, not the original.
 