(autoconf.info.gz) Limitations of Usual Tools

 Limitations of Usual Tools
 ==========================
 
    The small set of tools you can expect to find on any machine can
 still include some limitations you should be aware of.
 
 `awk'
      Don't leave white space before the parentheses in user function
      calls; GNU awk will reject it:
 
           $ gawk 'function die () { print "Aaaaarg!"  }
                   BEGIN { die () }'
           gawk: cmd. line:2:         BEGIN { die () }
           gawk: cmd. line:2:                      ^ parse error
           $ gawk 'function die () { print "Aaaaarg!"  }
                   BEGIN { die() }'
           Aaaaarg!
 
      If you want your program to be deterministic, don't depend on the
      order in which `for' traverses arrays:
 
           $ cat for.awk
           END {
             arr["foo"] = 1
             arr["bar"] = 1
             for (i in arr)
               print i
           }
           $ gawk -f for.awk </dev/null
           foo
           bar
           $ nawk -f for.awk </dev/null
           bar
           foo
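
      One way to get a deterministic result is to sort what the loop
      prints.  The following is only a minimal sketch: it assumes a
      `sort' command is available, and the variable name `keys' is
      purely illustrative.

```shell
# Collect the array indices, then sort them so that the order no
# longer depends on how a particular awk walks its arrays.
keys=`awk 'END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}' </dev/null | LC_ALL=C sort`
echo "$keys"
```

      Sorting outside of awk keeps the awk program itself unchanged.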
 
      Some AWK implementations, such as HP-UX 11.0's native one, have
      regex engines that mishandle inner anchors:
 
           $ echo xfoo | $AWK '/foo|^bar/ { print }'
           $ echo bar | $AWK '/foo|^bar/ { print }'
           bar
           $ echo xfoo | $AWK '/^bar|foo/ { print }'
           xfoo
           $ echo bar | $AWK '/^bar|foo/ { print }'
           bar
 
      Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'),
      or use a simple test to reject such AWK.
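
      Such a test could look like the sketch below.  It is not the check
      Autoconf itself performs; the default value given to `AWK' and the
      result variable `awk_inner_anchors' are assumptions made for
      illustration.

```shell
# AWK names the awk to check; defaulting it to plain `awk' is an
# assumption made for this sketch.
AWK=${AWK-awk}

# A sound awk prints `xfoo' here; a fragile one prints nothing.
if test "x`echo xfoo | $AWK '/foo|^bar/ { print }'`" = xxfoo; then
  awk_inner_anchors=ok
else
  awk_inner_anchors=broken
fi
echo "inner anchors: $awk_inner_anchors"
```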
 
 `cat'
      Don't rely on any option.  The option `-v', which displays
      non-printing characters, _seems_ portable, though.
 
 `cc'
      When a compilation such as `cc foo.c -o foo' fails, some compilers
      (such as CDS on Reliant UNIX) leave a `foo.o'.
 
      HP-UX `cc' doesn't accept `.S' files to preprocess and assemble.
      `cc -c foo.S' will appear to succeed, but in fact does nothing.
 
      The default executable, produced by `cc foo.c', can be
 
         * `a.out' -- usual Unix convention.
 
         * `a.exe' -- DJGPP port of `gcc'.
 
         * `a_out.exe' -- GNV `cc' wrapper for DEC C on OpenVMS.
 
         * `foo.exe' -- various MS-DOS compilers.
 
 `cmp'
      `cmp' performs a raw data comparison of two files, while `diff'
      compares two text files.  Therefore, if you might compare DOS
      files, even if only checking whether two files are different, use
      `diff' to avoid spurious differences due to differences of newline
      encoding.
 
 `cp'
      SunOS `cp' does not support `-f', although its `mv' does.  It's
      possible to deduce why `mv' and `cp' are different with respect to
      `-f'.  `mv' prompts by default before overwriting a read-only
      file.  `cp' does not.  Therefore, `mv' requires a `-f' option, but
      `cp' does not.  `mv' and `cp' behave differently with respect to
      read-only files because the simplest form of `cp' cannot overwrite
      a read-only file, but the simplest form of `mv' can.  This is
      because `cp' opens the target for write access, whereas `mv'
      simply calls `link' (or, in newer systems, `rename').
 
      Bob Proulx notes that `cp -p' always _tries_ to copy ownerships.
      But whether it actually does copy ownerships or not is a system
      dependent policy decision implemented by the kernel.  If the
      kernel allows it then it happens.  If the kernel does not allow it
      then it does not happen.  It is not something `cp' itself has
      control over.
 
      In SysV any user can chown files to any other user, and SysV also
      had a non-sticky `/tmp'.  That undoubtedly derives from the
      heritage of SysV in a business environment without hostile users.
      BSD changed this to be a more secure model where only root can
      `chown' files and a sticky `/tmp' is used.  That undoubtedly
      derives from the heritage of BSD in a campus environment.
 
      Linux by default follows BSD, but it can be configured to allow
      `chown'.  HP-UX as an alternate example follows SysV, but it can
      be configured to use the modern security model and disallow
      `chown'.  Since it is an administrator configurable parameter you
      can't use the name of the kernel as an indicator of the behavior.
 
 `date'
      Some versions of `date' do not recognize special % directives, and
      unfortunately, instead of complaining, they just pass them through,
      and exit with success:
 
           $ uname -a
           OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
           $ date "+%s"
           %s
 
 `diff'
      Option `-u' is nonportable.
 
      Some implementations, such as Tru64's, fail when comparing to
      `/dev/null'.  Use an empty file instead.
 
 `dirname'
      Not all hosts have a working `dirname', and you should instead use
      `AS_DIRNAME' (see Programming in M4sh).  For example:
 
           dir=`dirname "$file"`       # This is not portable.
           dir=`AS_DIRNAME(["$file"])` # This is more portable.
 
      This handles a few subtleties in the standard way required by
      POSIX.  For example, under UN*X, should `dirname //1' give `/'?
      Paul Eggert answers:
 
           No, under some older flavors of Unix, leading `//' is a
           special path name: it refers to a "super-root" and is used to
           access other machines' files.  Leading `///', `////', etc.
           are equivalent to `/'; but leading `//' is special.  I think
           this tradition started with Apollo Domain/OS, an OS that is
           still in use on some older hosts.
 
           POSIX allows but does not require the special treatment for
           `//'.  It says that the behavior of dirname on path names of
           the form `//([^/]+/*)?'  is implementation defined.  In these
           cases, GNU `dirname' returns `/', but it's more portable to
           return `//' as this works even on those older flavors of Unix.
 
 `egrep'
      POSIX 1003.1-2001 no longer requires `egrep', but many older hosts
      do not yet support the POSIX replacement `grep -E'.  To work
      around this problem, invoke `AC_PROG_EGREP' and then use `$EGREP'.
 
      The empty alternative is not portable; use `?' instead.  For
      instance with Digital Unix v5.0:
 
           > printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
           |foo
           > printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
           bar|
           > printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
           foo
           |bar
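
      For comparison, here is a sketch of the `?' form.  `EGREP' is
      normally set by `AC_PROG_EGREP'; the `grep -E' default and the
      variable `count' are assumptions that let the example run on its
      own.

```shell
# EGREP normally comes from AC_PROG_EGREP; the `grep -E' default is
# an assumption made so that the sketch runs on its own.
EGREP=${EGREP-'grep -E'}

# Count the matching lines: the empty line, `foo' and `bar' match,
# but `|foo' does not.
count=`printf 'foo\n|foo\n\nbar\n' | $EGREP -c '^(foo|bar)?$'`
echo "$count"
```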
 
      `$EGREP' also suffers the limitations of `grep'.
 
 `expr'
      No `expr' keyword starts with `x', so use `expr x"WORD" :
      'xREGEX'' to keep `expr' from misinterpreting WORD.
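
      A small sketch of the idiom, using a word that some `expr'
      implementations treat as a keyword.  The variable names are
      illustrative only.

```shell
# `length' is an operator keyword for some `expr' implementations, so
# passing it bare could be misinterpreted; the `x' prefix prevents that.
word=length
if expr x"$word" : 'xlen' >/dev/null; then
  starts_with_len=yes
else
  starts_with_len=no
fi
echo "$starts_with_len"
```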
 
      Don't use `length', `substr', `match' and `index'.
 
 `expr' (`|')
      You can use `|'.  Although POSIX does require that `expr '''
      return the empty string, it does not specify the result when you
      `|' together the empty string (or zero) with the empty string.  For
      example:
 
           expr '' \| ''
 
      GNU/Linux and POSIX.2-1992 return the empty string for this case,
      but traditional Unix returns `0' (Solaris is one such example).
      In the latest POSIX draft, the specification has been changed to
      match traditional Unix's behavior (which is bizarre, but it's too
      late to fix this).  Please note that the same problem does arise
      when the empty string results from a computation, as in:
 
           expr bar : foo \| foo : bar
 
      Avoid this portability problem by avoiding the empty string.
 
 `expr' (`:')
      Don't use `\?', `\+' and `\|' in patterns; they are not supported
      on Solaris.
 
      The POSIX.2-1992 standard is ambiguous as to whether `expr a : b'
      (and `expr 'a' : '\(b\)'') output `0' or the empty string.  In
      practice, it outputs the empty string on most platforms, but
      portable scripts should not assume this.  For instance, the QNX
      4.25 native `expr' returns `0'.
 
      You may believe that one means to get a uniform behavior would be
      to use the empty string as a default value:
 
           expr a : b \| ''
 
      Unfortunately, this behaves exactly as the original expression; see
      the ``expr' (`|')' entry for more information.
 
      Older `expr' implementations (e.g., SunOS 4 `expr' and Solaris 8
      `/usr/ucb/expr') have a silly length limit that causes `expr' to
      fail if the matched substring is longer than 120 bytes.  In this
      case, you might want to fall back on `echo|sed' if `expr' fails.
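
      The fallback could be sketched as follows.  The sample path and
      the variable names are assumptions, and the `sed' command only
      mirrors the `expr' pattern when the path contains a slash.

```shell
# file is a sample path; the sketch assumes it contains a slash.
file=/usr/local/lib/libfoo.a

# Try `expr' first, and rerun the extraction with `sed' if it fails.
base=`expr "x$file" : 'x.*/\([^/]*\)$' 2>/dev/null` ||
  base=`echo "x$file" | sed 's,^x.*/\([^/]*\)$,\1,'`
echo "$base"
```

      Because each attempt is a separate assignment, the `sed' result
      replaces whatever a failing `expr' may have printed.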
 
      Don't leave, there is some more!
 
      The QNX 4.25 `expr', in addition to preferring `0' to the empty
      string, has a funny behavior in its exit status: it's always 1
      when parentheses are used!
 
           $ val=`expr 'a' : 'a'`; echo "$?: $val"
           0: 1
           $ val=`expr 'a' : 'b'`; echo "$?: $val"
           1: 0
           
           $ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
           1: a
           $ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
           1: 0
 
      In practice this can be a big problem if you catch failures of
      `expr' and fall back on some other method (such as `sed'), since
      you may get the result twice.  For instance
 
           $ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'
 
      will output `a' on most hosts, but `aa' on QNX 4.25.  A simple
      workaround consists in testing `expr' and using a variable set to
      `expr' or to `false' according to the result.
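
      This workaround might be sketched as follows.  The variable name
      `my_expr' is hypothetical, and the probe is a simplified stand-in
      for whatever check a real script would run.

```shell
# my_expr is a hypothetical variable name.  It is set to `expr' only
# if `expr' reports success for a parenthesized match.
if expr a : '\(a\)' >/dev/null 2>&1; then
  my_expr=expr
else
  my_expr=false
fi

# With my_expr=false the left-hand side prints nothing and fails, so
# only the `sed' fallback produces output.
val=`$my_expr a : '\(a\)' || echo a | sed 's/^\(a\)$/\1/'`
echo "$val"
```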
 
 `fgrep'
      POSIX 1003.1-2001 no longer requires `fgrep', but many older hosts
      do not yet support the POSIX replacement `grep -F'.  To work
      around this problem, invoke `AC_PROG_FGREP' and then use `$FGREP'.
 
 `find'
      The option `-maxdepth' seems to be GNU specific.  Tru64 v5.1,
      NetBSD 1.5 and Solaris 2.5 `find' commands do not understand it.
 
      The replacement of `{}' is guaranteed only if the argument is
      exactly _{}_, not if it's only a part of an argument.  For
      instance on DU, and HP-UX 10.20 and HP-UX 11:
 
           $ touch foo
           $ find . -name foo -exec echo "{}-{}" \;
           {}-{}
 
      while GNU `find' reports `./foo-./foo'.
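
      One way around this is to pass `{}' as an argument of its own to a
      small inner shell command, which then builds the string.  This is
      only a sketch; the scratch directory name is arbitrary.

```shell
# Work in a scratch directory; its name is arbitrary.
dir=./find-demo.$$
mkdir "$dir" && : > "$dir/foo"

# `{}' is a whole argument here; the inner shell builds the string.
out=`cd "$dir" && find . -name foo -exec sh -c 'echo "$1-$1"' sh '{}' \;`
rm -rf "$dir"
echo "$out"
```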
 
 `grep'
      Don't use `grep -s' to suppress output, because `grep -s' on
      System V does not suppress output, only error messages.  Instead,
      redirect the standard output and standard error (in case the file
      doesn't exist) of `grep' to `/dev/null'.  Check the exit status of
      `grep' to determine whether it found a match.
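
      In a script, this advice could take the following shape.  The file
      name and the variables `found' and `missing' are illustrative.

```shell
tmp=./grep-demo.$$
echo 'hello world' > "$tmp"

# Discard both streams and rely on the exit status alone.
if grep hello "$tmp" >/dev/null 2>&1; then found=yes; else found=no; fi

# A missing file is silent too, and simply counts as "no match".
if grep hello "$tmp.missing" >/dev/null 2>&1; then missing=yes; else missing=no; fi

rm -f "$tmp"
echo "$found $missing"
```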
 
      Don't use multiple regexps with `-e', as some `grep' will only
      honor the last pattern (e.g., IRIX 6.5 and Solaris 2.5.1).
      Besides, Stardent Vistra SVR4 `grep' lacks `-e' altogether.
      Instead, use extended regular expressions and alternation.
 
 `ln'
      Don't rely on `ln' having a `-f' option.  Symbolic links are not
      available on old systems; use `$(LN_S)' as a portable substitute.
 
      For versions of the DJGPP before 2.04, `ln' emulates soft links to
      executables by generating a stub that in turn calls the real
      program.  This feature also works with nonexistent files like in
      the Unix spec.  So `ln -s file link' will generate `link.exe',
      which will attempt to call `file.exe' if run.  But this feature
      only works for executables, so `cp -p' is used instead for these
      systems.  DJGPP versions 2.04 and later have full symlink support.
 
 `ls'
      The portable options are `-acdilrtu'.  Modern practice is for `-l'
      to output both owner and group, but traditional `ls' omits the
      group.
 
      Modern practice is for all diagnostics to go to standard error, but
      traditional `ls foo' prints the message `foo not found' to
      standard output if `foo' does not exist.  Be careful when writing
      shell commands like `sources=`ls *.c 2>/dev/null`', since with
      traditional `ls' this is equivalent to `sources="*.c not found"'
      if there are no `.c' files.
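
      One possible alternative, sketched below, avoids `ls' altogether
      and lets the shell expand the pattern.  This is a suggestion
      rather than something the text above prescribes, and the scratch
      directory name is arbitrary.

```shell
dir=./ls-demo.$$
mkdir "$dir"

# When nothing matches, the pattern `*.c' is left as is, so each
# candidate is checked with `test -f' before being kept.
sources=`cd "$dir" && for f in *.c; do
  if test -f "$f"; then echo "$f"; fi
done`
rm -rf "$dir"
echo "sources: '$sources'"
```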
 
 `mkdir'
      None of `mkdir''s options are portable.  Instead of `mkdir -p
      FILENAME', you should use `AS_MKDIR_P(FILENAME)' (see
      Programming in M4sh).
 
 `mv'
      The only portable options are `-f' and `-i'.
 
      Moving individual files between file systems is portable (it was
      in V6), but it is not always atomic: when doing `mv new existing',
      there's a critical section where neither the old nor the new
      version of `existing' actually exists.
 
      Moving directories across mount points is not portable; use `cp'
      and `rm'.
 
      Moving/Deleting open files isn't portable.  The following can't be
      done on DOS/WIN32:
 
           exec > foo
           mv foo bar
 
      nor can
 
           exec > foo
           rm -f foo
 
 `sed'
      Patterns should not include the separator (unless escaped), even
      as part of a character class.  In conformance with POSIX, the Cray
      `sed' will reject `s/[^/]*$//': use `s,[^/]*$,,'.
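
      For instance, the comma-separated form strips the last component
      of a path.  The sample path is arbitrary.

```shell
# With `,' as the separator, `/' may appear in the bracket expression.
dir=`echo /usr/local/bin/sed | sed 's,[^/]*$,,'`
echo "$dir"
```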
 
      Sed scripts should not use branch labels longer than 8 characters
      and should not contain comments.
 
      Don't include extra `;', as some `sed', such as NetBSD 1.4.2's,
      try to interpret the second as a command:
 
           $ echo a | sed 's/x/x/;;s/x/x/'
           sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
 
      Input lines should not be too long, since some `sed' have an
      input buffer limited to 4000 bytes.
 
      Alternation, `\|', is common but POSIX.2 does not require its
      support, so it should be avoided in portable scripts.  Solaris 8
      `sed' does not support alternation; e.g., `sed '/a\|b/d'' deletes
      only lines that contain the literal string `a|b'.
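
      When the alternation only selects lines to delete, it can be
      split into one command per alternative, as in this sketch:

```shell
# Delete lines containing `a' or `b' without using `\|'.
out=`printf 'a\nb\nc\n' | sed -e '/a/d' -e '/b/d'`
echo "$out"
```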
 
      Anchors (`^' and `$') inside groups are not portable.
 
      Nested parenthesization in patterns (e.g., `\(\(a*\)b*\)') is
      quite portable to modern hosts, but is not supported by some older
      `sed' implementations like SVR3.
 
      Of course the option `-e' is portable, but it is not needed.  No
      valid Sed program can start with a dash, so it does not help
      disambiguate.  Its sole use is to help enforce indentation, as
      in:
 
           sed -e INSTRUCTION-1 \
               -e INSTRUCTION-2
 
      as opposed to
 
           sed INSTRUCTION-1;INSTRUCTION-2
 
      Contrary to yet another urban legend, you may portably use `&' in
      the replacement part of the `s' command to mean "what was
      matched".  All descendants of Bell Labs' V7 `sed' (at least; we
      don't have first hand experience with older `sed's) have supported
      it.
 
      POSIX requires that there be no white space between `!' and the
      following command.  It is OK to have blanks between the address
      and the `!'.  For instance, on Solaris 8:
 
           $ echo "foo" | sed -n '/bar/ ! p'
           error-->Unrecognized command: /bar/ ! p
           $ echo "foo" | sed -n '/bar/! p'
           error-->Unrecognized command: /bar/! p
           $ echo "foo" | sed -n '/bar/ !p'
           foo
 
 `sed' (`t')
      Some old systems have a `sed' that "forgets" to reset its `t' flag
      when starting a new cycle.  For instance on MIPS RISC/OS, and on
      IRIX 5.3, if you run the following `sed' script (the labels after
      `#' are not actually part of the texts):
 
           s/keep me/kept/g  # a
           t end             # b
           s/.*/deleted/g    # c
           : end             # d
 
      on
 
           delete me         # 1
           delete me         # 2
           keep me           # 3
           delete me         # 4
 
      you get
 
           deleted
           delete me
           kept
           deleted
 
      instead of
 
           deleted
           deleted
           kept
           deleted
 
      Why?  When processing line 1, command a fails to match, b does not
      jump, and c matches, therefore sets the t flag, and the output is
      produced.  When processing line 2, the t flag is still set (this is
      the bug).  Command a fails to match, but `sed' is not supposed to
      clear the t flag when a substitution fails.  Command b sees that
      the flag is set, therefore it clears it, and jumps to d, hence you
      get `delete me' instead of `deleted'.  When processing line 3, t is
      clear, a matches, so the flag is set, hence b clears the flag and
      jumps.  Finally, since the flag is clear, line 4 is processed
      properly.
 
      There are two things one should remember about `t' in `sed'.
      Firstly, always remember that `t' jumps if _some_ substitution
      succeeded, not only the immediately preceding substitution.
      Therefore, always use a fake `t clear' followed by `: clear' on
      the next line to reset the t flag where needed.
 
      Secondly, you cannot rely on `sed' to clear the flag at each new
      cycle.
 
      One portable implementation of the script above is:
 
           t clear
           : clear
           s/keep me/kept/g
           t end
           s/.*/deleted/g
           : end
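
      To try the script on the sample input, it can be passed to `sed'
      as a single multi-line argument, as in this sketch (the variable
      name `out' is illustrative):

```shell
# Each command sits on its own line, without comments.
out=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' | sed 't clear
: clear
s/keep me/kept/g
t end
s/.*/deleted/g
: end'`
echo "$out"
```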
 
 `touch'
      On some old BSD systems, `touch' or any command that results in an
      empty file does not update the timestamps, so use a command like
      `echo' as a workaround.
 
      GNU `touch' 3.16r (and presumably all before that) fails to work
      on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume.
 