DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH
 

(gawk.info.gz) Ordinal Functions

Info Catalog (gawk.info.gz) Cliff Random Function (gawk.info.gz) General Functions (gawk.info.gz) Join Function
 
 Translating Between Characters and Numbers
 ------------------------------------------
 
    One commercial implementation of `awk' supplies a built-in function,
 `ord', which takes a character and returns the numeric value for that
 character in the machine's character set.  If the string passed to
 `ord' has more than one character, only the first one is used.
 
    The inverse of this function is `chr' (from the function of the same
 name in Pascal), which takes a number and returns the corresponding
 character.  Both functions are written very nicely in `awk'; there is
 no real reason to build them into the `awk' interpreter:
 
      # ord.awk --- do ord and chr
      
      # Global identifiers:
      #    _ord_:        numerical values indexed by characters
      #    _ord_init:    function to initialize _ord_
      BEGIN    { _ord_init() }
      
      function _ord_init(    low, high, i, t)
      {
          low = sprintf("%c", 7) # BEL is ascii 7
          if (low == "\a") {    # regular ascii
              low = 0
              high = 127
          } else if (sprintf("%c", 128 + 7) == "\a") {
              # ascii, mark parity
              low = 128
              high = 255
          } else {        # ebcdic(!)
              low = 0
              high = 255
          }
      
          for (i = low; i <= high; i++) {
              t = sprintf("%c", i)
              _ord_[t] = i
          }
      }
 
    Some explanation of the numbers used by `chr' is worthwhile.  The
 most prominent character set in use today is ASCII. Although an 8-bit
 byte can hold 256 distinct values (from 0 to 255), ASCII only defines
 characters that use the values from 0 to 127.(1) In the now distant
 past, at least one minicomputer manufacturer used ASCII, but with mark
 parity, meaning that the leftmost bit in the byte is always 1.  This
 means that on those systems, characters have numeric values from 128 to
 255.  Finally, large mainframe systems use the EBCDIC character set,
 which uses all 256 values.  While there are other character sets in use
 on some older systems, they are not really worth worrying about:
 
      function ord(str,    c)
      {
          # only first character is of interest
          c = substr(str, 1, 1)
          return _ord_[c]
      }
      
      function chr(c)
      {
          # force c to be numeric by adding 0
          return sprintf("%c", c + 0)
      }
      
      #### test code ####
      # BEGIN    \
      # {
      #    for (;;) {
      #        printf("enter a character: ")
      #        if (getline var <= 0)
      #            break
      #        printf("ord(%s) = %d\n", var, ord(var))
      #    }
      # }
 
    An obvious improvement to these functions is to move the code for the
 `_ord_init' function into the body of the `BEGIN' rule.  It was written
 this way initially for ease of development.  There is a "test program"
 in a `BEGIN' rule, to test the function.  It is commented out for
 production use.
 
    ---------- Footnotes ----------
 
    (1) ASCII has been extended in many countries to use the values from
 128 to 255 for country-specific characters.  If your  system uses these
 extensions, you can simplify `_ord_init' to simply loop from 0 to 255.
 
Info Catalog (gawk.info.gz) Cliff Random Function (gawk.info.gz) General Functions (gawk.info.gz) Join Function
automatically generated byinfo2html