(tar) Blocking

 Blocking
 ========
 
      _(This message will disappear, once this node revised.)_
 
    "Block" and "record" terminology is rather confused, and it is also
 confusing to the expert reader.  On the other hand, readers who are new
 to the field have a fresh mind, and they may safely skip the next two
 paragraphs, as the remainder of this manual uses those two terms in a
 quite consistent way.
 
    John Gilmore, the writer of the public domain `tar' from which GNU
 `tar' was originally derived, wrote (June 1995):
 
      The nomenclature of tape drives comes from IBM, where I believe
      they were invented for the IBM 650 or so.  On IBM mainframes, what
      is recorded on tape are tape blocks.  The logical organization of
      data is into records.  There are various ways of putting records
      into blocks, including `F' (fixed sized records), `V' (variable
      sized records), `FB' (fixed blocked: fixed size records, N to a
      block), `VB' (variable size records, N to a block), `VSB'
      (variable spanned blocked: variable sized records that can occupy
      more than one block), etc.  The `JCL' `DD RECFORM=' parameter
      specified this to the operating system.
 
      The Unix man page on `tar' was totally confused about this.  When
      I wrote `PD TAR', I used the historically correct terminology
      (`tar' writes data records, which are grouped into blocks).  It
      appears that the bogus terminology made it into POSIX (no surprise
      here), and now François has migrated that terminology back into
      the source code too.
 
    The term "physical block" means the basic transfer chunk from or to
 a device, after which reading or writing may stop without anything
 being lost.  In this manual, the term "block" usually refers to a disk
 physical block, _assuming_ that each disk block is 512 bytes in length.
 It is true that some disk devices have different physical blocks, but
 `tar' ignores these differences in its own format, which is meant to be
 portable, so a `tar' block is always 512 bytes in length, and "block"
 always means a `tar' block.  The term "logical block" often represents
 the basic chunk of allocation of many disk blocks as a single entity,
 which the operating system treats somewhat atomically; this concept is
 only barely used in GNU `tar'.
 
    The term "physical record" is another way to speak of a physical
 block; those two terms are somewhat interchangeable.  In this manual,
 the term "record" usually refers to a tape physical block, _assuming_
 that the `tar' archive is kept on magnetic tape.  It is true that
 archives may be put on disk or used with pipes, but nevertheless, `tar'
 tries to read and write the archive one "record" at a time, whatever
 the medium in use.  One record is made up of an integral number of
 blocks, and this operation of putting many disk blocks into a single
 tape block is called "reblocking", or more simply, "blocking".  The
 term "logical record" refers to the logical organization of many
 characters into something meaningful to the application.  The term
 "unit record" describes a small set of characters which are transmitted
 whole to or by the application, and often refers to a line of text.
 Those last two terms are unrelated to what we call a "record" in GNU
 `tar'.
 
    When writing to tapes, `tar' writes the contents of the archive in
 chunks known as "records".  To change the default blocking factor, use
 the `--blocking-factor=512-SIZE' (`-b 512-SIZE') option.  Each record
 will then be composed of 512-SIZE blocks.  (Each `tar' block is 512
 bytes; see `Standard'.)  Each file written to the archive uses at
 least one full record.  As a result, using a larger record size can
 result in more wasted space for small files.  On the other hand, a
 larger record size can often be read and written much more efficiently.
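
    The trade-off between record size and wasted space can be checked
 with a little arithmetic.  The following is a minimal sketch, not part
 of `tar' itself: the helper names `record_size' and `padded_size' are
 hypothetical, and the sketch assumes that the data written is padded
 out to a whole number of records.

```python
BLOCK_SIZE = 512  # bytes per tar block, as stated in the text


def record_size(blocking_factor):
    """Return the record size in bytes for a given blocking factor."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be at least 1")
    return BLOCK_SIZE * blocking_factor


def padded_size(nbytes, blocking_factor):
    """Round nbytes up to a whole number of records (at least one)."""
    size = record_size(blocking_factor)
    records = max(1, -(-nbytes // size))  # ceiling division
    return records * size


# Compare the padding needed for 3000 bytes of data at several factors.
for factor in (20, 128, 2048):
    total = padded_size(3000, factor)
    print(factor, record_size(factor), total - 3000)
```

    Each output line shows the blocking factor, the resulting record
 size in bytes, and the number of padding bytes for a 3000-byte payload.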
 
    Further complicating the problem is that some tape drives ignore the
 blocking entirely.  For these, a larger record size can still improve
 performance (because the software layers above the tape drive still
 honor the blocking), but not as dramatically as on tape drives that
 honor blocking.
 
    When reading an archive, `tar' can usually figure out the record
 size by itself.  When this is the case, and a non-standard record size
 was used when the archive was created, `tar' will print a message about
 a non-standard blocking factor, and then operate normally.  On some
 tape devices, however, `tar' cannot figure out the record size itself.
 On most of those, you can specify a blocking factor (with
 `--blocking-factor=512-SIZE' (`-b 512-SIZE')) larger than the actual
 blocking factor, and then use the `--read-full-records' (`-B') option.
 (If you specify a blocking factor with `--blocking-factor=512-SIZE'
 (`-b 512-SIZE') and don't use the `--read-full-records' (`-B') option,
 then `tar' will not attempt to figure out the record size itself.)
 On some devices, you must always specify the record size exactly with
 `--blocking-factor=512-SIZE' (`-b 512-SIZE') when reading, because
 `tar' cannot figure it out.  In any case, use `--list' (`-t') before
 doing any extractions to see whether `tar' is reading the archive
 correctly.
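
    GNU `tar' describes `--read-full-records' (`-B') as reblocking the
 input as it is read.  The following sketch illustrates that general
 idea only and is not `tar''s implementation: a reader keeps asking for
 data until a whole record has arrived, even when the source delivers
 less per call.  The names `read_full_record' and `ShortReader' are
 hypothetical, and the 4096-byte limit is an arbitrary assumption.

```python
import io

BLOCK_SIZE = 512  # bytes per tar block


def read_full_record(read, record_size):
    """Call read() repeatedly until record_size bytes arrive or input ends."""
    chunks = []
    remaining = record_size
    while remaining > 0:
        chunk = read(remaining)
        if not chunk:  # end of input
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class ShortReader:
    """Hypothetical source that returns at most `limit` bytes per read."""

    def __init__(self, data, limit):
        self._stream = io.BytesIO(data)
        self._limit = limit

    def read(self, n):
        return self._stream.read(min(n, self._limit))


payload = bytes(range(256)) * 80  # 20480 bytes of sample data
source = ShortReader(payload, limit=4096)
record = read_full_record(source.read, 20 * BLOCK_SIZE)
print(len(record))
```

    A plain single read from such a source would return a short chunk;
 the loop assembles the chunks into one full record.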
 
    `tar' blocks are all fixed size (512 bytes), and its scheme for
 putting them into records is to put a whole number of them (one or
 more) into each record.  `tar' records are all the same size; at the
 end of the file there's a block containing all zeros, which is how you
 tell that the remainder of the last record(s) is garbage.
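
    The layout described above can be observed with a short sketch.  It
 uses Python's standard `tarfile' module rather than GNU `tar', so it
 shows the archive format, not `tar''s own code.  The helper names and
 the member name `hello.txt' are hypothetical.

```python
import io
import tarfile

BLOCK_SIZE = 512  # bytes per tar block


def first_zero_block(data):
    """Return the index of the first all-zero 512-byte block, or None."""
    zero = bytes(BLOCK_SIZE)
    for offset in range(0, len(data), BLOCK_SIZE):
        if data[offset:offset + BLOCK_SIZE] == zero:
            return offset // BLOCK_SIZE
    return None


def make_archive(payload):
    """Build a one-member archive in memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = make_archive(b"hello")
marker = first_zero_block(archive)
print(len(archive), marker)
```

    The archive length is a whole number of records, and the first
 all-zero block follows the member's header block and data block.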
 
    In a standard `tar' file (no options), the block size is 512 and the
 record size is 10240, for a blocking factor of 20.  What the
 `--blocking-factor=512-SIZE' (`-b 512-SIZE') option does is set the
 blocking factor, changing the record size while leaving the block size
 at 512 bytes.  20 was fine for ancient 800 or 1600 bpi reel-to-reel
 tape drives; most tape drives these days prefer much bigger records in
 order to stream and not waste tape.  When writing tapes for themselves,
 some people tend to use a factor on the order of 2048, giving a record
 size of around one megabyte.
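
    The default values quoted above can be cross-checked against the
 constants in Python's standard `tarfile' module, which is a separate
 implementation of the same format.  This is only a sketch; the helper
 name `describe' is hypothetical.

```python
import tarfile


def describe(blocking_factor, block_size=tarfile.BLOCKSIZE):
    """Summarize the record size that a blocking factor produces."""
    size = blocking_factor * block_size
    return (f"blocking factor {blocking_factor}: "
            f"{size} bytes per record ({size / 1024:g} KiB)")


# The module's default record size divided by its block size.
default_factor = tarfile.RECORDSIZE // tarfile.BLOCKSIZE
print(describe(default_factor))
print(describe(2048))
```

    A factor of 2048 gives 2048 * 512 bytes per record, which is exactly
 1024 KiB.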
 
    If you use a blocking factor larger than 20, older `tar' programs
 might not be able to read the archive, so we recommend this as a limit
 to use in practice.  GNU `tar', however, will support arbitrarily large
 record sizes, limited only by the amount of virtual memory or the
 physical characteristics of the tape device.
 

Menu

 
* Format Variations           Format Variations
* Blocking Factor             The Blocking Factor of an Archive
 