(tar) Blocking Factor

Info Catalog (tar) Format Variations (tar) Blocking
 
 The Blocking Factor of an Archive
 ---------------------------------
 
    The data in an archive is grouped into blocks of 512 bytes each.
 Blocks are read and written in whole-number multiples called "records".
 The number of blocks in a record (i.e., the size of a record in units of
 512 bytes) is called the "blocking factor".  The
 `--blocking-factor=512-SIZE' (`-b 512-SIZE') option specifies the
 blocking factor of an archive.  The default blocking factor is
 typically 20 (i.e., 10240 bytes), but can be specified at installation.
 To find out the blocking factor of an existing archive, use `tar --list
 --file=ARCHIVE-NAME'.  This may not work on some devices.
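
    The arithmetic above can be sketched in a few lines of Python.  This
 is only an illustration: the names `BLOCK_SIZE', `record_bytes' and
 `blocking_factor_for' are made up for this example and are not part of
 `tar'.

```python
BLOCK_SIZE = 512  # bytes per tar block, as stated in the text


def record_bytes(blocking_factor):
    """Return the record size in bytes for a given blocking factor."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be at least 1")
    return blocking_factor * BLOCK_SIZE


def blocking_factor_for(record_size):
    """Return the blocking factor for a record size given in bytes."""
    if record_size <= 0 or record_size % BLOCK_SIZE:
        raise ValueError("record size must be a positive multiple of 512")
    return record_size // BLOCK_SIZE


print(record_bytes(20))            # 10240, the typical default
print(blocking_factor_for(10240))  # 20
```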
 
    Records are separated by gaps, which waste space on the archive
 media.  If you are archiving on magnetic tape, using a larger blocking
 factor (and therefore larger records) provides faster throughput and
 allows you to fit more data on a tape (because there are fewer gaps).
 If you are archiving on cartridge, a very large blocking factor (say
 126 or more) greatly increases performance. A smaller blocking factor,
 on the other hand, may be useful when archiving small files, to avoid
 archiving lots of nulls as `tar' fills out the archive to the end of
 the record.  In general, the ideal record size depends on the size of
 the inter-record gaps on the tape you are using, and the average size
 of the files you are archiving.  *Note create::, for information on
 writing archives.
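
    To get a feel for how much padding a given blocking factor costs for
 small archives, the following Python sketch may help.  It uses a
 simplified model that is an assumption of this example: each member
 takes one 512-byte header block plus its data rounded up to whole
 blocks, and the archive ends with two zero blocks.  Real archives may
 carry extra header blocks, so treat the numbers as estimates.  The
 function names are hypothetical.

```python
BLOCK_SIZE = 512


def archive_blocks(member_sizes, blocking_factor):
    """Estimate the archive length in blocks, padded to a whole record."""
    # One header block per member, plus data rounded up to whole blocks.
    content = sum(1 + -(-size // BLOCK_SIZE) for size in member_sizes)
    content += 2  # two zero blocks mark the end of the archive
    records = -(-content // blocking_factor)  # ceiling division
    return records * blocking_factor


def padding_bytes(member_sizes, blocking_factor):
    """Estimate how many bytes are added only to fill the last record."""
    content = sum(1 + -(-size // BLOCK_SIZE) for size in member_sizes) + 2
    return (archive_blocks(member_sizes, blocking_factor) - content) * BLOCK_SIZE


# A single 100-byte file needs 4 blocks of real content in this model.
print(padding_bytes([100], 20))   # 8192
print(padding_bytes([100], 126))  # 62464
```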
 
    Archives with blocking factors larger than 20 cannot be read by very
 old versions of `tar', or by some newer versions of `tar' running on
 old machines with small address spaces.  With GNU `tar', the blocking
 factor of an archive is limited only by the maximum record size of the
 device containing the archive, or by the amount of available virtual
 memory.
 
    Also, on some systems, not using adequate blocking factors, as
 sometimes imposed by the device drivers, may yield unexpected
 diagnostics.  For example, this has been reported:
 
      Cannot write to /dev/dlt: Invalid argument
 
 In such cases, it sometimes happens that the `tar' bundled with the
 system is aware of block size idiosyncrasies, while GNU `tar' requires
 an explicit specification of the block size, which it cannot guess.
 This leads some people to conclude that GNU `tar' is misbehaving,
 because by comparison, "the bundled `tar' works OK".  Adding `-b 256',
 for example, might resolve the problem.
 
    If you use a non-default blocking factor when you create an archive,
 you must specify the same blocking factor when you modify that archive.
 Some archive devices will also require you to specify the blocking
 factor when reading that archive, although this is not typically the
 case.  Usually, you can use `--list' (`-t') without specifying a
 blocking factor--`tar' reports a non-default record size and then lists
 the archive members as it normally would.  To extract files from an
 archive with a non-standard blocking factor (particularly if you're not
 sure what the blocking factor is), you can usually use the
 `--read-full-records' (`-B') option while specifying a blocking factor
 larger than the blocking factor of the archive (e.g., `tar --extract
 --read-full-records --blocking-factor=300').  *Note list::, for more
 information on the `--list' (`-t') operation.  *Note Reading::, for a
 more detailed explanation of the `--read-full-records' (`-B') option.
 
 `--blocking-factor=NUMBER'
 `-b NUMBER'
      Specifies the blocking factor of an archive.  Can be used with any
      operation, but is usually not necessary with `--list' (`-t').
 
    Device blocking
 
 `-b BLOCKS'
 `--blocking-factor=BLOCKS'
      Set record size to BLOCKS * 512 bytes.
 
      This option is used to specify a "blocking factor" for the archive.
      When reading or writing the archive, `tar' will do reads and
      writes of the archive in records of BLOCKS * 512 bytes.  This is
      true even when the archive is compressed.  Some devices require
      that all write operations be a multiple of a certain size, and so
      `tar' pads the archive out to the next record boundary.
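
      The padding to a record boundary can be observed without a tape
      drive.  The sketch below uses Python's standard `tarfile' module
      rather than GNU `tar' itself; that module also pads what it writes
      to a fixed record size of 20 blocks (`tarfile.RECORDSIZE').  The
      helper name `make_archive' is made up for this example.

```python
import io
import tarfile


def make_archive(name, payload):
    """Build an uncompressed tar archive in memory holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = make_archive("hello.txt", b"hello\n")

# The member is only a few bytes long, yet the archive length is a whole
# number of records because the writer pads the final record with zeros.
print(tarfile.RECORDSIZE)                 # 10240
print(len(archive) % tarfile.RECORDSIZE)  # 0
```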
 
      The default blocking factor is set when `tar' is compiled, and is
      typically 20.  Blocking factors larger than 20 cannot be read by
      very old versions of `tar', or by some newer versions of `tar'
      running on old machines with small address spaces.
 
      With a magnetic tape, larger records give faster throughput and fit
      more data on a tape (because there are fewer inter-record gaps).
      If the archive is in a disk file or a pipe, you may want to specify
      a smaller blocking factor, since a large one will result in a large
      number of null bytes at the end of the archive.
 
      When writing cartridge or other streaming tapes, a much larger
      blocking factor (say 126 or more) will greatly increase
      performance.  However, you must specify the same blocking factor
      when reading or updating the archive.
 
      Apparently, Exabyte drives have a physical block size of 8K bytes.
      If we choose our record size as a multiple of 8K bytes, then the
      problem seems to disappear.  That is, we are using a blocking
      factor of 112 right now, and we haven't had the problem since we
      switched...
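
      As a rough aid for picking such a value, this Python sketch rounds
      a desired blocking factor up to the next one whose record size is
      a whole multiple of a device's physical block size.  The function
      name is hypothetical, and the 8192-byte figure is simply the 8K
      reported above, not a verified property of any drive.

```python
BLOCK_SIZE = 512


def round_up_factor(wanted_factor, physical_block_bytes):
    """Smallest blocking factor >= wanted_factor whose record size is a
    multiple of physical_block_bytes."""
    if physical_block_bytes <= 0 or physical_block_bytes % BLOCK_SIZE:
        raise ValueError("physical block size must be a multiple of 512")
    step = physical_block_bytes // BLOCK_SIZE  # tar blocks per device block
    return -(-wanted_factor // step) * step    # ceiling to a multiple of step


print(round_up_factor(100, 8192))  # 112
print(round_up_factor(126, 8192))  # 128
```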
 
      With GNU `tar' the blocking factor is limited only by the maximum
      record size of the device containing the archive, or by the amount
      of available virtual memory.
 
      However, deblocking or reblocking is virtually avoided in a special
      case which often occurs in practice, but which requires all the
      following conditions to be simultaneously true:
         * the archive is subject to a compression option,
 
         * the archive is not handled through standard input or output,
           nor redirected nor piped,
 
         * the archive is directly handled to a local disk, instead of
           any special device,
 
         * `--blocking-factor=512-SIZE' (`-b 512-SIZE') is not
           explicitly specified on the `tar' invocation.
 
      In previous versions of GNU `tar', the `--compress-block' option
      (or even older: `--block-compress') was necessary to reblock
      compressed archives.  It is now a dummy option just asking not to
      be used, and otherwise ignored.  If the output goes directly to a
      local disk, and not through stdout, then the last write is not
      extended to a full record size.  Otherwise, reblocking occurs.
      Here are a few other remarks on this topic:
 
         * `gzip' will complain about trailing garbage if asked to
           uncompress a compressed archive on tape.  There is an option
           to turn the message off, but it breaks the regularity of
           simply having to use `PROG -d' for decompression.  It would
           be nice if `gzip' silently ignored any number of trailing
           zeros.  I'll ask Jean-loup Gailly, by sending a copy of this
           message to him.
 
         * `compress' does not show this problem, but as Jean-loup
           pointed out to Michael, `compress -d' silently adds garbage
           after the result of decompression, which tar ignores because
           it already recognized its end-of-file indicator.  So this bug
           may be safely ignored.
 
         * `gzip -d -q' will indeed be silent about the trailing zeros,
           but will still return an exit status of 2, which `tar' reports
           in turn.  `tar' might ignore the exit status returned, but I
           hate doing that, as it weakens the protection `tar' offers
           users against other possible problems at decompression time.
           If `gzip' silently skipped trailing zeros _and_ also avoided
           setting the exit status in this innocuous case, that would
           solve this situation.
 
         * `tar' should become more robust and not stop reading a pipe
           at the first null block encountered.  This inelegantly breaks
           the pipe.  `tar' should rather drain the pipe before exiting.
 
 `-i'
 `--ignore-zeros'
      Ignore blocks of zeros in archive (which normally mean EOF).
 
      The `--ignore-zeros' (`-i') option causes `tar' to ignore blocks
      of zeros in the archive.  Normally a block of zeros indicates the
      end of the archive, but when reading a damaged archive, or one
      which was created by concatenating several archives together, this
      option allows `tar' to read the entire archive.  This option is
      not on by default because many versions of `tar' write garbage
      after the zeroed blocks.
 
      Note that this option causes `tar' to read to the end of the
      archive file, which may sometimes avoid problems when multiple
      files are stored on a single physical tape.
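
      The effect on concatenated archives can be tried out with Python's
      standard `tarfile' module, which has an `ignore_zeros' parameter
      with a similar purpose.  This is a sketch of the idea, not of GNU
      `tar' internals; `make_archive' and `member_names' are names made
      up for this example.

```python
import io
import tarfile


def make_archive(name, payload):
    """Build an uncompressed tar archive in memory holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data, ignore_zeros):
    """List member names, optionally reading past blocks of zeros."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


# Two complete archives glued together, as with `cat a.tar b.tar'.
combined = make_archive("a.txt", b"first\n") + make_archive("b.txt", b"second\n")

print(member_names(combined, ignore_zeros=False))  # ['a.txt']
print(member_names(combined, ignore_zeros=True))   # ['a.txt', 'b.txt']
```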
 
 `-B'
 `--read-full-records'
      Reblock as we read (for reading 4.2BSD pipes).
 
      If `--read-full-records' (`-B') is used, `tar' will not panic if an
      attempt to read a record from the archive does not return a full
      record.  Instead, `tar' will keep reading until it has obtained a
      full record.
 
      This option is turned on by default when `tar' is reading an
      archive from standard input, or from a remote machine.  This is
      because on BSD Unix systems, a read of a pipe will return however
      much happens to be in the pipe, even if it is less than `tar'
      requested.  If this option was not used, `tar' would fail as soon
      as it read an incomplete record from the pipe.
 
      This option is also useful with the commands for updating an
      archive.
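
      The reblocking idea can be sketched as a loop that keeps reading
      until a whole record has arrived.  The Python below is an
      illustration under stated assumptions, not the code `tar' uses:
      `read_full_record' is a hypothetical helper, and `ShortReader' is
      a stand-in for a pipe that returns fewer bytes than requested.

```python
import io


def read_full_record(read, record_size):
    """Call read(n) repeatedly until record_size bytes arrive or EOF."""
    chunks = []
    remaining = record_size
    while remaining > 0:
        chunk = read(remaining)
        if not chunk:  # end of input: return whatever was collected
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class ShortReader:
    """Stand-in for a pipe: hands back at most `limit' bytes per read."""

    def __init__(self, data, limit):
        self._stream = io.BytesIO(data)
        self._limit = limit

    def read(self, n):
        return self._stream.read(min(n, self._limit))


RECORD_SIZE = 20 * 512
source = ShortReader(b"x" * (2 * RECORD_SIZE), limit=3000)

single = source.read(RECORD_SIZE)                  # one plain read falls short
full = read_full_record(source.read, RECORD_SIZE)  # the loop fills the record

print(len(single))  # 3000
print(len(full))    # 10240
```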
 
    Tape blocking
 
    When handling various tapes or cartridges, you have to take care to
 select a proper blocking, that is, the number of disk blocks you put
 together as a single tape block on the tape, without intervening tape
 gaps.  A "tape gap" is a small landing area on the tape with no
 information on it, used for decelerating the tape to a full stop, and
 for later regaining the reading or writing speed.  When the tape driver
 starts reading a record, the record has to be read whole without
 stopping, as a tape gap is needed to stop the tape motion without
 losing information.
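
    Since every record is followed by one gap, a crude model of how
 much of the tape holds data is the record length divided by the record
 length plus the gap length.  The Python sketch below works under that
 assumption only.  The function name is hypothetical, and the density
 and gap values are illustrative placeholders, not figures from this
 manual or from any particular drive.

```python
BLOCK_SIZE = 512


def tape_utilization(blocking_factor, bytes_per_inch, gap_inches):
    """Fraction of tape length holding data in a record-plus-gap model."""
    record_inches = blocking_factor * BLOCK_SIZE / bytes_per_inch
    return record_inches / (record_inches + gap_inches)


# Illustrative placeholder values; substitute figures for your own media.
DENSITY = 6250.0  # bytes per inch (assumed for the example)
GAP = 0.3         # inches per inter-record gap (assumed for the example)

for factor in (1, 20, 126):
    print(factor, round(tape_utilization(factor, DENSITY, GAP), 3))
```

 In this model the fraction rises toward 1 as the blocking factor
 grows, which matches the qualitative claim that larger records waste
 less tape on gaps.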
 
    Using higher blocking (putting more disk blocks per tape block) will
 use the tape more efficiently as there will be fewer tape gaps.  But
 reading such tapes may be more difficult for the system, as more memory
 will be required to receive the whole record at once.  Further, if
 there is a reading error on a huge record, it is less likely that the
 system will succeed in recovering the information.  So, blocking should
 be neither too low nor too high.  `tar' uses by default a
 blocking of 20 for historical reasons, and it does not really matter
 when reading or writing to disk.  Current tape technology would easily
 accommodate higher blockings.  Sun recommends a blocking of 126 for
 Exabytes and 96 for DATs.  We were told that for some DLT drives, the
 blocking should be a multiple of 4Kb, preferably 64Kb (`-b 128') or 256
 for decent performance.  Other manufacturers may use different
 recommendations for the same tapes.  This might also depend on the
 buffering techniques used inside modern tape controllers.  Some impose
 a minimum blocking, or a maximum blocking.  Others request blocking to
 be some power of two.
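
    The kinds of constraints listed above can be checked mechanically
 before writing a tape.  The Python sketch below is one possible way to
 do so; `check_factor' and its parameters are hypothetical, and the
 limits you pass in must come from your own drive's documentation.

```python
BLOCK_SIZE = 512


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


def check_factor(factor, multiple_of_bytes=None, min_factor=None,
                 max_factor=None, power_of_two=False):
    """Return a list of constraint violations (empty if the factor fits)."""
    problems = []
    if multiple_of_bytes and (factor * BLOCK_SIZE) % multiple_of_bytes:
        problems.append("record size is not a multiple of %d bytes"
                        % multiple_of_bytes)
    if min_factor is not None and factor < min_factor:
        problems.append("below the minimum blocking of %d" % min_factor)
    if max_factor is not None and factor > max_factor:
        problems.append("above the maximum blocking of %d" % max_factor)
    if power_of_two and not is_power_of_two(factor):
        problems.append("not a power of two")
    return problems


# `-b 128' gives 64 KB records: a multiple of 4 KB and a power of two.
print(check_factor(128, multiple_of_bytes=4096, power_of_two=True))  # []
# 126 blocks is 64512 bytes, which is not a multiple of 4096.
print(check_factor(126, multiple_of_bytes=4096))
```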
 
    So, there is no fixed rule for blocking.  But blocking at read time
 should ideally be the same as blocking used at write time.  At one place
 I know, with a wide variety of equipment, they found it best to use a
 blocking of 32 to guarantee that their tapes are fully interchangeable.
 
    I was also told that, for recycled tapes, prior erasure (by the same
 drive unit that will be used to create the archives) sometimes lowers
 the error rates observed at rewriting time.
 
    I might also use `--number-blocks' instead of `--block-number', so
 `--block' will then expand to `--blocking-factor' unambiguously.
 