PAX 0 - GNU tar 1.15.92

C.0.2 PAX Format, Versions 0.0 and 0.1

There are two formats available in this branch. Version 0.0 is the initial version of the sparse format, used by tar versions 1.14–1.15.1. The sparse file map is kept in extended (x) PAX header variables:

GNU.sparse.size: Real size of the stored file
GNU.sparse.numblocks: Number of blocks in the sparse map
GNU.sparse.offset: Offset of the data block
GNU.sparse.numbytes: Size of the data block

The latter two variables repeat for each data block, so the overall structure is like this:

     GNU.sparse.size=size
     GNU.sparse.numblocks=numblocks
     repeat numblocks times
       GNU.sparse.offset=offset
       GNU.sparse.numbytes=numbytes
     end repeat
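
As a minimal sketch of the layout above, the following Python builds the ordered list of variables for a version 0.0 sparse map and encodes one of them as a pax extended header record. The function names `sparse_vars_v00` and `pax_record` are hypothetical and used only for illustration; the record layout (`"<length> <keyword>=<value>\n"`, where the length counts the whole record) is the standard pax one, not something specific to this section.

```python
def sparse_vars_v00(real_size, chunks):
    """Return the ordered (keyword, value) pairs of a v0.0 sparse map.

    chunks is a list of (offset, numbytes) tuples, one per data block.
    """
    pairs = [
        ("GNU.sparse.size", str(real_size)),
        ("GNU.sparse.numblocks", str(len(chunks))),
    ]
    # The offset/numbytes pair repeats once per data block.
    for offset, numbytes in chunks:
        pairs.append(("GNU.sparse.offset", str(offset)))
        pairs.append(("GNU.sparse.numbytes", str(numbytes)))
    return pairs


def pax_record(keyword, value):
    """Encode one pax record; the length prefix includes its own digits."""
    body = f" {keyword}={value}\n".encode("utf-8")
    total = len(body) + len(str(len(body)))
    # Adding the digits may itself push the total to one more digit.
    while len(str(total)) + len(body) != total:
        total = len(body) + len(str(total))
    return str(total).encode("ascii") + body


pairs = sparse_vars_v00(8192, [(0, 512), (4096, 1024)])
records = b"".join(pax_record(k, v) for k, v in pairs)
```

Note that `pairs` has to be an ordered list rather than a dictionary, because the same two keywords occur once per data block.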

This format presented the following two problems:

1. Whereas the POSIX specification allows a variable to appear multiple times in a header, it requires that only the last occurrence be meaningful. Thus, multiple occurrences of GNU.sparse.offset and GNU.sparse.numbytes conflict with the POSIX specification.
2. Attempting to extract such archives using third-party tars results in extraction of sparse files in condensed form, that is, with the holes removed. If the tar implementation in question does not support the POSIX format, it will also extract a file containing the extended header attributes. This file can be used to expand the sparse file to its original state. However, POSIX-aware tars will usually ignore the unknown variables, which makes restoring the file more difficult. See Extraction of sparse members in v.0.0 format, for a detailed description of how to restore such members using non-GNU tars.
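
The first problem can be illustrated with a small sketch. It assumes a reader that applies the POSIX rule literally by storing header variables in a dictionary, so that a later occurrence of a keyword overrides an earlier one. The helper `parse_last_wins` and the sample values are hypothetical.

```python
def parse_last_wins(pairs):
    """Collapse (keyword, value) pairs so that the last occurrence wins."""
    result = {}
    for keyword, value in pairs:
        result[keyword] = value  # a later value silently replaces an earlier one
    return result


# A v0.0 header describing two data blocks: (0, 512) and (4096, 1024).
header = [
    ("GNU.sparse.size", "8192"),
    ("GNU.sparse.numblocks", "2"),
    ("GNU.sparse.offset", "0"),
    ("GNU.sparse.numbytes", "512"),
    ("GNU.sparse.offset", "4096"),
    ("GNU.sparse.numbytes", "1024"),
]

seen = parse_last_wins(header)
# Only the second block survives; the first (0, 512) entry is lost.
```

Under this reading, `GNU.sparse.numblocks` still announces two blocks, but only one offset/numbytes pair remains available to the reader.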

GNU tar 1.15.2 introduced sparse format version 0.1, which attempted to solve these problems. Like its predecessor, this format stores the sparse map in the extended POSIX header. It retains the GNU.sparse.size and GNU.sparse.numblocks variables, but instead of GNU.sparse.offset/GNU.sparse.numbytes pairs it uses a single variable:

GNU.sparse.map: Map of non-null data chunks. It is a string consisting of comma-separated values "offset,size[,offset-1,size-1...]"
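
As a rough sketch of how such a value could be handled, the following Python converts between a GNU.sparse.map string and a list of (offset, size) pairs. The function names are hypothetical, and the consistency check against GNU.sparse.numblocks is an assumption about what a careful reader might do, not a description of GNU tar's own code.

```python
def parse_sparse_map(map_value, numblocks=None):
    """Split "offset,size[,offset,size...]" into a list of (offset, size)."""
    fields = [int(field) for field in map_value.split(",")] if map_value else []
    if len(fields) % 2 != 0:
        raise ValueError("sparse map must contain offset,size pairs")
    chunks = list(zip(fields[0::2], fields[1::2]))
    # Optional cross-check against GNU.sparse.numblocks, if it is known.
    if numblocks is not None and len(chunks) != numblocks:
        raise ValueError("sparse map does not match GNU.sparse.numblocks")
    return chunks


def format_sparse_map(chunks):
    """Inverse of parse_sparse_map: join the pairs into one string."""
    return ",".join(f"{offset},{size}" for offset, size in chunks)


chunks = parse_sparse_map("0,512,4096,1024", numblocks=2)
```

Because the whole map is a single variable, it occurs only once in the header and is not affected by the last-occurrence rule.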

To address the second problem, the name field in the ustar header is replaced with a special name, constructed using the following pattern:

     %d/GNUSparseFile.%p/%f

The real name of the sparse file is stored in the variable GNU.sparse.name. Thus, tar implementations that are not aware of GNU extensions will at least extract such files into separate directories, giving the user the possibility to expand them afterwards. See Extraction of sparse members in v.0.1 format, for a detailed description of how to restore such members using non-GNU tars.
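
The name substitution can be sketched as follows. This section does not define the placeholders, so the reading used here is an assumption: `%d` is taken to be the directory part of the real name, `%f` its final component, and `%p` the process ID of the tar that created the archive. The function `sparse_member_name` is hypothetical, as is the choice of `.` for a name without a directory part.

```python
import posixpath


def sparse_member_name(real_name, pid):
    """Expand %d/GNUSparseFile.%p/%f for a member's real name (assumed meanings)."""
    dirname, filename = posixpath.split(real_name)
    if not dirname:
        dirname = "."  # assumption: no directory part maps to the current directory
    return f"{dirname}/GNUSparseFile.{pid}/{filename}"


ustar_name = sparse_member_name("var/log/big.img", 1234)
# The real name travels separately in the extended header.
pax_vars = {"GNU.sparse.name": "var/log/big.img"}
```

A tar that ignores the GNU variables would then extract the condensed data under the `GNUSparseFile.1234` directory rather than over the path of the real file.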

The resulting GNU.sparse.map string can be very long. Although POSIX does not impose any limit on the length of an x header variable, a very long value may confuse some tars.