Sparse Recovery - GNU tar 1.15.92


8.3.9.2 Extracting Sparse Members

Any tar implementation will be able to extract sparse members from a PAX archive. However, the extracted files will be condensed, i.e. any zero blocks will be removed from them. When we restore such a condensed file to its original form by adding zero blocks (or holes) back to their original locations, we call this process expanding a condensed sparse file.
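The following Python sketch illustrates what expanding means. It assumes the sparse map is already available as a list of (offset, size) pairs and that the condensed data holds the non-hole chunks back to back, in map order. The names `expand` and `real_size` are hypothetical and do not describe xsparse's internals.

```python
import io
from typing import BinaryIO, List, Tuple


def expand(condensed: BinaryIO, out: BinaryIO,
           sparse_map: List[Tuple[int, int]], real_size: int) -> None:
    """Rebuild a sparse file from its condensed data (hypothetical helper).

    Each (offset, size) entry says that the next `size` bytes of the
    condensed stream belong at `offset` in the expanded file; bytes not
    covered by any entry are holes and read back as zeros.
    """
    for offset, size in sparse_map:
        # Position at this chunk's offset; any gap left behind is a hole.
        out.seek(offset, io.SEEK_SET)
        out.write(condensed.read(size))  # copy the chunk of real data
    # Fix the final length; on a regular file this also creates a trailing hole.
    out.truncate(real_size)
```

For real files, open the condensed file with "rb" and the output with "wb" and pass both objects in.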

To expand a file, you will need a simple auxiliary program called xsparse. It is available in source form from the GNU tar home page.

Let's begin with archive members in sparse format version 1.0¹, which are the easiest to expand. The condensed file will contain both the file map and the file data, so no additional data will be needed to restore it. If the original file name was dir/name, then the condensed file will be named dir/GNUSparseFile.n/name, where n is a decimal number².
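Assuming the version 1.0 layout described in Sparse Formats (the condensed file begins with its sparse map: decimal ASCII numbers, each terminated by a newline, giving first the number of entries and then an offset and a size per entry, with the map padded with NUL bytes to the next 512-byte block boundary), the map can be read with a short Python sketch like this one. The function name `parse_v1_map` is hypothetical.

```python
from typing import List, Tuple

BLOCK = 512  # tar block size in bytes


def parse_v1_map(data: bytes) -> Tuple[List[Tuple[int, int]], int]:
    """Parse the sparse map at the start of a v1.0 condensed file.

    Returns the (offset, size) pairs and the byte offset at which the
    file data begins (the map is padded with NULs to a block boundary).
    """
    pos = 0

    def next_number() -> int:
        nonlocal pos
        end = data.index(b"\n", pos)  # each number ends with a newline
        value = int(data[pos:end].decode("ascii"))
        pos = end + 1
        return value

    count = next_number()  # first number: how many map entries follow
    entries = [(next_number(), next_number()) for _ in range(count)]
    data_start = -(-pos // BLOCK) * BLOCK  # round up to the next 512-byte block
    return entries, data_start
```

The data that follows `data_start` is exactly the condensed chunk stream that an expansion step needs.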

To expand a version 1.0 file, run xsparse as follows:

     $ xsparse cond-file

where cond-file is the name of the condensed file. The utility will deduce the name for the resulting expanded file using the following algorithm:

If cond-file does not contain any directories, ../cond-file will be used.
If cond-file has the form dir/t/name, where both t and name are simple names, with no ‘/’ characters in them, the output file name will be dir/name.
Otherwise, if cond-file has the form dir/name, the output file name will be name.
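The three rules can be restated as a short Python function. This is a hypothetical sketch of the rules as written above, not xsparse's actual code, and it assumes ‘/’-separated names without a trailing slash.

```python
def expanded_name(cond_file: str) -> str:
    """Derive the output file name from a condensed file name
    (hypothetical restatement of the three rules above)."""
    parts = cond_file.rsplit("/", 2)
    if len(parts) == 1:
        # Rule 1: no directory part at all.
        return "../" + cond_file
    if len(parts) == 3:
        # Rule 2: dir/t/name -> dir/name (the t component is dropped).
        directory, _t, name = parts
        return directory + "/" + name
    # Rule 3: dir/name -> name.
    return parts[1]
```

For example, `expanded_name("/home/gray/GNUSparseFile.6058/sparsefile")` gives `/home/gray/sparsefile`, matching the dry-run output shown below.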

In the unlikely case that this algorithm does not suit your needs, you can explicitly specify the output file name as a second argument to the command:

     $ xsparse cond-file out-file

It is often a good idea to run xsparse in dry run mode first. In this mode, the command does not actually expand the file, but verbosely lists all the actions it would take to do so. Dry run mode is enabled by the -n command line option:

     $ xsparse -n /home/gray/GNUSparseFile.6058/sparsefile
     Reading v.1.0 sparse map
     Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
     `/home/gray/sparsefile'
     Finished dry run

To actually expand the file, you would run:

     $ xsparse /home/gray/GNUSparseFile.6058/sparsefile

The program behaves the same way all UNIX utilities do: it keeps quiet unless it has something important to tell you (e.g. an error condition). If you wish it to produce verbose output, similar to that of the dry run mode, give it the -v option:

     $ xsparse -v /home/gray/GNUSparseFile.6058/sparsefile
     Reading v.1.0 sparse map
     Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
     `/home/gray/sparsefile'
     Done

Additionally, if your tar implementation has extracted the extended headers for this file, you can instruct xsparse to use them in order to verify the integrity of the expanded file. The -x option sets the name of the extended header file to use. Continuing our example:

     $ xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \
       /home/gray/GNUSparseFile.6058/sparsefile
     Reading extended header file
     Found variable GNU.sparse.major = 1
     Found variable GNU.sparse.minor = 0
     Found variable GNU.sparse.name = sparsefile
     Found variable GNU.sparse.realsize = 217481216
     Reading v.1.0 sparse map
     Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
     `/home/gray/sparsefile'
     Done

An extended header is a special tar archive header that precedes an archive member and contains a set of variables describing the member properties that cannot be stored in the standard ustar header. While optional for expanding sparse version 1.0 members, the use of extended headers is mandatory when expanding sparse members in the older sparse formats, v.0.0 and v.0.1 (the sparse formats are described in detail in Sparse Formats). So, for these formats, the question is: how do you obtain the extended headers from the archive?
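The variables in an extended header are stored as pax records of the form `length keyword=value` followed by a newline, where length is the decimal size of the whole record, including the length digits, the space and the newline. A minimal Python parser for such records might look like this; the name `parse_pax_records`, and the assumption that the input holds only record data (possibly followed by NUL padding), belong to this sketch.

```python
from typing import Dict


def parse_pax_records(blob: bytes) -> Dict[str, str]:
    """Parse pax extended header records ("<length> <keyword>=<value>\\n").

    <length> counts the entire record, so it also tells us where the
    next record starts.
    """
    result: Dict[str, str] = {}
    pos = 0
    while pos < len(blob) and blob[pos:pos + 1] != b"\0":  # stop at NUL padding
        space = blob.index(b" ", pos)
        length = int(blob[pos:space])
        record = blob[space + 1:pos + length - 1]  # drop the trailing newline
        key, _, value = record.decode("utf-8").partition("=")
        result[key] = value
        pos += length
    return result
```

Applied to a header like the one above, this would yield entries such as `GNU.sparse.name` mapped to `sparsefile`.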

If you use a tar implementation that does not support PAX format, extended headers for each member will be extracted as a separate file. If we represent the member name as dir/name, then the extended header file will be named dir/PaxHeaders.n/name, where n is an integer number.

Things become more difficult if your tar implementation does support PAX headers, because in this case you will have to manually extract the headers. We recommend the following algorithm:

Consult the documentation for your tar implementation for an option that will print block numbers along with the archive listing (analogous to GNU tar's -R option). For example, star has -block-number.

Obtain the verbose listing using the ‘block number’ option, and find the position of the sparse member in question and the member immediately following it. For example, running star on our archive we obtain:

          $ star -t -v -block-number -f arc.tar
          ...
          star: Unknown extended header keyword 'GNU.sparse.size' ignored.
          star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored.
          star: Unknown extended header keyword 'GNU.sparse.name' ignored.
          star: Unknown extended header keyword 'GNU.sparse.map' ignored.
          block        56:  425984 -rw-r--r--  gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile
          block       897:   65391 -rw-r--r--  gray/users Jun 24 20:06 2006 README
          ...

(as usual, ignore the warnings about unknown keywords.)
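For a long listing, the block numbers and sizes can be picked out programmatically. The Python sketch below assumes that every member line has exactly the shape shown in the star output above (block number, size, mode, owner, month, day, time, year, name); that format is inferred from this single example, so the pattern may need adjusting for other implementations.

```python
import re
from typing import List, Tuple

# Assumed line shape, taken from the example above:
# "block  56:  425984 -rw-r--r--  gray/users Jun 25 14:46 2006 NAME"
LINE_RE = re.compile(
    r"^\s*block\s+(\d+):\s+(\d+)\s+\S+\s+\S+"      # block, size, mode, owner
    r"\s+\w{3}\s+\d+\s+\d\d:\d\d\s+\d{4}\s+(.+)$"  # date, then member name
)


def parse_listing(text: str) -> List[Tuple[int, int, str]]:
    """Return (block number, size, name) for each listed member,
    skipping warnings and any other lines that do not match."""
    members = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            members.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return members
```

The sparse member and the entry right after it then supply Bs, size and Bn for the next step.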

Let size be the size of the sparse member, Bs be its block number and Bn be the block number of the next member. Compute:
```
          N = Bn - Bs - size/512 - 2
```
This number gives the size of the extended header part in tar blocks. In our example, this formula gives: 897 - 56 - 425984 / 512 - 2 = 7.
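The same arithmetic in Python, as a sketch. It rounds size up to whole 512-byte blocks, an assumption based on tar storing member data in full blocks; this agrees with size/512 whenever size is a multiple of 512, as in this example. The function name is hypothetical.

```python
BLOCK = 512  # size of a tar block in bytes


def xheader_blocks(size: int, bs: int, bn: int) -> int:
    """Compute N = Bn - Bs - size/512 - 2, the extended header size in blocks."""
    data_blocks = -(-size // BLOCK)  # round up: a partial block still occupies a full block
    return bn - bs - data_blocks - 2
```

With the values from the listing, `xheader_blocks(425984, 56, 897)` evaluates to 7.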
Use dd to extract the headers:
```
          dd if=archive of=hname bs=512 skip=Bs count=N
```
where archive is the archive name, hname is the name of the file in which to store the extended header, and Bs and N are the values computed in the previous steps.
In our example, this command will be:
```
          $ dd if=arc.tar of=xhdr bs=512 skip=56 count=7
```
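Where dd is not available, the same copy can be done in a few lines of Python. This sketch mirrors `dd bs=512 skip=Bs count=N`; the function names are hypothetical.

```python
import io
from typing import BinaryIO

BLOCK = 512  # corresponds to dd's bs=512


def read_blocks(src: BinaryIO, skip: int, count: int) -> bytes:
    """Return `count` 512-byte blocks starting at block `skip`."""
    src.seek(skip * BLOCK, io.SEEK_SET)
    return src.read(count * BLOCK)


def extract_header(archive: str, hname: str, bs: int, n: int) -> None:
    """Write blocks Bs .. Bs+N-1 of `archive` to the file `hname`."""
    with open(archive, "rb") as src, open(hname, "wb") as dst:
        dst.write(read_blocks(src, bs, n))
```

Calling `extract_header("arc.tar", "xhdr", 56, 7)` corresponds to the dd command above.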

Finally, you can expand the condensed file, using the obtained header:

     $ xsparse -v -x xhdr GNUSparseFile.28124/sparsefile
     Reading extended header file
     Found variable GNU.sparse.size = 217481216
     Found variable GNU.sparse.numblocks = 208
     Found variable GNU.sparse.name = sparsefile
     Found variable GNU.sparse.map = 0,2048,1050624,2048,...
     Expanding file `GNUSparseFile.28124/sparsefile' to `sparsefile'
     Done

Footnotes

[1] See PAX 1.

[2] Technically speaking, n is the process ID of the tar process that created the archive (see PAX keywords).