
8.1.2 Archiving Sparse Files

Files in the file system occasionally have holes. A hole in a file is a section of the file's contents which was never written. The contents of a hole read as all zeros. On many operating systems, actual disk storage is not allocated for holes, but they are counted in the length of the file. If you archive such a file without special handling, tar could create an archive longer than the space the original takes up on disk.

To have tar attempt to recognize the holes in a file, use `--sparse' (`-S'). With this option, for any file that uses less disk space than its length would suggest, tar searches the file for consecutive stretches of zeros. It then records in the archive where those stretches are, and archives only the "real contents" of the file. On extraction (`--sparse' is not needed then), holes are created in such files wherever the stretches of zeros were found. Thus, if you use `--sparse', tar archives won't take more space than the original.
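
To make the difference between a file's length and its disk usage concrete, the following sketch creates a file with a large hole and prints both figures. The file name `sparse.img' and the scratch directory are hypothetical, and the sketch assumes a file system that supports holes; on one that does not, the two figures will simply be close to each other.

```shell
# Work in a scratch directory (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"

# Write a single byte at offset 10 MiB - 1. Everything before that byte
# was never written, so it is a hole where the file system supports holes.
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=10485759 2>/dev/null

# Length of the file in bytes (holes are counted).
length=$(wc -c < sparse.img | tr -d ' ')

# Disk usage in KiB (holes are not counted where they are supported).
usage=$(du -k sparse.img | cut -f1)

echo "length: $length bytes, disk usage: $usage KiB"
```

A file whose disk usage is much smaller than its length is the kind of file that `--sparse' is meant for.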

`-S'
`--sparse'

This option instructs tar to test each file for sparseness before attempting to archive it. If the file is found to be sparse, it is treated specially, which reduces the amount of space its image takes up in the archive.

This option is meaningful only when creating or updating archives. It has no effect on extraction.
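
As an illustration, the sketch below archives the same file with and without `--sparse', compares the archive sizes, and then extracts the sparse archive without passing `--sparse'. It assumes GNU tar and a file system that supports holes; the file and archive names are hypothetical.

```shell
# Work in a scratch directory (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"

# Create a 10 MiB file that consists almost entirely of a hole.
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=10485759 2>/dev/null

# Archive the same file twice: once plainly, once with hole detection.
tar --create --file=plain.tar sparse.img
tar --create --sparse --file=sparse.tar sparse.img

# Compare the archive sizes in bytes.
plain_size=$(wc -c < plain.tar | tr -d ' ')
sparse_size=$(wc -c < sparse.tar | tr -d ' ')
echo "plain: $plain_size bytes, sparse: $sparse_size bytes"

# Extraction does not need --sparse; the member is restored at full length.
mkdir restored
tar --extract --file=sparse.tar --directory=restored
restored_length=$(wc -c < restored/sparse.img | tr -d ' ')
echo "restored length: $restored_length bytes"
```

Where holes are supported, the archive made with `--sparse' should be far smaller than the plain one, while the restored file keeps the original length.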

Consider using `--sparse' when performing file system backups, to avoid archiving the expanded forms of files stored sparsely in the system.

Even if your system has no sparse files currently, some may be created in the future. If you use `--sparse' while making file system backups as a matter of course, you can be assured the archive will never take more space on the media than the files take on disk (otherwise, archiving a disk filled with sparse files might take hundreds of tapes). See section Using tar to Perform Incremental Dumps.

However, be aware that the `--sparse' option has a serious drawback: to determine whether a file is sparse, tar has to read it before archiving it, so in total the file is read twice. Bear in mind that processing files with this option takes roughly twice as long as archiving them without it. A technical note:

Programs like dump do not have to read the entire file; by examining the file system directly, they can determine in advance exactly where the holes are and thus avoid reading through them. The only data they need to read are the actual allocated data blocks. GNU tar uses a more portable and straightforward archiving approach, and it would be fairly difficult for it to do otherwise. Elizabeth Zwicky writes to `comp.unix.internals', on 1990-12-10:

What I did say is that you cannot tell the difference between a hole and an equivalent number of nulls without reading raw blocks. st_blocks at best tells you how many holes there are; it doesn't tell you where. Just as programs may, conceivably, care what st_blocks is (care to name one that does?), they may also care where the holes are (I have no examples of this one either, but it's equally imaginable).

I conclude from this that good archivers are not portable. One can arguably conclude that if you want a portable program, you can in good conscience restore files with as many holes as possible, since you can't get it right.
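
The limitation described in the quotation can be seen with a small sketch that compares a file's length with its allocated blocks. It assumes a stat command that accepts GNU-style `-c' format strings, and the file name is hypothetical. The comparison reveals that holes exist and roughly how much of the file they cover, but not where in the file they are.

```shell
# Work in a scratch directory (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"

# Create a 10 MiB file that consists almost entirely of a hole.
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=10485759 2>/dev/null

# GNU-style stat: %s = length in bytes, %b = allocated blocks,
# %B = size in bytes of each block reported by %b.
set -- $(stat -c '%s %b %B' sparse.img)
length=$1
allocated=$(( $2 * $3 ))

# Fewer allocated bytes than the length suggests means the file has holes.
if [ "$allocated" -lt "$length" ]; then
    echo "sparse.img has holes: $allocated of $length bytes allocated"
else
    echo "sparse.img appears fully allocated"
fi
```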

When using the `POSIX' archive format, GNU tar is able to store sparse files in three distinct ways, called sparse formats. A sparse format is identified by its number, consisting, as usual, of two decimal numbers delimited by a dot. By default, format `1.0' is used. If, for some reason, you wish to use an earlier format, you can select it using the `--sparse-version' option.

`--sparse-version=version'

Select the format to store sparse files in. Valid version values are: `0.0', `0.1' and `1.0'. See section Storing Sparse Files, for a detailed description of each format.

Using the `--sparse-version' option implies `--sparse'.
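
For example, the following sketch stores a sparse file in a `POSIX' archive using the earlier `0.1' sparse format. It assumes GNU tar; the file and archive names are hypothetical, and `--format=posix' is given explicitly because the default archive format depends on how tar was built.

```shell
# Work in a scratch directory (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"

# Create a 10 MiB file that consists almost entirely of a hole.
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=10485759 2>/dev/null

# Sparse formats apply to POSIX archives, so select that format explicitly.
# --sparse is spelled out for clarity even though selecting a sparse
# format version already implies it.
tar --create --format=posix --sparse --sparse-version=0.1 \
    --file=sparse-0.1.tar sparse.img

# List the archive to confirm that the member was stored.
tar --list --file=sparse-0.1.tar
```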



This document was generated on July, 28 2014 using texi2html 1.76.