xz command examples

xz, unxz, xzcat, lzma, unlzma, lzcat – Compress or decompress .xz and .lzma files

Basics

Compress the file foo into foo.xz using the default compression level (-6), and remove foo if compression is successful:

xz foo

Decompress bar.xz into bar and don’t remove bar.xz even if decompression is successful:

xz -dk bar.xz

Create baz.tar.xz with the preset -4e (-4 --extreme), which is slower than e.g. the default -6, but needs less memory for compression and decompression (48 MiB and 5 MiB, respectively):

tar cf - baz | xz -4e > baz.tar.xz

A mix of compressed and uncompressed files can be decompressed to standard output with a single command:

xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt

Parallel compression of many files

On GNU and *BSD, find(1) and xargs(1) can be used to parallelize compression of many files:

find . -type f \! -name '*.xz' -print0 | xargs -0r -P4 -n16 xz -T1

The -P option to xargs(1) sets the number of parallel xz processes. The best value for the -n option depends on how many files there are to be compressed. If there are only a couple of files, the value should probably be 1; with tens of thousands of files, 100 or even more may be appropriate to reduce the number of xz processes that xargs(1) will eventually create.

The option -T1 for xz is there to force it to single-threaded mode, because xargs(1) is used to control the amount of parallelization.
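The advice about the -n value can be turned into a rough rule of thumb. The sketch below is only an illustration: the function name pick_batch_size, the target of about four batches per parallel job, and the clamp to the range 1..100 are assumptions made for this example, not anything defined by xz or xargs(1).

```shell
# Hypothetical helper: suggest a value for the -n option of xargs.
# Arguments: number of files, number of parallel jobs (xargs -P).
# Assumption: aim for about four batches per job so the work stays
# balanced, then clamp the result to the range 1..100.
pick_batch_size() {
    files=$1
    jobs=$2
    n=$(( files / (jobs * 4) ))
    if [ "$n" -lt 1 ]; then
        n=1
    elif [ "$n" -gt 100 ]; then
        n=100
    fi
    echo "$n"
}

pick_batch_size 10 4       # a handful of files: prints 1
pick_batch_size 50000 4    # tens of thousands of files: prints 100
```

The printed value could then be passed to xargs(1) in place of the fixed -n16 used above.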

Robot mode

Calculate how many bytes have been saved in total after compressing multiple files:

xz --robot --list *.xz | awk '/^totals/{print $5-$4}'
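The same totals line can also yield a percentage. The following is a minimal sketch, assuming the tab-separated totals line has the compressed size in column 4 and the uncompressed size in column 5, as the command above does. The function name summarize_savings is hypothetical, and the sample line is made up for illustration rather than captured from a real run.

```shell
# Hypothetical helper: read `xz --robot --list` output on stdin and
# summarize the totals line (column 4 = compressed size in bytes,
# column 5 = uncompressed size in bytes).
summarize_savings() {
    awk -F '\t' '/^totals/ {
        saved = $5 - $4
        pct = ($5 > 0) ? 100 * saved / $5 : 0
        printf "%d bytes saved (%.1f%%)\n", saved, pct
    }'
}

# Made-up sample totals line (not real xz output):
printf 'totals\t2\t2\t1000\t4000\t0.250\tCRC64\t0\t2\n' | summarize_savings
# prints: 3000 bytes saved (75.0%)
```

With real data, the input would come from xz --robot --list *.xz instead of printf.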

A script may want to know that it is using a new enough xz. The following sh(1) script checks that the version number of the xz tool is at least 5.0.0. This method is compatible with old beta versions, which didn’t support the --robot option:

if ! eval "$(xz --robot --version 2> /dev/null)" ||
        [ "$XZ_VERSION" -lt 50000002 ]; then
    echo "Your xz is too old."
fi
unset XZ_VERSION LIBLZMA_VERSION
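The constant 50000002 is easier to read once the encoding of XZ_VERSION is spelled out. The integer has the form XYYYZZZS: X is the major version, YYY the minor version, ZZZ the patch level, and S the stability (0 for alpha, 1 for beta, 2 for stable). The sketch below decodes such a number; the function name decode_xz_version is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: decode an XZ_VERSION integer (format XYYYZZZS).
# X = major, YYY = minor, ZZZ = patch,
# S = stability (0 = alpha, 1 = beta, 2 = stable).
decode_xz_version() {
    v=$1
    major=$(( v / 10000000 ))
    minor=$(( v / 10000 % 1000 ))
    patch=$(( v / 10 % 1000 ))
    case $(( v % 10 )) in
        0) stability=alpha ;;
        1) stability=beta ;;
        2) stability=stable ;;
        *) stability=unknown ;;
    esac
    echo "$major.$minor.$patch $stability"
}

decode_xz_version 50000002   # prints: 5.0.0 stable
```

So the comparison against 50000002 accepts 5.0.0 stable and anything newer, while rejecting the 5.0.0 alpha and beta builds.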

Set a memory usage limit for decompression using XZ_OPT, but if a limit has already been set, don’t increase it:

NEWLIM=$((123 << 20))  # 123 MiB
OLDLIM=$(xz --robot --info-memory | cut -f3)  # 0 means no limit is set
if [ "$OLDLIM" -eq 0 ] || [ "$OLDLIM" -gt "$NEWLIM" ]; then
    XZ_OPT="$XZ_OPT --memlimit-decompress=$NEWLIM"
    export XZ_OPT
fi

Custom compressor filter chains

The simplest use for custom filter chains is customizing an LZMA2 preset. This can be useful, because the presets cover only a subset of the potentially useful combinations of compression settings.

The CompCPU columns of the tables from the descriptions of the options -0 … -9 and --extreme are useful when customizing LZMA2 presets. Here are the relevant parts collected from those two tables:

Preset   CompCPU
-0       0
-1       1
-2       2
-3       3
-4       4
-5       5
-6       6
-5e      7
-6e      8

If you know that a file requires a somewhat big dictionary (e.g. 32 MiB) to compress well, but you want to compress it quicker than xz -8 would do, a preset with a low CompCPU value (e.g. 1) can be modified to use a bigger dictionary:

xz --lzma2=preset=1,dict=32MiB foo.tar

With certain files, the above command may be faster than xz -6 while compressing significantly better. However, it must be emphasized that only some files benefit from a big dictionary while keeping the CompCPU value low. The most obvious situation, where a big dictionary can help a lot, is an archive containing very similar files of at least a few megabytes each. The dictionary size has to be significantly bigger than any individual file to allow LZMA2 to take full advantage of the similarities between consecutive files.

If very high compressor and decompressor memory usage is fine, and the file being compressed is at least several hundred megabytes, it may be useful to use an even bigger dictionary than the 64 MiB that xz -9 would use:

xz -vv --lzma2=dict=192MiB big_foo.tar

Using -vv (--verbose --verbose) like in the above example can be useful to see the memory requirements of the compressor and decompressor. Remember that using a dictionary bigger than the size of the uncompressed file is a waste of memory, so the above command isn’t useful for small files.
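One way to avoid an oversized dictionary is to cap it at the size of the input. The sketch below is an illustration under stated assumptions: the function name pick_dict_mib is hypothetical, sizes are rounded up to whole MiB for simplicity, and the upper bound is whatever you decide your systems can afford.

```shell
# Hypothetical helper: choose a dictionary size in MiB.
# Arguments: uncompressed size in bytes, largest dictionary you are
# willing to use in MiB. The file size is rounded up to a whole MiB
# and the smaller of the two values is printed.
pick_dict_mib() {
    size_bytes=$1
    max_mib=$2
    size_mib=$(( (size_bytes + 1048575) / 1048576 ))
    if [ "$size_mib" -lt 1 ]; then
        size_mib=1
    fi
    if [ "$size_mib" -lt "$max_mib" ]; then
        echo "$size_mib"
    else
        echo "$max_mib"
    fi
}

pick_dict_mib 5000000 192      # file of about 4.8 MiB: prints 5
pick_dict_mib 900000000 192    # file of about 858 MiB: prints 192
```

The result could be used as the number in --lzma2=dict=NMiB, with the byte count taken from e.g. wc -c.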

Sometimes the compression time doesn’t matter, but the decompressor memory usage has to be kept low e.g. to make it possible to decompress the file on an embedded system. The following command uses -6e (-6 --extreme) as a base and sets the dictionary to only 64 KiB. The resulting file can be decompressed with XZ Embedded (that’s why there is --check=crc32) using about 100 KiB of memory.

xz --check=crc32 --lzma2=preset=6e,dict=64KiB foo

If you want to squeeze out as many bytes as possible, adjusting the number of literal context bits (lc) and number of position bits (pb) can sometimes help. Adjusting the number of literal position bits (lp) might help too, but usually lc and pb are more important. E.g. a source code archive contains mostly US-ASCII text, so something like the following might give a slightly (like 0.1 %) smaller file than xz -6e (try also without lc=4):

xz --lzma2=preset=6e,pb=0,lc=4 source_code.tar

Using another filter together with LZMA2 can improve compression with certain file types. E.g. to compress an x86-32 or x86-64 shared library using the x86 BCJ filter:

xz --x86 --lzma2 libfoo.so

Note that the order of the filter options is significant. If --x86 is specified after --lzma2, xz will give an error, because there cannot be any filter after LZMA2, and also because the x86 BCJ filter cannot be used as the last filter in the chain.

The Delta filter together with LZMA2 can give good results with bitmap images. It should usually beat PNG, which has a few more advanced filters than simple delta but uses Deflate for the actual compression.

The image has to be saved in uncompressed format, e.g. as uncompressed TIFF. The distance parameter of the Delta filter is set to match the number of bytes per pixel in the image. E.g. a 24-bit RGB bitmap needs dist=3, and it is also good to pass pb=0 to LZMA2 to accommodate the three-byte alignment:

xz --delta=dist=3 --lzma2=pb=0 foo.tiff
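The distance is simply the pixel depth in bytes, which can be derived from the bit depth. The sketch below is a small illustration; the function name delta_dist is hypothetical, and it assumes the bit depth is a whole number of bytes.

```shell
# Hypothetical helper: derive the Delta filter distance (bytes per
# pixel) from the bit depth of an uncompressed image.
delta_dist() {
    bits=$1
    if [ "$bits" -le 0 ] || [ $(( bits % 8 )) -ne 0 ]; then
        echo "bits per pixel must be a positive multiple of 8" >&2
        return 1
    fi
    echo $(( bits / 8 ))
}

delta_dist 24   # 24-bit RGB: prints 3
```

The printed value is what goes after --delta=dist= in the command above.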

If multiple images have been put into a single archive (e.g. .tar), the Delta filter will work on that too as long as all images have the same number of bytes per pixel.
