Highest Priorities For Maximizing Compression Effectiveness

Think about whether you need to use high compression settings and formats.

Most of the time, slower algorithms that require more computing power achieve a higher compression ratio. For example, RAR compression is slower but stronger than ZIP compression, and 7Z compression is slower but stronger than RAR; PAQ/ZPAQ reaches a higher compression ratio than the other algorithms but needs far more computing power.

See the file compression formats comparison and the compression benchmarks for a comparison of the best compression algorithms and of how different archive formats affect speed and compression ratio.

Different data types can produce different results with the same compression algorithms. For example, the weaker RAR and ZIPX compression can close the gap with the stronger 7Z compression when multimedia files are being compressed, because RAR and ZIPX apply well-optimized filters when they detect suitable multimedia data structures. Lossy-compressed multimedia files, however, remain hard to compress in any case.

Most of the time, using the highest compression settings of a weaker compression algorithm is less effective than switching to a stronger algorithm.

It is recommended to carefully consider whether better compression is actually needed (after deduplicating the input and setting aside files that do not compress well), or whether the archive is mainly created for reasons other than reducing file size, such as encrypting the content or handling it as a single file. If time is of the essence, speed should be the primary consideration, and the fastest algorithms should be chosen, such as Deflate (used by zlib/GZip, ZIP, and Zopfli), Brotli, or Zstandard.
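
As a rough illustration of this speed-versus-ratio trade-off, the sketch below uses Python's built-in zlib (Deflate) and lzma modules to compare a weaker algorithm at its maximum setting with a stronger one at a moderate setting. The input file name is a placeholder; substitute any reasonably large file.

    # Minimal sketch: compare Deflate at its maximum level against LZMA at a
    # moderate preset. "sample.dat" is a placeholder path.
    import time
    import zlib
    import lzma

    data = open("sample.dat", "rb").read()

    for name, compress in [
        ("Deflate level 9", lambda d: zlib.compress(d, 9)),
        ("LZMA preset 6", lambda d: lzma.compress(d, preset=6)),
    ]:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        print(f"{name}: {len(packed) / len(data):.2%} of original size, {elapsed:.2f}s")

On most general-purpose data, the LZMA run finishes slower but produces a noticeably smaller output than Deflate even at Deflate's highest level.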

Recognize files that don't compress well

Consider whether it is worth spending time compressing data that does not compress well, or whether it is better to store it "as is." Some data structures are inherently high in entropy, and previous processing such as encryption or compression also introduces entropy, which makes it hard or even impossible to compress the data further. Computing power is better spent reducing the size of other types of files, which gives better results and makes the whole process faster.
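
One practical heuristic (not a PeaZip feature, just an illustration) is to quickly compress a small sample of each file and skip full compression when the sample barely shrinks. The sample size and threshold below are arbitrary illustrative values.

    # Minimal sketch: probe whether a file is worth compressing by deflating
    # only its first chunk. A ratio close to 1.0 suggests high-entropy data
    # (already compressed or encrypted) that is better stored "as is".
    import zlib

    def looks_incompressible(path, sample_size=1024 * 1024, threshold=0.95):
        with open(path, "rb") as f:
            sample = f.read(sample_size)
        if not sample:
            return False
        ratio = len(zlib.compress(sample, 1)) / len(sample)
        return ratio > threshold

    # Example: decide between "store" and "compress" per file.
    # print(looks_incompressible("video.mkv"))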

Multimedia files

Lossy compression already makes MP3, JPG, MPEG, AVI, and DIVX files hard to compress further, and videos are usually very large compared to other file types (documents, applications), so it should be carefully decided whether they should be compressed at all. Most file archivers offer a "Store" compression level, which turns compression off entirely (the fastest option, since speed is then limited only by disk copy performance).
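
For example, when building a ZIP archive programmatically, already-compressed multimedia formats can be stored rather than deflated again. The sketch below uses Python's zipfile module; the extension list, the archive name, and the source directory are illustrative assumptions.

    # Minimal sketch: store already-compressed multimedia files, deflate the rest.
    import zipfile
    from pathlib import Path

    ALREADY_COMPRESSED = {".mp3", ".jpg", ".jpeg", ".mpeg", ".avi", ".mp4", ".mkv"}

    def add_file(archive, path):
        method = (zipfile.ZIP_STORED
                  if path.suffix.lower() in ALREADY_COMPRESSED
                  else zipfile.ZIP_DEFLATED)
        archive.write(path, compress_type=method)

    with zipfile.ZipFile("backup.zip", "w") as archive:
        for path in Path("photos").rglob("*"):
            if path.is_file():
                add_file(archive, path)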

See how to optimize the compression of images for tips on how to make graphic files (JPEG, PNG, TIFF, and BMP) take up less space on your hard drive.

Some document types

PDF, OpenDocument (OpenOffice), and the Microsoft Office 2007 and later file formats, as well as some databases, are already compressed internally (usually with fast Deflate-based lossless compression), so they do not usually compress well a second time.
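
A quick way to see why is that modern Office and OpenDocument files are in fact ZIP containers whose members are already deflated. The check below is purely illustrative and the file names are placeholders.

    # Minimal sketch: .docx, .xlsx, .odt and similar files are ZIP containers.
    import zipfile

    for path in ("report.docx", "sheet.ods", "notes.txt"):  # placeholder names
        print(path, "is a ZIP container:", zipfile.is_zipfile(path))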

Archive files

7Z, RAR, and ZIP files are already compressed, so compressing them directly gains little, if anything. Still, archives can be converted (extracted to their original, non-compressed form and then recompressed) to a format with a better compression ratio.
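
As an illustration, a ZIP archive could be converted to a solid .tar.xz using only Python's standard library. The archive names are placeholders, and a real conversion should also preserve metadata that this sketch ignores.

    # Minimal sketch: extract a ZIP and repack its content as a .tar.xz archive.
    import tarfile
    import tempfile
    import zipfile

    with tempfile.TemporaryDirectory() as workdir:
        with zipfile.ZipFile("old_archive.zip") as src:
            src.extractall(workdir)          # only do this with trusted archives
        with tarfile.open("converted.tar.xz", "w:xz") as dst:
            dst.add(workdir, arcname=".")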

Encrypted information

Encrypted data cannot be compressed because it is pseudo-random: there is no "shorter way" to represent the encrypted information, so trying to compress encrypted files is not recommended. An excellent way to start defining a compression policy is to separate data that is hard to compress from everything else.
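
The following sketch shows the effect: one megabyte of random bytes (a stand-in for encrypted data) comes out slightly larger after Deflate compression, not smaller, because the compressed format adds a small overhead of its own.

    # Minimal sketch: random data (like encrypted output) does not compress.
    import os
    import zlib

    random_block = os.urandom(1024 * 1024)      # stands in for encrypted data
    compressed = zlib.compress(random_block, 9)
    print(len(compressed) / len(random_block))  # typically just above 1.0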

Consider the benefits of solid compression.

Solid compression, which is an option of some archive formats such as 7Z and RAR, can improve the final compression ratio. It works by compressing multiple files as a single continuous block, giving the compression algorithm a larger context in which to find redundant data and represent it more efficiently, reducing the output file size.
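
The effect can be illustrated with a toy example: compressing many similar small "files" one by one versus compressing them concatenated as a single stream. The synthetic records below are only for illustration.

    # Minimal sketch: per-file compression vs. one solid stream.
    import zlib

    files = [f"record {i}: status=ok payload=ABCDEFGH\n".encode() * 50
             for i in range(100)]

    per_file = sum(len(zlib.compress(f, 9)) for f in files)
    solid = len(zlib.compress(b"".join(files), 9))

    print("compressed separately:", per_file, "bytes")
    print("compressed as one solid stream:", solid, "bytes")

Because the redundancy between files can only be exploited when they share the same context, the solid stream comes out much smaller.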

However, that context information is also needed during extraction, so extracting from a solid archive (often called a "solid block") can be much slower than from a non-solid archive, because the archiver has to process all the relevant context data first.

7Z lets you choose the block size for solid-mode operation (the size of the data context, or "window," the algorithm works on) to reduce this overhead, although limiting the block size also slightly reduces the compression-ratio improvement. With single-stream compressors such as XZ, Brotli, Bzip2, GZip, or Zstandard, solid-mode compression is achieved in two steps: the files are first consolidated into a single container (typically TAR), and that container is then compressed as one stream.
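
A two-step solid compression could look like the following sketch, where the input directory and the output names are placeholders: step one consolidates the files into an uncompressed TAR container, step two compresses that container as a single XZ stream.

    # Minimal sketch: two-step solid compression (TAR, then XZ).
    import lzma
    import shutil
    import tarfile

    with tarfile.open("documents.tar", "w") as container:   # step 1: consolidate
        container.add("documents", arcname="documents")

    with open("documents.tar", "rb") as src, \
         lzma.open("documents.tar.xz", "wb") as dst:         # step 2: compress solid
        shutil.copyfileobj(src, dst)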

Choose carefully whether the way you intend to use the compressed data justifies high compression or solid compression: the more often the data needs to be extracted, the more computational overhead each end user will have to pay.

For example, software distribution would benefit significantly from maximum compression since saving bandwidth is essential, and the end-user usually only extracts the data once. However, the extra work may not be worth it if the data needs to be accessed often, in which case the fastest extraction time would be the most efficient.

Duplicate files often do not need archiving.

One obvious suggestion is to eliminate duplicate files (deduplication) so that redundant data does not have to be archived at all.

When you find and get rid of duplicate files before archiving, the input size decreases, which speeds up the operation and makes the final size smaller. It also makes it easier for the end-user to navigate and search in a cleaner archive. Don't eliminate duplicate files if they have to stay in their original location because a program or an automated process needs them there.
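
A simple way to spot duplicates before archiving is to hash file contents, as in the sketch below. The directory name is a placeholder, and reading whole files this way is only practical for moderately sized data sets.

    # Minimal sketch: report files with identical content before archiving.
    import hashlib
    from pathlib import Path

    seen = {}
    for path in Path("source_tree").rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            print(f"duplicate: {path} == {seen[digest]}")
        else:
            seen[digest] = path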

Zeroing out the free space of virtual machines and disk images eliminates meaningless leftover data.

The File tools submenu has a function called "Zero delete," which overwrites file data or free partition space with an "all-0" stream. This fills the corresponding physical disk space with highly compressible, homogeneous data.

This lets you save space when compressing disk images, whether low-level physical disk snapshots made for backup purposes or the guest virtual disks of virtual machines: the 1:1 exact copy of the disk content is not weighed down by leftover data in the free-space area. Some disk-imaging utilities and virtual machine players/managers have built-in compression routines, but zeroing the free space beforehand is still strongly recommended to improve the compression ratio.
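
Conceptually, zeroing free space from inside a guest system can be as simple as writing a zero-filled file until the volume is full and then deleting it. The sketch below is a generic illustration of that idea, not PeaZip's own "Zero delete" routine, and the mount point is a placeholder.

    # Minimal sketch: fill free space with zeros, then remove the filler file.
    import os

    fill_path = "/mnt/guest_volume/zero.fill"     # placeholder mount point
    chunk = b"\0" * (4 * 1024 * 1024)             # 4 MiB of zeros per write

    try:
        with open(fill_path, "wb") as f:
            while True:
                f.write(chunk)
    except OSError:                               # "No space left on device"
        pass
    finally:
        if os.path.exists(fill_path):
            os.remove(fill_path)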

Zero deletion also adds a basic level of security over PeaZip's "Quick delete" function, which simply removes the file from the filesystem: a quick-deleted file cannot be recovered from the system's recycle bin, but it can still be recovered with undelete utilities. Zero deletion, on the other hand, is not meant for advanced security; use PeaZip's Secure delete when you need to securely and permanently delete a file or wipe the free space of a volume to protect your privacy.

Influence of using self-extracting archives

Self-extracting archives are helpful because they give the end user the proper extraction routine without requiring any software to be installed. However, because the extraction module is built into the archive, it adds tens or hundreds of KB to the archive size, which only matters for very small archives (less than 1 MB), a size still plausible for an archive of a few text documents. Also, because a self-extracting archive is an executable file, some file-sharing platforms, cloud providers, and e-mail servers may block it, preventing it from reaching the intended recipient(s).
