Demartek Shares Compression Results

Date: 2017-01

Here you will find a comparison of different file compression methods and how they perform on various file types.

We wanted to see how various compression methods fared against the most popular file types. To do this, we set up a server with two octa-core Intel Xeon E5-2690 processors, 192GB of RAM, and 12 SSDs in a software RAID-0, running Windows Server 2008 R2 and a variety of compression applications. These applications were scripted to compress the sample files with various settings, to give a good comparison of how well the different algorithms compress. The script recorded the compression method, the before and after sizes, and the time required to compress. Below you can see the data in close to its raw form. We have added columns for the compression ratio, which is the compressed file size expressed as a percentage of the original file size, as well as the time required to compress.

Since certain compression methods try to compress incompressible or already compressed data, a few results end up with a compression ratio of over 100%. This is obviously undesirable, so it’s good to know which types of data ought not to be compressed with certain methods.

Tips: You can click any column header to sort by that column. You can also shift-click to sort by multiple columns. To search for something, press CTRL-F on your keyboard.

Note: the sorting may take a moment, as there is a great deal of data to sort through.
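The measurements described above (before and after sizes, time to compress, and compression ratio as a percentage of the original size) can be sketched in a few lines. This is only an illustration, not the script used for these tests: it uses Python's built-in zlib module as a stand-in compressor, and the function name `measure` and the sample payloads are made up for the example.

```python
import os
import time
import zlib


def measure(name, data, level=6):
    """Compress data once and record sizes, ratio, and elapsed time."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "before": len(data),
        "after": len(compressed),
        # Ratio as defined in this article: compressed size as a
        # percentage of the original size (lower is better).
        "ratio_percent": 100.0 * len(compressed) / len(data),
        "seconds": elapsed,
    }


# Highly repetitive data compresses well; random bytes stand in for
# already compressed data and typically come out slightly larger.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 2000
random_like = os.urandom(len(text_like))

for label, payload in (("text-like", text_like), ("random", random_like)):
    row = measure(label, payload)
    print(f"{row['name']:>10}: {row['before']} -> {row['after']} bytes, "
          f"{row['ratio_percent']:.1f}%, {row['seconds']:.4f}s")
```

The random payload shows how a ratio of over 100% arises: the compressor finds nothing to remove and still has to add its own headers.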

Summary of files compressed:

access.mdb - Microsoft Access 2003 database

aif.aif - Apple's uncompressed audio format

animatedgif.gif - A small animated GIF image

bmp.bmp - Uncompressed bitmap image

csv.csv - Comma separated value spreadsheet

divx.avi - Video in the DivX format

dv.dv - DV video standard used by many older cameras

ebook.doc - MS Word 2003 formatted eBook

ebook.docx - MS Word 2010 formatted eBook

ebook.epub - eBook in the Epub format

ebook.html - eBook in HTML format

ebook.mobi - Kindle formatted eBook

ebook.odt - OpenDocument formatted eBook

ebook.pdf - PDF formatted eBook

ebook.txt - eBook in plain text

flac.flac - The Free Lossless Audio Codec

h264.mp4 - Video encoded in the H.264 format

iometer-2006.tst - Test file generated by IOmeter 2006

iometer-2008.tst - Test file generated by IOmeter 2008

iometer-2010-fullrandom.tst - Test file generated by IOmeter 2010

iometer-2010-pseudorandom.tst - Test file generated by IOmeter 2010

iometer-2010-repeating.tst - Test file generated by IOmeter 2010

jpeg.jpg - Standard JPEG image

jpeg2000.jp2 - Image in the JPEG2000 format

mp3.mp3 - Audio in the MP3 format

mpeg1.mpg - Video in the MPEG1 format

mpeg2-best.mpg - Video in the MPEG2 format

mysqldump.sql - SQL data from Wikipedia for MySQL databases

png.png - Image in the Portable Network Graphics format

presentation.odp - OpenDocument Presentation

presentation.ppt - MS PowerPoint 2003 presentation

presentation.pptx - MS PowerPoint 2010 presentation

rawphoto-adobe.dng - RAW image in the Adobe DNG format

rawphoto-nikon.nef - RAW image in the Nikon NEF format

redcode.r3d - 3D video in the RedCode RAW format

spreadsheet.ods - OpenDocument spreadsheet

spreadsheet.xls - MS Excel 2003 spreadsheet

spreadsheet.xlsx - MS Excel 2010 spreadsheet

sqlio.dat - Microsoft SQLIO test file

sqliosim.idx - Microsoft SQLIOSIM test file

svg.svg - Scalable Vector Graphics

uncompressed.avi - Uncompressed video

vc1.mkv - Microsoft VC1 video codec

vdbench.tst - VDBench test file

vorbis.ogg - Vorbis audio codec

vp8.webm - Google VP8/WebM video

wav.wav - Uncompressed audio

webp.webp - Google WebP image

windowsmedia.wmv - Microsoft Windows Media video

xml.xml - XML database dump from Wikipedia

xvid.avi - Xvid video codec

Summary of applications and flags:

7-Zip

Compression Flags:

-mx=#: This is the compression level. Lower numbers offer higher speed, but lower compression ratios. The range is 1-9.

-m0=: This is the compression algorithm.

-t=: This is the container format. It has no impact on the compression level.

-md=: This is the dictionary size. Larger values can improve compression, but cost a great deal of RAM.

-mfb=: This is the word size. Larger values can increase compression ratio and the compression time.

-ms=: This is the solid block size. Larger values can increase the compression ratio, but require each block to be fully extracted in order to add or remove files.

-mmt=: This sets the multi-threading mode.
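As a rough illustration of how these flags combine, the sketch below assembles a 7-Zip command line in Python without running it. The values are illustrative only and are not the settings used in these tests; the helper name `build_7z_command` and the file names are hypothetical. Note that 7-Zip's own documentation writes the container switch as -t followed directly by the type (for example -t7z).

```python
import shlex


def build_7z_command(archive, sources, level=9, method="lzma2",
                     dictionary="64m", word_size=64, solid="on",
                     threads="on"):
    """Return an argument list for '7z a' using the flags listed above."""
    if not 1 <= level <= 9:
        raise ValueError("compression level must be in the range 1-9")
    return [
        "7z", "a",             # 'a' adds files to an archive
        "-t7z",                # container format
        f"-m0={method}",       # compression algorithm
        f"-mx={level}",        # compression level
        f"-md={dictionary}",   # dictionary size
        f"-mfb={word_size}",   # word size
        f"-ms={solid}",        # solid mode / solid block size
        f"-mmt={threads}",     # multi-threading mode
        archive,
        *sources,
    ]


cmd = build_7z_command("sample.7z", ["ebook.txt"])
print(shlex.join(cmd))
```

Building the arguments as a list, rather than one long string, avoids quoting problems if the list is later handed to `subprocess.run`.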

Compress.exe

Compression Flags:

-z: Use Zip compression.

-zx: Use LZX compression.

FreeArc

Compression Flags:

-m#x: This is one of the options for the compression level. Lower numbers offer higher speed, but lower compression ratios. The range is from m1-m5x. It automatically selects the algorithm it thinks is best.

-m#q: This is the option for compressing with PPMD (ppmonstr). Lower numbers offer higher speed, but lower compression ratios. The range is from m1q-m6q.

-mx: This is the option for compressing with UHARC on high settings.

-mz: This is the option for compressing with UHARC on low settings.

WinRar

Compression Flags:

-m#: This is the compression level. Lower numbers offer higher speed, but lower compression ratios. The range is from m1-m5.

-af: This sets the archive format and subsequent compression algorithm. The options are Zip and Rar.

-s: This tells WinRar to use solid mode. It can increase the compression ratio at the cost of speed.

-ibck: This tells WinRar to run in the background.

WinZip

Compression Flags:

-ef: This tells WinZip to compress using the fast method.

-en: This tells WinZip to compress using the normal method.

-ep: This tells WinZip to compress using the maximum (PPMD) method.

-el: This tells WinZip to compress using the maximum (LZMA) method.

-ee: This tells WinZip to compress using the maximum (enhanced deflate) method.

-eb: This tells WinZip to compress using the maximum (bzip2) method.

QuickLZ

Compression Flags:

-l#: This is the compression level. Lower numbers offer higher speed, but lower compression ratios. The range is from 1-3.

-t#: This is the number of CPU threads QuickLZ will use.

-k#: This is the block size. Higher numbers can increase compression.

-B: This tells QuickLZ to disable filesystem caching on Windows.

-f: This tells QuickLZ to overwrite any existing files.

-v: This tells QuickLZ to be verbose with its output.

LZO

Compression Flags:

-#: This is the compression level. Lower numbers offer higher speed, but lower compression ratios. The range is from 1-9.

-f: Force overwrite of existing files.

-v: Be verbose.

-o: Sets the output file.

Plain old copy

Compression Flags:

-y: Copies without prompting.

NTFS

UHARC

Compression Flags:

-m#: This is the compression level. Lower numbers offer higher speed, but lower compression ratios. The range is from 1-3.

-mx: Use the PPMd compression algorithm.

-mz: Use the LZP compression algorithm.

-mr: Use the RLE compression algorithm.

-mw: Use the LZ78 compression algorithm.

-md#: Set the dictionary size. Larger values mean better compression ratios and higher RAM consumption during compression. The maximum value is 32768KB. If the data stream is smaller than the chosen dictionary size, UHARC will try to use the smallest dictionary size that is larger than the data stream and is a power of 2.

-mm+: Enable multimedia detection.
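The dictionary sizing rule described for -md# can be sketched as follows. This is a reading of the description above, not UHARC's actual code: the function name `pick_dictionary_kb` is hypothetical, and any minimum dictionary size UHARC may enforce is not modeled.

```python
MAX_DICT_KB = 32768  # maximum dictionary size stated above


def pick_dictionary_kb(stream_bytes, requested_kb):
    """Return the dictionary size in KB that the rule above would select."""
    requested_kb = min(requested_kb, MAX_DICT_KB)
    stream_kb = stream_bytes / 1024
    if stream_kb >= requested_kb:
        # The stream is not smaller than the dictionary: keep the request.
        return requested_kb
    # Otherwise find the smallest power of 2 strictly larger than the stream.
    size_kb = 1
    while size_kb <= stream_kb:
        size_kb *= 2
    # Never grow beyond what was requested.
    return min(size_kb, requested_kb)


print(pick_dictionary_kb(5000 * 1024, 32768))  # 5000KB stream -> 8192
print(pick_dictionary_kb(100 * 1024, 32768))   # 100KB stream -> 128
```

Shrinking the dictionary this way saves RAM without hurting compression, since a dictionary larger than the whole input can never be filled.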

7-Zip is an open-source file archiver for Windows and command-line *NIX operating systems. It is fairly popular, supports most compression formats, and has introduced some improvements to the LZMA algorithm.

Compress.exe is a simple command-line utility that was released with the Windows 2003 Resource Kit.

FreeArc is an open-source file archiver for Windows that claims to have better compression rates than any other archiver.

WinRar is a proprietary file archiver for Windows that claims to have better compression rates than any other archiver.

WinZip is a proprietary file archiver for Windows.

QuickLZ is an open-source compression utility that promises incredible speed. The algorithm of the same name is integrated into a few filesystems to provide automatic compression to all files.

LZO is an open-source compression utility that promises incredible speed. The algorithm of the same name is integrated into a few filesystems to provide automatic compression to all files.

Plain old copy: Since some of these compression methods are designed for speed and/or to be integrated into filesystems, we included a test that just measured how long it takes to copy the respective file. The program and method will simply show up as "copy" in the compression table.

NTFS: This isn't an application per se, but we wanted to test how well the built-in compression of NTFS could compress files in Windows. To do this, we copied the files into a compressed directory and read the size on disk for the file. There are no options to set. We did this three times to measure the speed of copying into, out of, and between compressed directories.

UHARC is a freeware compression algorithm designed to do very good multimedia compression, though the source data does not have to be audio or video.

Summary of compression algorithms:

BZIP2

eXdupe

LZMA

LZMA2

LZX

PPMD

QuickLZ

RAR

UHARC

RLE

ZPAQ

BZIP2 is an open-source, CPU-intensive compression algorithm that is considerably more efficient than Zip and LZW. Compressing with BZIP2 is relatively slow, but decompression is reasonably fast. It has a somewhat narrow block size range of 100-900KB as compared to LZMA, but still remains a powerful compression algorithm. BZIP2 cannot be used on more than one file at a time on its own, but when combined with a container such as tar or 7z, it can be made to archive groups of files. For more info, please refer to the Wikipedia article on BZIP2.

eXdupe is an open-source deduplication algorithm designed for speed. Deduplication is a simplified form of compression that finds repeating strings of data and replaces all duplicates with a pointer to the first occurrence. Other compression methods can also compress data that does not repeat exactly later in the data stream, but pure deduplication has the advantage of seeking duplicates throughout the entire compression range, whereas traditional compression only seeks duplicates within the range of the dictionary and block size. For more info, please refer to the QuickLZ website.

LZMA is an open-source, CPU-intensive algorithm that generally performs better than BZIP2. What sets it apart is the enormous dictionary size (allowing for better compression of large, repeating data) and better contextual word handling. This results in better compression, but consumes a good deal more CPU power and RAM. For more info, please refer to the Wikipedia article on LZMA.

LZMA2 is mostly LZMA with a few enhancements. It is compatible with any LZMA decompressor. What makes it different is a better container format that allows incompressible data to be included without compression. This is in contrast to other algorithms that will try to compress this data anyway, resulting in a larger file than you started with. This can improve compression significantly. Additionally, LZMA2 allows for multi-threading, which can greatly increase compression speed. For more info, please refer to the Wikipedia article on LZMA.

LZX is a compression algorithm that is in the same family as Zip. It is used in the compression of Microsoft Cabinet files and CHM help files. For more info, please refer to the Wikipedia article on LZX.

PPMD is an open-source compression algorithm that uses statistical data compression based on context modeling and prediction. It analyzes the data and predicts the upcoming symbols, so that well-predicted symbols can be encoded compactly. For more info, please refer to the Wikipedia article on PPMD.

QuickLZ is an open-source compression algorithm that is designed for speed over compression. It has been integrated into the LessFS filesystem, and can be used with other filesystems on *NIX systems via FUSE. For more info, please refer to the QuickLZ website.

RAR is a proprietary archive format that competes with LZMA in terms of features. While LZMA typically outperforms RAR when compressing uncompressed data, RAR has some enhancements that allow it to compress media formats with better compression ratios than other archivers. For more info, please refer to the Wikipedia article on RAR.

UHARC is a freeware compression algorithm designed to do very good multimedia compression, though the source data does not have to be audio or video. For more info, please refer to the UHARC documentation on the code page.

RLE: From Wikipedia: Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, simple graphic images such as icons, line drawings, and animations. It is not useful with files that don't have many runs as it could greatly increase the file size. For more info, please refer to the Wikipedia article on RLE.

ZPAQ is an open-source, "mixed-content" compression algorithm that is similar to PPMD, but with several mathematical enhancements. For more info, please refer to the Wikipedia article on ZPAQ.
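To make the run-length encoding description concrete, here is a minimal sketch of a byte-oriented RLE encoder and decoder. The (count, value) pair layout and the 255-byte run limit are assumptions chosen for the example; they do not describe the RLE variant used by any of the tools tested here.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte repeats the current one.
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)


with_runs = b"A" * 200 + b"B" * 100   # 300 bytes in two long runs
no_runs = bytes(range(256))           # 256 bytes, no repeats at all

print(len(rle_encode(with_runs)))     # 4
print(len(rle_encode(no_runs)))       # 512
```

The second result shows the weakness noted above: with no runs to collapse, every byte costs two bytes and the output doubles in size.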