A Look at Data Compression

ScuttleMonkey posted more than 7 years ago | from the it's-backup-time dept.

Software 252

With the new year fast approaching, many of us face the unenviable task of backing up last year's data to make room for more of the same. With that in mind, rojakpot has taken a look at some of the data compression programs available and has a few insights that may help when looking for the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."

Speed (3, Insightful)

mysqlrocks (783488) | more than 7 years ago | (#14340361)

No talk of the speed of compression/decompression?

Re:Speed (0)

Anonymous Coward | more than 7 years ago | (#14340386)

Dude, RTFA.

Re:Speed (1)

Coneasfast (690509) | more than 7 years ago | (#14340506)

The site is almost slashdotted - very slow.
Secondly, why do they have to put everything on 15 different pages? Does it make it more organized? I think not. It's easier to read the article when everything is together.

Re:Speed (5, Informative)

sshore (50665) | more than 7 years ago | (#14340861)

They do it to sell more ad impressions. Each time you go to the next page you load a new ad.

Re:Speed (1)

emmetropia (527623) | more than 7 years ago | (#14340878)

Before anyone comments, I didn't read the article. However, most likely, the reason for "15 pages" instead of one, is because they would be displaying "15 pages with ads" instead of one, which would be potentially more ad revenue.

That said, I hate it when they're broken up like that too.

Re:Speed (4, Informative)

sedmonds (94908) | more than 7 years ago | (#14340397)

There seems to be a compression speed section on page 12 - Aggregated Results - ranging from gzip (really fast) to WinRK (really slow).

Re:Speed (1)

kailoran (887304) | more than 7 years ago | (#14340431)

Too bad they don't write a thing about DEcompression speeds. I'd say that would in many cases be more important than the compression speed.

Re:Speed (1)

eggnet (75425) | more than 7 years ago | (#14341009)

I'd imagine the decompression speeds are all fast.

Re:Speed (5, Insightful)

Luuvitonen (896774) | more than 7 years ago | (#14340772)

3 hours 47 minutes with WinRK versus gzipping in 3 minutes 16 seconds. Is it really worth watching the progress bar for a file that's 200 megs smaller?

Re:Speed (2, Interesting)

Karma Farmer (595141) | more than 7 years ago | (#14340828)

3 hours 47 minutes with WinRK versus gzipping in 3 minutes 16 seconds. Is it really worth watching the progress bar for a file that's 200 megs smaller?

If your file starts out as 250 mb, it might be worth it. However, if you start with a 2.5 gb file, then it's almost certainly not -- especially once you take the closed-source and undocumented nature of the compression algorithm into account.
 
/not surprisingly, the article is about 2.5 gb files

Re:Speed (2, Informative)

Wolfrider (856) | more than 7 years ago | (#14341187)

Yah, when I'm running backups and it has to Get Done in a reasonable amount of time with decent space savings, I use
gzip -9. (My fastest computer is a 900MHz AMD Duron.)

For quick backups, gzip; or gzip -6.

For REALLY quick stuff, gzip -1.

When I want the most space saved, I (rarely) use bzip2 because rar, while useful for splitting files and retaining recovery metadata, is far too slow for my taste 99% of the time.

Really, disk space is so cheap these days that Getting the Backup Done is more important than saving (on average) a few megs here and there.

But if you Really Need that last few megs of free space, this is an OK guide to which compressor does that the best -- even if it takes *days.*
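
If you want to see that speed/size trade-off on your own data, here's a minimal Python sketch using zlib (the same DEFLATE algorithm gzip uses); the input filename is just a placeholder:

import time
import zlib

data = open("backup.tar", "rb").read()   # placeholder: any large-ish file

for level in (1, 6, 9):                  # roughly gzip -1, -6, -9
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {elapsed:6.2f}s, {len(packed) / len(data):.1%} of original")

On typical data, level 9 usually buys only a little extra compression over level 6 while costing noticeably more time, which is why -6 is gzip's default.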

Why compress in the first place? (1, Interesting)

mosel-saar-ruwer (732341) | more than 7 years ago | (#14340453)

No talk of the speed of compression/decompression?

Speed aside [and speed would be a huge concern if you insisted on compression], I just don't understand the desire for compression in the first place.

As the administrator, your fundamental obligation is data integrity. If you compress, and if the compressed file store is damaged [especially if the header information on a compressed file - or files - is damaged], then you will tend to lose ALL of your data.

On the other hand, if your file store is ASCII/ANSI text, then even if file headers are damaged, you can still read the raw disk sectors and recover most of your data [might take a while, but at least it's theoretically do-able].

In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression.

Re:Why compress in the first place? (5, Insightful)

ArbitraryConstant (763964) | more than 7 years ago | (#14340508)

"I just don't understand the desire for compression in the first place."

Sometimes, people have to download things.

Re:Why compress in the first place? (0, Interesting)

Anonymous Coward | more than 7 years ago | (#14340833)

Bah, speak for yourself. In this day when everyone has a 100 Mbit connection (at least pretty much everyone in this country; hint: not the USA), compressed content is honestly more of a hassle. For instance, when I'm downloading your latest movie on DVD-R, it's usually packed in RARs, saving a few hundred MB at best. But who cares? When my download speed is pretty much limited by my harddrive, I'd rather spend the extra 10 seconds downloading it uncompressed instead of having to wait 10 minutes to unpack the damn thing.

Re:Why compress in the first place? (2, Insightful)

topham (32406) | more than 7 years ago | (#14340521)

I'd call you a troll, but I think you were being honest.

Compressing files with a good compression program does not increase the chance of it being corrupted.

And, the majority of files people send to each other, etc, aren't simply ascii files. (even if yours are).

The other advantage of using a compression program is the majority of them create archives and allow you to consolidate all the related files.

A good archive/compression program will add a couple of percent of redundancy data, which can substantially increase data integrity - above and beyond what you get by simply storing an ASCII file uncompressed.

My concern with all the 'new' compression programs is that they, unlike Zip, haven't survived the test of time. I've recovered damaged zip archives in the past and they have come through mostly intact. I've used archive/compression programs like ARJ with options to be able to recover data even if there are multiple bad sectors on a harddrive or floppy disk. How many of the new compression programs have the tools available to adequately recover every possible byte of data?

Re:Why compress in the first place? (4, Interesting)

ArbitraryConstant (763964) | more than 7 years ago | (#14340619)

"My concern with all the 'new' compression programs is that they, unlike Zip, haven't survived the test of time. I've recovered damaged zip archives in the past and they have come through mostly intact. I've used archive/compression like ARJ with options to be able to recover data even if there are multiple bad sectors on a harddrive or floppy disk. How many of the new compression programs have the tools available to adequately recover every possible byte of data?"

The solution to this issue is popular on usenet, since it's common for large files to be damaged. There's a utility called par2 that allows recovery information to be sent, and it's extremely effective. It's format-neutral, but most large binaries are sent as multi-part RAR archives. par2 can handle just about any damage that occurs, up to and including missing files.

Most of the time however, when it's simply someone downloading something it is only necessary to detect damage so they can download it again. All the formats I have experience with can detect damage, and it's common for MD5 and SHA1 sums to be sent separately anyway for security reasons.
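
For anyone curious how parity recovery works in principle, here's a toy Python sketch. Real par2 uses Reed-Solomon codes and can repair many damaged blocks; plain XOR parity, shown here, can rebuild only a single missing block, but the idea is the same:

from functools import reduce

def parity(blocks):
    # XOR equal-sized blocks together into one parity block
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

blocks = [b"block-one.", b"block-two.", b"block-3..."]   # equal-length pieces
par = parity(blocks)

# lose block 1, then rebuild it from the survivors plus the parity block
recovered = parity([blocks[0], blocks[2], par])
assert recovered == blocks[1]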

Re:Why compress in the first place? (0, Troll)

Master of Transhuman (597628) | more than 7 years ago | (#14341110)


Compressing files intended for BACKUP, as opposed to DOWNLOAD, DOES increase the chance of losing the entire file. That was the poster's point and it is entirely correct.

NEVER use compression on a backup unless you have PAR files you can use to recover the lost data if a bad sector on a CD, DVD, or bad block on a tape is discovered on restoration.

The Disk Archive (DAR) program is one of the few backup programs that can generate PAR files during the backup.

Re:Why compress in the first place? (1)

Jeff85 (710722) | more than 7 years ago | (#14340524)

Well, what if you wish to transfer this data in a timely fashion? Sending less (read: compressed) data would take less time in this regard, though it's worth noting that the time required to decompress the data may make the total time to retrieve the original data longer than it would take to just send the uncompressed data. So I think compression and decompression speed are also important factors.

Re:Why compress in the first place? (4, Interesting)

Ironsides (739422) | more than 7 years ago | (#14340525)

In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression.

Because when you are storing Petabytes of information it makes a difference in cost.

Besides, all the problems you mention with data corruption can be solved by backing up the information more than once. Any place that puts a high value on its info is going to have multiple backups in multiple places anyway. The most useful application of compression is in archiving old customer records. Being mostly text, they can easily give you compression ratios above 50%. Also, these are going to be backed up to tape (not disk). Being able to reduce the volume of tapes being stored by 50% can save a lot of money for a large organization.

What about speed? (1)

www.sorehands.com (142825) | more than 7 years ago | (#14340867)

It is not only the space, but also the speed. Once the data is compressed, backing up the compressed data takes less time. If you compress and then back up, you have to compare the compression time to the transfer time. Now, if you compress once, then back up, then copy the backup, you compare the compression time to 2X the transfer time.

Outside of the pure speed issue, what about media swapping? Once you exceed the media capacity (I'm talking removable media), the media needs to be swapped, which not only takes time but most likely requires human interaction. If you have a 30GB tape but 40GB to back up, the tape needs to be swapped. This eliminates the "start the backup, go home" backup process.

Re:What about speed? (1)

Ironsides (739422) | more than 7 years ago | (#14341028)

On tape, this is not an issue. Serious tape libraries are automated: a robotic arm loads and extracts the tapes used in a backup. Mind you, I'm also assuming that anyone really worrying about this is going to be "serious". LTO tapes (great for long-term backup) hold 400GB (LTO3). Transfer speed is about 20MB/s (yes, megabytes). Tapes cost ~$100 each. Also, from my experience with compression, no compression algorithm (or computer hardware) can compress raw data fast enough to keep that rate up. As far as I can tell, compression is going to stay a way to save money on tape/storage costs for the foreseeable future.

Why compress in the first place? To save time. (1)

SineOtter (901025) | more than 7 years ago | (#14340533)

Not everyone cares about how great their data integrity is with compressed files - they just care about compressing a few files to send to someone over IM faster than if they were sent uncompressed. When you're telling someone they have to wait 40 minutes for your file to finish sending because it's uncompressed, speed/compression becomes the deciding factor.

Re:Why compress in the first place? (0)

Anonymous Coward | more than 7 years ago | (#14340656)

Plain text compresses beyond 10:1, making storage costs even cheaper (still 0.50 per GB, but that GB really stores 10GB of data, so more like 5 cents/GB in the end).

And yes, storage is cheaper than ever, but it's still somewhat expensive. I wish I could afford all the petabytes I want, but it still cost me ~$500 CDN for four 250GB drives (and bigger drives only cost more per GB). And no, it's not for pr0n, it's for music, movies (in H.264), ebooks, training videos, GBs of travel/family photos from my DSLR, etc. I wish I could afford mirroring on some of my stuff, but that means even more HDs. Now, if I didn't use compression (rar/zip - not in the sense of mp3/mpeg4), I'd perhaps need twice that space. I'd need a second job just to buy HDs! Storage is STILL expensive!

Re:Why compress in the first place? (0)

Anonymous Coward | more than 7 years ago | (#14340685)

I always compress everything, even if I don't need the space savings, often just so that the compression checksum will provide assurance of the content's integrity (or alert me to subtle damage).

Also by compressing I can often span an archive across fewer CDs than would otherwise be needed, which reduces the risk of damage.

Then I make a redundant copy. Altogether, for roughly the same amount of storage media and just a tad extra effort, I get redundancy and integrity verification, instead of neither.

Re:Why compress in the first place? (3, Interesting)

LWATCDR (28044) | more than 7 years ago | (#14340817)

"As the administrator, your fundamental obligation is data integrity. If you compress, and if the compressed file store is damaged [especially if the header information on a compressed file - or files - is damaged], then you will tend to lose ALL of your data."
Not all data is stored in ASCII and/or ANSI. Compressing the data can make it more secure, not less:
1. It takes up fewer sectors of a drive, so it is less likely to get corrupted.
2. It can contain extra data to recover from bad bits.
3. It allows you to make redundant copies without using any more storage space.
Let's say you have some files in ASCII that you want to store. Using any compression method you can probably store 3 copies of the file using the same amount of disk space.
You are far more likely to recover a full data set from three copies of a compressed file than from one copy of an uncompressed file.

Also, we do not have unlimited bandwidth and unlimited storage EVERYWHERE. Lossless video, image, and audio files take up a lot of space. For some applications MP3, Ogg, MPG, and JPEG just don't cut it.
So yes, compression is still important.

Re:Why compress in the first place? (1)

Potor (658520) | more than 7 years ago | (#14341003)

Well, if I want to FTP my nightly backup to a remote server, it's easier to combine these files into one file and then FTP that file - and what better way than simply to compress a folder? It's either that or FTP'ing each file independently.

Compression can have more uses than simply saving space.
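
A minimal sketch of that combine-and-compress step, using only Python's standard library (the directory and file names are made up):

import tarfile
from datetime import date

backup_dir = "/var/backups/nightly"            # placeholder source folder
archive = f"nightly-{date.today()}.tar.gz"     # single file to FTP instead of many

with tarfile.open(archive, "w:gz") as tar:
    tar.add(backup_dir, arcname="nightly")     # keeps paths short inside the archive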

Re:Why compress in the first place? (3, Informative)

DeadboltX (751907) | more than 7 years ago | (#14341031)

Sounds like you need to introduce yourself to the world of par2 ( http://www.quickpar.org.uk/ [quickpar.org.uk] )

Parity reconstruction

Think of it like the year 2805, where scientists can regrow someone's arm if they happen to lose it.

Re:Why compress in the first place? (1)

ysegalov (849765) | more than 7 years ago | (#14341052)

You sound a bit like Bill Gates, who said nobody would need more than 640KB of RAM...

Also, data corruption has nothing to do with compression. Take an uncompressed EXE file, mess up a couple of bytes - and the whole package is useless.

Because it makes a hell of a lot of sense. (4, Insightful)

cbreaker (561297) | more than 7 years ago | (#14341069)

If you're familiar with Usenet, you've probably encountered PAR files from time to time. A PAR file is a parity file which can be used to reconstruct lost data. It works sort of like a RAID, but with files as the units instead of disks.

Let's say you have a 200MB file to send. You could just send the 200MB file, with no guarantees that it will reach the destination uncorrupted. Or, you could use a compression program and bring it down to 100MB. In this case, even if you lost the first transfer, you could transfer it a second time. Then we look at PAR. You compress the 200MB file into ten 10MB files. Then you could include 10% parity - if any of your files is bad, you'd be able to reconstruct it with the parity file, with only 110MB of transfer. PAR2 goes even further by breaking each file down into smaller units.

Besides transfer times and correction for network transfers, compression can also increase transfer speeds to backup media. If you have an LTO tape drive that can only write to tape at 20MB/sec, you'll only ever get 20MB/sec. Add compression to the drive, and you could theoretically get 40MB/sec to tape with 2:1 compression. That means faster backups, and faster restores. On-board compression in the drives takes all the load off the CPU - but even if you use the CPU for it, modern CPUs are fast enough to handle it.

Not to mention, it takes a lot less tape to make compressed backups. I don't know what world you live in, but in mine, I don't have unlimited slots in the library and I don't want to swap tapes twice a day. Handling tapes is detrimental to their lifespan; you really want to handle them as little as possible.

Data corruption isn't caused by compression. If it's going to happen, it'll happen regardless. While your point is true that it MAY be more difficult to recover from a corrupt file, that's not the right methodology. If your backups are that valuable, you'd make multiple copies - plain and simple.

I can't fathom why a responsible and well informed admin would avoid compression.

Re:Because it makes a hell of a lot of sense. (0, Troll)

Master of Transhuman (597628) | more than 7 years ago | (#14341174)

"While your point is true that it MAY be more difficult to recover from a corrupt file, that's not the right methodology. If your backups are that valuable, you'd make multiple copies - plain and simple."

Two problems with your response:

1) If your data is that valuable, compressing makes it more likely to lose it.

2) If your data is that valuable, making two copies takes twice the time and space - even with compression - and if you use compression and get a bad sector, fifty percent of your backup is now useless. Sure, the odds are good that you can recover from the second backup - but if IT has a bad sector - even in a different place - possibly because your device is going bad - then you've lost the second backup as well.

If you backup more than once UNCOMPRESSED, you can recover almost anything because it is VERY unlikely that a bad sector will occur in the exact same spot or even in the same file (assuming the one file does not take up most of the specific media.)

If your data is valuable, back up twice uncompressed. If your data is only so-so valuable, back up twice compressed. If your data is easily replaced, back up once uncompressed. NEVER back up once compressed - you might as well not back up at all then.

Alternatively, use PAR files to recover - as long as you're willing to add the extra space and time - which sort of obviates the advantage of compression, doesn't it?

And if the only valid argument for compression is saving the cost of media, then obviously your data is less valuable than you think it is - in which case why bother backing it up at all (other than legal requirements)? The cost of media simply is not a factor in comparison to the cost of the time required to back it up, the cost of the time to restore if needed, and the value of the data itself. That is being "penny-wise and pound-foolish" - a typical attitude among geeks who are obsessed with efficiency over effectiveness. Save a few gigabytes of space and lose the data - yeah, that's real smart...

If you want to back up quickly and securely, have two devices backing up simultaneously uncompressed - or two devices backing up simultaneously compressed with PARs. You can't lose - it's that simple.

Agreed! (1)

p3d0 (42270) | more than 7 years ago | (#14341109)

Why use JPEG or PNG when you can just use .BMP files?

Re:Speed (3, Insightful)

Anonymous Coward | more than 7 years ago | (#14340550)

No talk of the speed of compression/decompression?

Exactly! We compress -terabytes- here at wr0k, and we use gzip for -nearly- everything (some of the older scripts use "compress", .Z, etc.)

Why? 'cause it's fast. 20% of space just isn't worth the time needed to compress/uncompress the data. I tried to be modern (and cool) by using bzip2 - yes, it's great, saves lots of space, etc. - but the time required to compress/uncompress is just not worth it. I.e., if you need to compress/decompress 15-20 gigs per day, bzip2 just isn't there yet.

Also, look at what Google is using - they probably store more data than most other corps, but they still use gzip (I think, from some description, somewhere).

Re:Speed (3, Insightful)

Arainach (906420) | more than 7 years ago | (#14340610)

The article summary quoted is completely misleading. The most important graph is the final one on page 12, Compression Efficiency, where gzip is once again the obvious king. Sure, WinRK may be able to compress decently, but it takes an eternity to do so and is impractical for everyday use, which is where routines like gzip and ARJ32 come in - incredible compression for the speed at which they operate. Besides - who really needs that last 54MB in these days of 4.7GB DVDs and 160GB hard drives?

More time = More compression (4, Insightful)

bigtallmofo (695287) | more than 7 years ago | (#14340369)

For the most part, the summary of the article seems to be that the more time a compression application takes to compress your files, the smaller your files will be after compressing.

The one surprising thing I found in the article was that two virtually unknown contenders - WinRK and Squeez - did so well. One obvious follow-up question the article disappointingly doesn't answer is how more well-known applications such as WinZip or WinRAR (which have a more mass-appeal audience) stack up against them with their configurable higher-compression options.

Re:More time = More compression (3, Funny)

Orgasmatron (8103) | more than 7 years ago | (#14340581)

Speaking of unknown compression programs, does anyone remember OWS [faqs.org] ?

I had a good laugh at that one when I figured out how it worked, way back in the BBS days.

Re:More time = More compression (1, Informative)

Anonymous Coward | more than 7 years ago | (#14340681)

looks like you can still grab a copy of it here [www.sac.sk] .

Re:More time = More compression (2, Interesting)

undeadly (941339) | more than 7 years ago | (#14340595)

For the most part, the summary of the article seems to be that the more time a compression application takes to compress your files, the smaller your files will be after compressing.

Not only time, but also how much memory the algorithm uses, though the author did not mention how much memory each algorithm needs. gzip, for instance, does not use much, but others, like rzip (http://rzip.samba.org/ [samba.org] ), use a lot. rzip may use up to 900MB during compression.

I did a test compressing a 4GB tar archive with rzip, which resulted in a compressed file of 2.1 GB. gzip at max compression gave about 2.7 GB.

So one should choose an algorithm based on need and, of course, availability of source code. Using a proprietary, closed-source compression algorithm with no open source alternative implementation is begging for trouble down the road.

Re:More time = More compression (1)

jZnat (793348) | more than 7 years ago | (#14340932)

As far as I understand, memory usage usually becomes an issue with block-sorting algorithms. The more data you analyze at a time, the more memory you will use.

Re:More time = More compression (5, Interesting)

Rich0 (548339) | more than 7 years ago | (#14340630)

If you look at the methodology - all the results were obtained using the software set to the fastest mode - not the best compression mode.

So, I would consider gzip the best performer by this criterion. After all, if I cared most about space savings I'd have picked the best mode - not the fast mode. All this article suggests is that a few archivers are REALLY lousy at doing FAST compression.

If my requirements were realtime compression (maybe for streaming multimedia) then I wouldn't be bothered with some mega-compression algorithm that takes 2 minutes per MB to pack the data.

Might I suggest a better test? If interested in best compression, run each program in a mode which optimizes purely for compression ratio. On the other hand, if interested in realtime compression, take each algorithm and tweak the parameters so that they all run in the same (relatively fast) time, and then compare compression ratios.

With the huge compression of multimedia files I'd also want the reviewers to state explicitly that the compression was verified to be lossless. I've never heard of some of these proprietary apps, but if they're getting significant ratios out of .wav and .mp3 files I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...
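
That kind of round-trip check is easy to script; a rough sketch with Python's gzip module and SHA-256 (the filename is just an example) would look like:

import gzip
import hashlib

original = open("track01.wav", "rb").read()          # placeholder test file
restored = gzip.decompress(gzip.compress(original))

if hashlib.sha256(restored).digest() == hashlib.sha256(original).digest():
    print("round trip is bit-for-bit identical")
else:
    print("data was altered - NOT lossless")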

Re:More time = More compression (0)

Anonymous Coward | more than 7 years ago | (#14340862)

I've never heard of some of these proprietary apps, but if they're getting significant ratios out of .wav and .mp3 files I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...

That should be done for all contenders and all files, or dropped as a consideration altogether.

Compressia (1)

ardor (673957) | more than 7 years ago | (#14340370)

I always wanted to know how Compressia ( http://www.compressia.com/ [compressia.com] ) works. It uses some form of distance coding, but information about it is quite rare.

WinRK is excellent (4, Interesting)

drsmack1 (698392) | more than 7 years ago | (#14340376)

Just downloaded it, and I find that it compresses significantly better than WinRAR when both are set to maximum. Decompression is quite slow. I use it to compress a small collection of utilities.

Nice Comparison... (4, Insightful)

Goo.cc (687626) | more than 7 years ago | (#14340398)

but I was surprised to see that the reviewer was using XP Professional Service Pack 1. I actually had to double-check the review date to make sure that I wasn't reading an old article.

I personally use 7-Zip. It doesn't perform the best, but it is free software and it includes a command line component that is nice for shell scripts.

Re:Nice Comparison... (1)

lowid (24) _________ (878977) | more than 7 years ago | (#14340884)

Article is slow, so I can't speak to it specifically, but I personally still use XPSP1 for audio work because the sp2 firewall creates a lot of instability. (This was my opinion, and I later discovered it to be a general consensus in the audio community.) For people who need their windows box to be as stable as possible, it's probably best to stick with sp1 for a while.

Re:Nice Comparison... (1)

Johnno74 (252399) | more than 7 years ago | (#14340985)

Why not install SP2 and use a different firewall then? I hadn't heard about any SP2 firewall problems before, but I don't use it (MS's firewall) anyway - I use Kerio 2.1.4 (the last good ver before they became bloated)

My box has always been fairly stable, but even more so under SP2.

Re:Nice Comparison... (1)

fbjon (692006) | more than 7 years ago | (#14341059)

What kind of instability, specifically? I haven't noticed anything.

Re:Nice Comparison... (1)

IamTheRealMike (537420) | more than 7 years ago | (#14341134)

On UNIX systems at least the LZMA codec is excellent - it regularly achieves better ratios than bzip2, and is very fast to decompress. For many applications, decompression speed is more important than compression speed and the LZMA dictionary appears to fit inside the CPU cache, as it beats out bzip2 handily even though it's doing more work.

There are better compressors out there, in particular PPM codecs can achieve spectacular ratios, but as they're very slow to both compress and decompress they're useful mostly for archiving.

I've also seen great results from codecs tuned specifically to certain types of data over others, for instance, a PPM codec designed specifically for Intel x86 executable code can work wonders.
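
Those ratio and decompression-speed differences are easy to check on your own data with Python's bz2 and lzma modules; a rough sketch (the input file is a placeholder, and results depend heavily on the data):

import bz2
import lzma
import time

data = open("corpus.tar", "rb").read()   # placeholder test corpus

for name, mod in (("bzip2", bz2), ("lzma", lzma)):
    packed = mod.compress(data)
    start = time.perf_counter()
    mod.decompress(packed)
    took = time.perf_counter() - start
    print(f"{name}: {len(packed) / len(data):.1%} of original, decompressed in {took:.2f}s")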

horrible site interface (0, Redundant)

the_humeister (922869) | more than 7 years ago | (#14340400)

Is it just me or is that site really difficult to navigate amongst all those ads? Speed of compression would have been nice too.

Re:horrible site interface (0, Redundant)

the_humeister (922869) | more than 7 years ago | (#14340422)

Looks like I posted too fast. There's a speed comparison somewhere around there...

Re:horrible site interface (1)

reset_button (903303) | more than 7 years ago | (#14340573)

It's actually quite easy - just keep incrementing the "pgno" variable in the URL :)

Windows only (2, Interesting)

Jay Maynard (54798) | more than 7 years ago | (#14340407)

It's a real shame that 1) the guy only did Windows archivers, and 2) SBC Archiver is no longer in active development, closed source, and Windows-only.

Actually (5, Interesting)

Sterling Christensen (694675) | more than 7 years ago | (#14340413)

WinRK may have won only because he used the fast compression setting on all the compressors he tested. Results for default setting and best compression settings are TBA.

This is a surprisingly big subject (4, Informative)

derek_farn (689539) | more than 7 years ago | (#14340420)

There are some amazing compression programs out there; trouble is, they tend to take a while and consume lots of memory. PAQ [fit.edu] gives some impressive results, but the latest benchmark figures [maximumcompression.com] are regularly improving. Let's not forget that compression is not good unless it is integrated into a usable tool. 7-zip [7-zip.org] seems to be the new archiver on the block at the moment. A closely related, but different, set of tools are the archivers [freebsd.org], of which there are lots, with many older formats still not supported by open source tools.

Open formats and long-term accessibility (5, Insightful)

ahziem (661857) | more than 7 years ago | (#14340432)

A key benefit of the PKZIP and tarball formats is that they will remain accessible for decades, or even hundreds of years. These formats are open (non-proprietary), widely implemented, and supported by free (as in freedom) software.

The same can't be said for WinRK. So if you expect to need access to your data for a long period of time, you should carefully consider whether the format will stay accessible.

Unix compressors (5, Interesting)

brejc8 (223089) | more than 7 years ago | (#14340434)

I did a short review and benchmarking of unix compressors [brej.org] people might be interested in.

rzip? (0)

Anonymous Coward | more than 7 years ago | (#14340531)

how does it perform against the rest?

http://rzip.samba.org/ [samba.org]

Re:rzip? (1)

brejc8 (223089) | more than 7 years ago | (#14340701)

rzip and szip are the two compressors which I didn't know about before I started doing the review. They are both a couple of percent better than bzip2, and I will include them if I do an update.

Re:Unix compressors (1)

Queuetue (156269) | more than 7 years ago | (#14340565)

Thanks for this - you helped me take the plunge and update my remote backup scripts... They now take about 1/10th the time to transfer and space to store, all by changing gzip to 7z in 4 or 5 places!

Re:Unix compressors (1)

TypoNAM (695420) | more than 7 years ago | (#14340770)

What is up with this? "Also do note the lack of ace results as there are no Unix ace compressors." on your Compression times page, yet somehow you were able to do it on your Size page. Kind of funny isn't it?

Re:Unix compressors (1)

brejc8 (223089) | more than 7 years ago | (#14340856)

I used winace to do the compression. The idea is to determine which format to distribute the files in and ace is still possible due to the linux decompressor.

Re:Unix compressors (1)

Justin205 (662116) | more than 7 years ago | (#14340860)

No, not funny at all. Perfectly sensible.

Size doesn't depend on the OS it was compressed on (generally - perhaps a small bit, at most). So he compressed it for size on Windows (or an OS with an ACE compressor).

Speed, however, does depend on the OS it was compressed on. Much more than size, at any rate. So the results would have been skewed in one direction or the other, due to the OS.

Re:Unix compressors (1)

TypoNAM (695420) | more than 7 years ago | (#14340981)

So, it is perfectly sensible to include a non-UNIX compression utility in a review of UNIX compression utilities? WinAce does have a decompressor for UNIX, but no compressor, so shouldn't it have been dropped from the review entirely? It is irrelevant to a UNIX compression review if there is no UNIX compressor for it.

Sorry for nitpicking, but come on - how can you use WinACE on Windows to do the size compressions and then use all the other compressors on UNIX? That really skews the results. The review doesn't seem so trustworthy now, given the complete lack of detail about which environments were really used. It seems just like those half-assed hardware reviews that provide no real information and show off only what they want you to see, not the whole picture.

Re:Unix compressors (0)

Anonymous Coward | more than 7 years ago | (#14340773)

Which options did you use in your tests for compressors that let you to choose between "fast" and "maximum" compression? (such as the -1 to -9 flags in gzip)

Re:Unix compressors (1)

brejc8 (223089) | more than 7 years ago | (#14340851)

For these I chose the highest compression for each test. I chose to do that because most of the compressors assume the max compression option (e.g. bzip2 assumes -9) and I was more interested in the size rather than the speed.

Re:Unix compressors (1)

molo (94384) | more than 7 years ago | (#14340929)

Thanks for that benchmark. It might be interesting to see a plot of size vs. compression time.

-molo

Re:Unix compressors (1)

GigsVT (208848) | more than 7 years ago | (#14340998)

About stuffit linux... Unless something's changed, they talk a lot about it being a time limited trial version, but it never expires. At least the copies I used a few years ago didn't.

Quite interesting (1)

shayera (518168) | more than 7 years ago | (#14340436)

I'd like to see an article about exe compressors done like this.
There are some interesting beasts out there like UPX, which as far as I remember does quite respectable packing on the win32 platform.

The WinRK archive compressor tested here seems to achieve quite amazing results at the cost of speed... a lot of speed.

Re:Quite interesting (1)

_Shorty-dammit (555739) | more than 7 years ago | (#14340586)

I never understood the point of exe compressors, once HDs made it past the megabyte stage, well, there wasn't much point. And it's worse for distribution, since your archiving program will compress it better anyways if you hadn't UPXed it. Whenever I get something that's been UPXed, the first thing I do is decompress it.

Re:Quite interesting (1)

StillAnonymous (595680) | more than 7 years ago | (#14341049)

I find that exe compressors are generally used more for their ability to be exe encrypters.

So long as you don't use some known and easily decompressable packer (like UPX), it adds a layer of protection to the program that prevents people from just hex editing the contents and patching out protection routines. They have to go through the trouble of decompressing the file first. That or write a loader that performs the patch in memory after the program has unpacked.

Just use DiskDoubler (5, Funny)

mattkime (8466) | more than 7 years ago | (#14340446)

Why mess around with compressing individual files? DiskDoubler is definitely the way to go. Hell, I even have it set up to automagically compress files I haven't used in a week.

It's running perfectly fine on my Mac IIci.

Re:Just use DiskDoubler (3, Funny)

SleepyHappyDoc (813919) | more than 7 years ago | (#14340486)

Mac IIci? Has it finished compressing files since you bought it?

...or NTFS (1)

tepples (727027) | more than 7 years ago | (#14340499)

Why mess around with compressing individual files? DiskDoubler is definitely the way to go.

And NTFS in Windows 2000 or later includes compression technology similar to DiskDoubler.

Re:Just use DiskDoubler (2, Insightful)

fbjon (692006) | more than 7 years ago | (#14341180)

I prefer DoubleSpace [wikipedia.org] for maximum file-destroying activity.

Input type? (3, Interesting)

reset_button (903303) | more than 7 years ago | (#14340458)

Looks like the site got slashdotted while I was in the middle of reading it. What file types were used as input? Clearly compression algorithms differ on the file types that they work best on. Also, a better metric would probably have been space/time, rather than just using time. Also, I know that zlib, for example, allows you to choose the compression level - was this explored at all?

Also, do any of you know any lossless algorithms for media (movies, images, music, etc)? Most algorithms perform poorly in this area, but I thought that perhaps there were some specifically designed for this.

Re:Input type? (1)

reset_button (903303) | more than 7 years ago | (#14340603)

Looks like the load on the site just went down and I was able to read the remainder of the article. Looks like they do use different input types, as well as a space vs. time metric. I'm not crazy about using a stopwatch, but that's probably the best you can do if you're working with a GUI. If any of you know the answer to my question at the end though, it would be appreciated.

Re:Input type? (1)

!equal (938339) | more than 7 years ago | (#14340646)

Also, do any of you know any lossless algorithms for media (movies, images, music, etc)? Most algorithms perform poorly in this area, but I thought that perhaps there were some specifically designed for this.

I know of one for music called FLAC [sourceforge.net] (Free Lossless Audio Codec).

Re:Input type? (2, Interesting)

bigbigbison (104532) | more than 7 years ago | (#14340844)

According to Maximum Compression [maximumcompression.com] , which is basically the best site for compression testing, Stuffit's [stuffit.com] new version is the best for lossless jpeg compression. I've got it and I can confirm that it does a much better job on jpegs than anything else I've tried. Unfortunately, it is only effective on jpegs, not gifs, pngs, or even pdfs (which seem to use jpeg compression). And outside of the Mac world, it is kind of rare.

Blank page (0)

Anonymous Coward | more than 7 years ago | (#14340461)

Anyone else having trouble viewing the site? It comes up utterly blank in IE with all patches on fully updated XP. View sources shows everything you'd expect to see but it's rendering blank. ?? (useless "don't use IE" type comments will be modded flamebait)

Re:Blank page (0)

Anonymous Coward | more than 7 years ago | (#14340991)

1) The site is Slashdotted.
2) Don't use IE.

Why compress in weird formats? (4, Insightful)

canuck57 (662392) | more than 7 years ago | (#14340470)

I generally prefer gzip/7-Zip.

The reasoning is simple, I can use the results cross platform without special costly software. A few extra bytes of space is secondary.

For many files, I also find buying a larger disk a cheaper option than spending hours compressing/uncompressing files. So I generally only compress very compressible files that I don't think I will need.

Re:Why compress in weird formats? (2, Insightful)

_Shorty-dammit (555739) | more than 7 years ago | (#14340720)

haha, yeah, 7-zip isn't 'weird' at all. I like how you try to make it sound like it's just as pervasive as something like gzip, even though 7-zip's a pretty much unknown format.

Re:Why compress in weird formats? (1)

jp10558 (748604) | more than 7 years ago | (#14340958)

Yeah, when compressing files, I'm basically limited to .zip for most people, cause WinXP will handle that. For the savvy, I might get to use .rar for a little better compression.

Has anyone heard of WinUHA yet? That is supposed to be pretty good, and I'd not mind testing out other archivers, as long as the time savings on transferring smaller files aren't overtaken by the compression/decompression time. Though, again, all these things are useless if no one can uncompress them.

'more of the same' - delta compression (1)

erwincoumans (865613) | more than 7 years ago | (#14340572)

>backing up last year's data to make room for more of the same.
If it's really more of the same, delta compression of the new data against last year's data would work nicely.
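
A crude illustration of the delta idea - keep only the blocks that weren't already in last year's backup - is sketched below in Python. Real delta tools (rsync, xdelta, rdiff-backup) use rolling checksums and proper delta encodings rather than fixed-offset blocks, and the file names here are invented:

import hashlib

BLOCK = 64 * 1024

def blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

last_year = open("backup-2004.bin", "rb").read()   # placeholder reference data
this_year = open("backup-2005.bin", "rb").read()   # placeholder new data

known = {hashlib.sha1(b).hexdigest() for b in blocks(last_year)}
delta = [b for b in blocks(this_year) if hashlib.sha1(b).hexdigest() not in known]

print(f"blocks to store: {len(delta)} of {len(blocks(this_year))}")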

common compression utilities benchmarks (0, Redundant)

qazwsx789 (647638) | more than 7 years ago | (#14340777)

I did a small test of the common linux compression commands back in 2000. Here are the results: (note that some of the command options have changed since then, for example tar now uses -j for bzip2)

THE COMPRESSION UTILITY TEST

Compression utilities tested: zip, rar, gzip, bzip2, tgz (tar with the z flag invoked). Each test was run three times. For each completed test the system was rebooted. Hardware used: Pentium2 350MHz, 256MB RAM. OS: Linux Mandrake 7.1. The system load was minimal. The "time" command was used to measure the elapsed time, the "ls -l" command was used to determine the size, and a script was used to determine the total size of gzip files.

Note: gzip packs individual files recursively. For bzip2, the command invoked was tar -cvIf file.bz2 dir (in GNU tar, the I flag invokes bzip2). For tgz, tar with the z flag invokes gzip.

TEST 1 - compressing multiple files

total size of the dir: 91.621.857 bytes, total files: 3540 (most of these files are ascii and html, but there are a few gifs and jpgs too.)

default compression settings:

tool     time elapsed  MB/s   compressed to    time elapsed uncompressing
gzip     1m.44s        0.88   24.884.124       37s
zip      1m.10s        1.3    25.813.958       41s
rar      3m.25s        0.44   20.784.489       48s
bzip2    3m.54s        0.39   17.399.561       1m.17s
tgz      1m.09s        1.32   23.821.446       36s

maximum compression settings:

tool     time elapsed  MB/s   compressed to    time elapsed uncompressing
gzip     2m.00s        0.76   24.670.516       36s
zip      1m.42s        0.89   25.593.448       39s
rar      10m.12s       0.14   18.698.710       1m.02s
bzip2    n/a (the compression rate cannot be specified through tar; is the default the maximum?)
tgz      n/a (the compression rate cannot be specified through tar; is the default the maximum?)

CONCLUSION: use tgz (tar with the z flag) if time is an issue, otherwise use bzip2(tar with the I flag)

TEST 2 - compressing 1 ascii file

size of the ascii file: 53.819.786 bytes (the file was taken out of my mailbox)

default compression settings:

tool     time elapsed  MB/s   compressed to    time elapsed uncompressing
gzip     42s           1.28   15.560.144       15s
zip      41s           1.31   15.560.261       17s
rar      1m.57s        0.45   11.507.387       17s
bzip2    1m.58s        0.45   10.788.502       39s
tgz      54s           0.99   15.560.907       8s

maximum compression settings:

tool     time elapsed  MB/s   compressed to    time elapsed uncompressing
gzip     44s           1.22   15.486.842       15s
zip      45s           1.19   15.486.959       16s
rar      6m.40s        0.08   09.582.810       17s
bzip2    1m.53s        0.47   10.788.502       39s
tgz      n/a (the compression rate cannot be specified through tar; is the default the maximum?)

CONCLUSION: use zip or gzip if time is an issue, otherwise use bzip2(tar with the I flag)

BTW: for useful webmaster and sysadmin linux scripts visit: http://comp.eonworks.com/scripts/scripts.html

small mistake (4, Interesting)

ltwally (313043) | more than 7 years ago | (#14340827)

There is a small mistake on page 3 [rojakpot.com] of the article, in the first table: WinZip no longer offers free upgrades. If you have a serial for an older version (1-9), that serial will only work on the older versions. You need a new serial for v10.0, and that serial will not work when v11.0 comes out.

Since WinZip does not handle .7z, .ace or .rar files, it has lost much of its appeal for me. With my old serial no longer working, I now have absolutely no reason to use it. Now when I need a compressor for Windows I choose WinAce & 7-Zip. Between those two programs, I can de-/compress just about any format you're likely to encounter online.

Compress to 0K (2, Funny)

Anonymous Coward | more than 7 years ago | (#14340875)

I always compress my compressed files over and over until I achieve absolute 0Kb.
I carry all data of my entire serverfarm like that on a 128Mb USB-stick.

Nothing to see here (5, Informative)

Anonymous Coward | more than 7 years ago | (#14340892)

I can't believe TFA made /. The only thing more defective than the benchmark data set (Hint: who cares how much a generic compressor can save on JPEGs?) is the absolutely hilarious part where the author just took "fastest" for each compressor and then tried to compare the compression. Indeed, StuffIt did what I consider the only sensible thing for "fastest" in an archiver, which is to just not even try to compress content that is unlikely to get significant savings. Oddly, the list for fastest compression is almost exactly the reverse of the list for best compression on every test. The "efficiency" is a metric that illuminates nothing. An ROC plot of rate vs compression for each test would have been a good idea; better would be to build ROC curves for each compressor, but I don't see that happening anytime soon.

I wouldn't try to draw any conclusions from this "study". Given the methodology, I wouldn't wait with bated breath for parts two and three of the study, where the author actually promises to try to set up the compressors for reasonable compression, either.

Ouch.

Re:Nothing to see here (1)

igrigorik (818839) | more than 7 years ago | (#14341145)

I agree with you, but I also think the whole pursuit of the 'best' compressor is misguided; even a set of ROC curves won't tell us much. From a practical standpoint a single compressor as 'jack of all trades' is obviously the best solution, but due to the differences in the compression algorithms every data set you push through the compressor will yield different results. If you take even the most basic, well-studied Lempel-Ziv and Huffman algorithms, you'll quickly find cases where each would be preferred over the other.

From a programmer's point of view:
- Sometimes I don't want to send my dictionary with my encoded file; sometimes I can even assume we have the dictionaries on both endpoints of the communication.
- Sometimes I can wait 5 minutes to zip a file and 20 minutes to unzip it. When I'm trying to stream a file, I probably can't.
- Sometimes I want everyone to be able to read my file (zip it!). Sometimes I don't.

And since different algorithms identify different patterns in the files they're compressing, certain files will be compressed better by one algorithm and much worse by the next. Besides, we're not even getting into any discussion of lossy vs. lossless algorithms here (think JPEG vs. BMP).

Maximum Compression has efficiency comparisons (5, Informative)

bigbigbison (104532) | more than 7 years ago | (#14340900)

Since the original site seems to be really slow and split into a billion pages, those who aren't aware of it might want to look at MaximumCompression [maximumcompression.com] since it has tests for several file formats and also has a multiple file compression test that is sorted by efficiency [maximumcompression.com] . A program called SBC [netfirms.com] does the best, but the much more common WinRAR [rarlab.com] comes in a respectable third.

Related Links Broken (2, Funny)

Karma Farmer (595141) | more than 7 years ago | (#14340918)

The "related links" box for this story is horribly broken. Instead of being links related to the story, it's a bunch of advertising. I'm sure this was a mistake or a bug in slashcode itself.

I've searched the FAQ, but I can't figure out how to contact slashdot admins. Does anyone know an email address or telephone number I can use to contact them about this serious problem? I'm sure they'll want to fix it as quickly as possible.

No one ever looks at rzip (3, Interesting)

Mr.Ned (79679) | more than 7 years ago | (#14340936)

http://rzip.samba.org/ [samba.org] is a phenomenal compressor. It does much better than bzip2 or rar on large files and is open source.

Okay, I'll look at rzip then... (0)

Anonymous Coward | more than 7 years ago | (#14341070)

Those are some pretty impressive compression ratios, but how does rzip do speed-wise? Is it faster, slower, or about the same as bzip2?

Regardless of how fast it is, it looks like it's worth considering if you have large files to compress. Thanks for pointing it out--I'll give it a try next time I make backups.

Decompression Speed (3, Interesting)

Hamfist (311248) | more than 7 years ago | (#14341034)

Interesting that the article talks about compression ratio and compression speed. When considering compression, decompression time is extremely relevant. I don't mind waiting longer to compress the fileset, as long as decompression is fast. I normally compress once and then decompress several times (media files and games, for example).

YOU FaIL IT... (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#14341038)

Tutorial with rzip, graphs and bandwidth (0)

Anonymous Coward | more than 7 years ago | (#14341042)

rzip wasn't reviewed, but it uses hashing to quickly look for previously seen data. I think it's great. A tutorial covering it and other Linux compression tools is here [linuxjournal.com] . The tutorial also has graphs that make it easy to see the trade-offs between speed and compression ratio, as well as advice on which compressors increase effective bandwidth the most for your CPU and network speed.

Unicode support? (3, Informative)

icydog (923695) | more than 7 years ago | (#14341105)

Is there any mention made of Unicode support? I know that WinZip is out of the question for me because I can't compress anything with Chinese filenames with it. They'll either not work at all, or they'll get compressed but the filenames will turn into garbage. Even though the data stays intact, it doesn't help much if it's a binary and has no intelligible filename.

I've been using 7-Zip for this reason, and also because it compresses well while also working on Windows and Linux.
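
As a rough sanity check of whether a format round-trips non-ASCII names, Python's zipfile module (which stores such names as UTF-8 with the zip language-encoding flag) can be used like this; the filenames below are invented:

import zipfile

names = ["说明.txt", "备份资料.bin"]

with zipfile.ZipFile("test.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in names:
        zf.writestr(name, b"dummy contents")

with zipfile.ZipFile("test.zip") as zf:
    assert zf.namelist() == names, "filenames were mangled"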

accuracy test missing (2, Insightful)

Grimwiz (28623) | more than 7 years ago | (#14341120)

A suitable level of paranoia would suggest that it would be good to decompress the compressed files and verify that they produce the identical dataset. I did not see this step in the overview.

There's an article in there somewhere? (4, Insightful)

cbreaker (561297) | more than 7 years ago | (#14341141)

All I see is ads. I think I found a paragraph that looked like it may have been the article, but every other word was underlined with an ad-link so I didn't think that was it..

JPG compression (5, Interesting)

The Famous Druid (89404) | more than 7 years ago | (#14341160)

It's interesting to note that Stuffit produces worthwhile compression of JPG images, something long thought to be impossible.
I'd heard the makers of Stuffit were claiming this, but I was sceptical; it's good to see independent confirmation.