GZipping Life Forms: Deflate Reveals Bare-Bones

Hemos posted about 11 years ago | from the getting-to-the-core-of-the-matter dept.

Science 245

An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today using the standard gzip compression utility. As a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils produces higher image compression ratios [more data redundancy], compared to their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online here. This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."
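A minimal sketch of the kind of measurement the summary describes, assuming "compression ratio" means original size divided by gzip-compressed size. The byte strings are synthetic stand-ins rather than fossil images, and `compression_ratio` is a hypothetical helper name, not anything from the researchers' code.

```python
import gzip
import random


def compression_ratio(data: bytes) -> float:
    """Original size divided by gzip-compressed size (higher = more redundant)."""
    return len(data) / len(gzip.compress(data, 9))


rng = random.Random(0)
# Stand-in for a strongly patterned image: a short motif repeated many times.
patterned = bytes(range(32)) * 1024
# Stand-in for a noisy image: uniformly random bytes of the same length.
noisy = bytes(rng.randrange(256) for _ in range(32 * 1024))

print(f"patterned: {compression_ratio(patterned):.1f}x")
print(f"noisy:     {compression_ratio(noisy):.2f}x")
```

The patterned input shrinks dramatically while the noisy one does not shrink at all, which is the whole signal the method relies on.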

Makes sense... (4, Insightful)

Anonymous Coward | about 11 years ago | (#5631029)

Lifeforms seem to be built on patterns after all. Patterns are easily compressible.

Re:Makes sense... (5, Interesting)

jolyonr (560227) | about 11 years ago | (#5631306)

Unfortunately it's not that simple: inorganic systems can have as much visual complexity as organic things. For example.. um.. (looks out of window here in Toronto).. a snowflake! Fractal complexity, such as that seen in the branches of a tree, is frequently mirrored in the inorganic world. The snowflake is one example; another, less well known, is manganese dendrites, which look just like fossil plants but are totally inorganic, such as these [vic.gov.au] [Victoria Museum]. The patterns of frost on a frozen windscreen are another example. I can't see how a computer program can distinguish whether such complex patterns are signs of life or not. Still, if it helps NASA get more funding, then who am I to argue! Jolyon

Cool (1)

fudgefactor7 (581449) | about 11 years ago | (#5631035)

Bad pun at the end of the original post notwithstanding, this is pretty cool stuff. Wonder why nobody thought of using compression in this manner before? This has all sorts of potential uses.

I compress.. (4, Funny)

mr. methane (593577) | about 11 years ago | (#5631038)

... therefore I am.

I'm not sure I should be flattered that the best way to tell a picture of me from a picture of a rock is that I have more redundant image data. :-)

Re:I compress.. (5, Funny)

DShard (159067) | about 11 years ago | (#5631101)

That actually should flatter you. You have less entropy so you are of a higher order than the rock. You can brag to all your non-rock friends that those stupid rocks have high entropy.

why no bzip2 ? (-1)

Anonymous Coward | about 11 years ago | (#5631043)

doesn't bzip2 outperform gzip?

Re:why no bzip2 ? (5, Interesting)

bill_mcgonigle (4333) | about 11 years ago | (#5631242)

doesn't bzip2 outperform gzip?

gzip might be preferable because it works more locally. It only keeps track of the last n bytes of data and does substitutions based on patterns seen in those n bytes.

bzip2 uses a block-sorting (Burrows-Wheeler) transform over blocks of hundreds of kilobytes, so the context it exploits is typically much longer than gzip's and the compression is less local. That's great if you're going for compression, but for this work it might be misleading.

That said, gzip doesn't know about image formats, so I wonder if these guys are getting some false positives on scanline wraps and other non-image data.
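The locality point can be checked with a rough sketch. It relies on DEFLATE's back-references being limited to a 32 KiB window, and on bzip2 sorting whole blocks of up to 900 kB at level 9. The data is synthetic random bytes, and `repeated_block` is a made-up helper for illustration.

```python
import bz2
import random
import zlib

rng = random.Random(1)


def repeated_block(size: int) -> bytes:
    """Return a random block of `size` bytes followed by an exact copy of itself."""
    block = bytes(rng.randrange(256) for _ in range(size))
    return block + block


near = repeated_block(16 * 1024)  # the copy starts 16 KiB back: inside the window
far = repeated_block(64 * 1024)   # the copy starts 64 KiB back: outside the window

for name, data in (("near", near), ("far", far)):
    deflate_size = len(zlib.compress(data, 9))
    bzip2_size = len(bz2.compress(data, 9))
    print(f"{name}: raw={len(data)} deflate={deflate_size} bzip2={bzip2_size}")
```

With the copy inside the window, DEFLATE encodes the second half almost for free. With the copy outside it, DEFLATE sees two unrelated random blocks, while bzip2 still finds the repetition.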

A-ha! (4, Funny)

grub (11606) | about 11 years ago | (#5631045)

So when we compress the ultimate, super-duper intelligent life form we get a two byte file containing "42"

Re:A-ha! (1)

spot35 (644375) | about 11 years ago | (#5631130)

Ironically, when compressing a file containing just '42' (using winzip, sorry) the resultant file is 112 bytes compared to the 'uncompressed' 2 bytes

Re:A-ha! (1)

cymen (8178) | about 11 years ago | (#5631257)

With gzip 1.2.4 the file containing "42" (3 bytes) results in a compressed size of 25 bytes.

So did you make a pkzip compatible file or a gzip file? I tried Winzip 8.1 SR-1 (5266) here on a Win2k system and the file with "42" (4 bytes, made with edit.com) became a 104 byte zip file.
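The growth on tiny inputs comes mostly from container overhead. As a small sketch: the gzip format wraps the DEFLATE stream in a 10-byte header and an 8-byte trailer (CRC-32 plus length), and command-line gzip also stores the file name. The snippet below compares a bare DEFLATE stream with the gzip-wrapped one for the three-byte file; no file name is stored here, so the numbers will not match the 25 bytes reported above exactly.

```python
import gzip
import zlib

payload = b"42\n"  # the three-byte file from the comment above

wrapped = gzip.compress(payload)

# wbits=-15 asks zlib for a raw DEFLATE stream with no header or trailer.
raw = zlib.compressobj(9, zlib.DEFLATED, -15)
deflated = raw.compress(payload) + raw.flush()

print(f"payload={len(payload)} deflate={len(deflated)} gzip={len(wrapped)}")
print(f"container overhead={len(wrapped) - len(deflated)} bytes")
```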

Re:A-ha! (1)

spot35 (644375) | about 11 years ago | (#5631291)

Don't know. I just thought that trying to compress the ultimate answer to life the universe and everything causing the size of the file to become larger than the original was quite ironic. ;)

Re:A-ha! (1)

C A S S I E L (16009) | about 11 years ago | (#5631220)

Well, indeed; in terms of applying image compression, the highest form of life is indeed the super-intelligent shade of the colour blue, just as Adams predicted.

Re:A-ha! (1)

fredrikj (629833) | about 11 years ago | (#5631308)

Excellent! Now that we've made that discovery, all we need to figure out is how to decompress it.

Hmm, thoughts anyone?

I'd assume (2, Interesting)

Omkar (618823) | about 11 years ago | (#5631049)

that this has something to do with patterns and image continuity. If so (enlighten me!), then it would be a decent filtering tool, but reliability would be a major problem. Geological (or whatever) patterns could fool the algorithm. Finally, the most compressible image is a monochrome one - is it alive?

(Mods: the last line was a joke, intended to point out a particularly simple example of a problem - not a troll)

Re:I'd assume (0)

Anonymous Coward | about 11 years ago | (#5631172)

The question isn't 'is it alive' but has it been created/influenced by life. I would submit that any monochrome image would have to be created by a life form, either by selection or creation.

For instance, a black piece of plastic. Created by humans. It all comes down to entropy. Life is one of the few forces that can cause a localized decrease in entropy, and therefore a non-random pattern. Non-random patterns compress. Random ones do not. (or not well)

horsefeathers. (1, Interesting)

Anonymous Coward | about 11 years ago | (#5631050)

It is true that many pictures of life forms compress better or worse than inanimate objects. Just because a picture of something compresses similarly to a life form doesn't mean it is a life form. This is simply coincidence.

Re:horsefeathers. (1)

Lussarn (105276) | about 11 years ago | (#5631309)

What the article is trying to say is that intelligent lifeforms like patterns. Look at a car. It's one color, and should compress fairly well. Look at a tree. It's many colors and shouldn't compress that well.

I don't think a picture of a human compresses better than a tree though.

Re:horsefeathers. (1)

bsharitt (580506) | about 11 years ago | (#5631332)

Well, compressing things is at least a start. Perhaps with another technique or two a computer could be almost certain whether or not something is alive.

So, when's the transporter going to work? (0)

Anonymous Coward | about 11 years ago | (#5631057)

Just zip me up and email me to Scotty. Too bad I'll probably have to do it naked to save a few bytes.

Excellent... (5, Funny)

Anonymous Coward | about 11 years ago | (#5631058)

No more sniffing when i'm checking items in the refrigerator - is it 'alive' ? gzip is the answer!

uhhh.. huh? (2, Interesting)

SamBeckett (96685) | about 11 years ago | (#5631059)

Doesn't gzip only look for patterns in one dimension? Assuming they are using these for pictures, they are missing the boat on at least one more area of complexity!
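A toy illustration of the one-dimensional point, assuming the image reaches gzip as a flat byte stream. The "image" is synthetic: every column is a single random grey level, so all of its structure is vertical, and it is deliberately wider than DEFLATE's 32 KiB window. The same pixels are serialised in two orders.

```python
import random
import zlib

rng = random.Random(2)
WIDTH, HEIGHT = 40_000, 16  # wider than the 32 KiB DEFLATE window on purpose

# One random grey level per column; each column is constant top to bottom.
column_values = [rng.randrange(256) for _ in range(WIDTH)]

# Scanline (row-major) order: the same random row written HEIGHT times.
row_major = bytes(column_values) * HEIGHT
# Transposed (column-major) order: WIDTH short runs of identical bytes.
column_major = b"".join(bytes([v]) * HEIGHT for v in column_values)

print("row-major:   ", len(zlib.compress(row_major, 9)))
print("column-major:", len(zlib.compress(column_major, 9)))
```

In scanline order the vertical redundancy sits 40,000 bytes away from each pixel, beyond the compressor's reach, so the compressed sizes differ widely even though the pixels are identical.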

Goatse??? (-1, Offtopic)

Christopher_G_Lewis (260977) | about 11 years ago | (#5631063)

Yet again, I cannot get over my fear that clicking on any link in any comment will be Goatse.

Damn you, /.!

I am, at this time, copyrighting the phrase, Goatsephobia :-)

Be Humble (4, Funny)

hugesmile (587771) | about 11 years ago | (#5631067)

OK, so if I have this right: Life is less random, and more predictable (more compressible) than non-life.

So that tells me that life contains less data than non-life.

Perhaps sophisticated life (human life?) contains even less data than non-sophisticated life. So the smarter we get, the more predictable we get, and the less data we contain.

Perhaps we will someday get smart enough to be totally compressed to one bit. In the time I thought about this concept, I think my gzip file got even more compressed. Hmm....

Re:Be Humble (0)

Anonymous Coward | about 11 years ago | (#5631085)

Yes, in the future, we shall all be hydrogen atoms!

Re:Be Humble (5, Insightful)

javatips (66293) | about 11 years ago | (#5631134)

> So that tells me that life contains less data then non-life.

No, it means that life contains less noise than non-life.

My question (1)

AEC216 (621410) | about 11 years ago | (#5631070)

So does this go to show that life is anti-enthrophy?

Re:My question (-1, Flamebait)

Anonymous Coward | about 11 years ago | (#5631240)

Its E-N-T-R-O-P-Y you stupid fuck.

The same image... (0)

Anonymous Coward | about 11 years ago | (#5631086)

....as image 1 and image 2 seems to have different complexity........???

Re:The same image... (2, Informative)

maxwell demon (590494) | about 11 years ago | (#5631176)

Hmmm.... really as "image1" and "image2", and not as "img1" and "image_2_with_an_incredibly_long_file_name"?

BTW, if you want to be file name independent, you can use
cat file | gzip -c9 | wc -c
This way, gzip doesn't see the file name, and therefore doesn't include it into the .gz file.
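The same effect can be reproduced from Python as a rough sketch. `gzip.GzipFile` records the `filename` argument in the gzip header, so identical data under different names produces different sizes; an empty name stores nothing, much like the pipe above. `gzip_size` is a hypothetical helper, and `mtime=0` just pins the timestamp field.

```python
import gzip
import io


def gzip_size(data: bytes, filename: str) -> int:
    """Size of a gzip member whose header records `filename`."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return len(buf.getvalue())


data = b"identical image bytes" * 100
short = gzip_size(data, "img1")
long_ = gzip_size(data, "image_2_with_an_incredibly_long_file_name")
anonymous = gzip_size(data, "")  # no name stored in the header

print(short, long_, anonymous)
```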

Does this work for anyone? (1)

iainl (136759) | about 11 years ago | (#5631088)

I tried it with two jpgs off my little home site, and the poor thing died with a div by 0 error.

Either I'm doing something wrong with my jpg compression, or this is slightly flakey - a successful pair of test pics would be most helpful.

Re:Does this work for anyone? (1)

iainl (136759) | about 11 years ago | (#5631118)

OK, I answered my own question - despite being a web-based app, it wants to be passed files off my local harddrive, not ones off a website. Odd.

Re:Does this work for anyone? (0)

Anonymous Coward | about 11 years ago | (#5631270)

Try not to use naked pictures of yourself next time.

bzip2? (2, Interesting)

maxwell demon (590494) | about 11 years ago | (#5631093)

Has anyone checked if bzip2 is better or worse in detecting biological products?

After all, they have quite different compression characteristics (on one hand, compression of a megabyte of zeroes is much better in bzip2, OTOH adding the same file on top of itself and then compressing gives much less additional compressed size with gzip than with bzip2 - tested with /usr/src/linux/kernel/sys.c, 24957 bytes uncompressed).
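Both observations are easy to try, as a hedged sketch. The kernel source file from the comment is not available here, so the "file" below is synthetic C-like text of a few tens of kilobytes; only the general trend should be read from it, not the exact figures.

```python
import bz2
import zlib

CODECS = {
    "deflate": lambda b: zlib.compress(b, 9),
    "bzip2": lambda b: bz2.compress(b, 9),
}

zeros = bytes(1024 * 1024)  # one megabyte of zero bytes

# Synthetic stand-in for a source file that fits inside DEFLATE's 32 KiB window.
text = "".join(
    f"static int handler_{i}(struct ctx *c) {{ return c->v[{i}] + {i * 7}; }}\n"
    for i in range(400)
).encode()

for name, compress in CODECS.items():
    once = len(compress(text))
    twice = len(compress(text + text))
    print(f"{name}: zeros={len(compress(zeros))} once={once} "
          f"twice={twice} extra={twice - once}")
```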

I'm jealous... (1)

PKFC (580410) | about 11 years ago | (#5631094)

...that I am not a unix user.

Where is .sit? I thought you people became mac users! :P

Re:I'm jealous... (0)

Anonymous Coward | about 11 years ago | (#5631219)

Oddly enough, Apple are doing some kind of evil compression thing themselves in new machines with OS X.

Someone told me to try this, and I did, and it's TRUE! Try it yourself!
Forwarded message:
1) Double click on the hard disk icon on your desktop
2) Double click on the Applications folder
3) Double click on the utilities folder
4) Double click on the Terminal.app application

Then try typing - there's an actual unix computer in there!

I tried it myself - I not only have access to things like gzip and gnutar, but a whole range of unix things.

Is this why Macs are so expensive now? Is it because they have a second computer INSIDE the mac? And how do they fit it all in my iBook?

Plz advise.

The fractal geometry of nature? (4, Interesting)

RNG (35225) | about 11 years ago | (#5631095)

Although I'm certainly no compression expert, I think this makes sense. Many (most?) natural systems have fractal structures on some level so it only makes sense for them to compress better (ie: have more self-similar features) than systems which don't have this feature.

Then again, what do I know? Maybe someone more immersed in this field can tell us whether there's a seed of truth to my ramblings ...

--> R

Re:The fractal geometry of nature? (1)

battjt (9342) | about 11 years ago | (#5631221)

Could you give some examples of fractal structures in a human?

GZip doesn't do fractal compression. It will compress repeating patterns though. (My two arms will be compressed because they are similar, not because they look like little humans.)

I don't think there are many fractal structures in nature. Rocks are different than sand. Humans are different than cells. A field is different than grass, which is different than cells, which is different than molecules.


Re:The fractal geometry of nature? (2, Insightful)

jeff_bond (135948) | about 11 years ago | (#5631268)

Could you give some examples of fractal structures in a human?

For starters, how about the branching structure of the airways in your lungs?


Genome project... (0)

Anonymous Coward | about 11 years ago | (#5631099)

It would be nice if someone would play with gzip and the various genomes available.

That would probably have more relevance (being in a single dimension) than images, IMHO.

Thought this would be somewhat obvious... (2, Insightful)

ignoramus (544216) | about 11 years ago | (#5631100)

Every one of us is incredibly redundant, and I don't just mean in our posts on slashdot!

Simply consider that you can have a reasonably good duplicate of yourself, with only the DNA contained in a single cell!

You may need most of your parts to be functional but, information-wise, it all comes down to 1 germ cell (say, a spermatozoid) and the apparatus needed to move it into proximity of another compatible germ cell ;)

Re:Thought this would be somewhat obvious... (1)

peerogue (623472) | about 11 years ago | (#5631143)

But the DNA is not visible in the pictures of you, is it? However, we're mostly symmetrical, which is a form of redundancy visible from the outside.

By the way, the spermatozoids are not issued from a mitosis but from a meiosis, and therefore do not contain the full DNA, but only half of it (half the chromosomes, in fact).

Re:Thought this would be somewhat obvious... (1)

ignoramus (544216) | about 11 years ago | (#5631212)

But the DNA is not visible in the pictures of you, is it?

I am certain you will agree that phenotype has something to do with genotype. Let's say, for instance, that your skin cells - although there are very many - are pretty much all the same (since they are coded in a small part of that DNA). This will cause different areas of your skin to be similar (same color, etc.) and easier to compress than some random static.

The fact your body is a redundant interpretation of your DNA does impact the compressibility of your image.

On the other hand, I very much doubt that gzip will be able to note such large scale symmetries/patterns as "he has two arms and two legs"

By the way, the spermatozoids are not issued from a mitosis but from a meiosis, and therefore do not contain the full DNA, but only half of it.

Yeah, I'm actually aware of that. And although the mechanisms related to dominant-recessive genes and sexual characteristics are influenced by these two pairs, the fact is that you get 44 autosomes from your parents (22 from each) and these are basically duplicates (the 2 extra sex chromos are a little different). Doubling those 22 from the sperm won't result in you but will result in a functional human being nonetheless.

All you need is 1 germ cell to produce a perfectly normal (although statistically anomalous) female (XX sex chromos).

Re:Thought this would be somewhat obvious... (2, Interesting)

AugustMoon (593085) | about 11 years ago | (#5631283)

Your DNA is only sufficient to create another state machine with the same rules you had at birth.

It will not re-create your complexity because our dna-state machines are designed to create brains which are 'genetically-memoryless', capable of self modification, and have incredible data collection and storage capacity.

Think of your DNA as the graphics engine for Quake. It is relatively small (space-wise) compared to the textures and levels. Add different data, and you have still have a first-person game, but a completely different one.

Fractal Compression (1)

Mr Pippin (659094) | about 11 years ago | (#5631103)

Hmmm, since life forms are generally based on geometric patterns, I would think fractal compression would be even more conclusive in terms of detecting life.

Not GnuZip any more (0)

Anonymous Coward | about 11 years ago | (#5631107)

Now gzip stands for GenomeZip.

Application (0)

Anonymous Coward | about 11 years ago | (#5631110)

So an RFC1437 bodypart could fit on a floppy?

Some companies are using model based mathmatical t (1, Interesting)

Anonymous Coward | about 11 years ago | (#5631113)

Companies like Image Metrics use a mathematical translation into n-dimensional space similar to a compression algorithm to perform some interesting kinds of image recognition and processing. Examples are medical diagnosis, facial recognition, crystal growth monitoring and the like.

http://www.image-metrics.com/pages/technology.asp

this might have a few glitches (4, Funny)

jj_johny (626460) | about 11 years ago | (#5631114)

When I compressed the transcript of the Osbornes, it got incredibly high compression but I don't think they are intelligent life forms. Or maybe I am really wrong.

This post can't be compressed.

Re:this might have a few glitches (1)

nounderscores (246517) | about 11 years ago | (#5631302)

When I (compressed|1) the transcript of the Osbornes, it got incredibly high #1 but I don't think they are intelligent life forms. Or maybe I am really wrong.

This post can't be #1.

The Mars fossil IS made by life; my wife is not. (5, Funny)

Saint Aardvark (159009) | about 11 years ago | (#5631120)

In a true first for extraterrestrial biotic research, I decided to compare two pictures:

at the comparison page [astrobio.net] attached to the article that lets you run the same test on images that the researchers tried. In a startling discovery that is sure to earn me a Nobel Prize for Physics, Chemistry, Biology and Marital Relations, I was told the following:

"Answer: Image 1 [the Mars image](1.43702451394759 % compression) has a higher complexity measure than image 2[the image of my wife] (0.773501341151519 % compression), and thus image 1 is more probably biogenic."

Not only does this prove that there was once life on Mars, but it also proves that my wife is some sort of robot. Further research will be undertaken pending receipt of my prize money.

Re:The Mars fossil IS made by life; my wife is not (0)

Anonymous Coward | about 11 years ago | (#5631145)

Hubba Hubba!

Re:The Mars fossil IS made by life; my wife is not (5, Funny)

Anonymous Coward | about 11 years ago | (#5631160)

The problem here is that your wife is wearing clothes. Clothes are man made.

If you send me a picture of your unclothed wife, I'll be happy to, uhm, test this theory.

Re:The Mars fossil IS made by life; my wife is not (3, Interesting)

(startx) (37027) | about 11 years ago | (#5631183)

ahh, but the picture of your wife contains a lot of inanimate objects. I'm sure if you cropped the picture down to just her (or reasonably close) she would fare better in this comparison.

Re:The Mars fossil IS made by life; my wife is not (1)

paulhar (652995) | about 11 years ago | (#5631226)

Erm, it's a photograph. It's all inanimate. Just like my ex-wife, though she was inanimate in real life too...

Re:The Mars fossil IS made by life; my wife is not (2, Funny)

gatesh8r (182908) | about 11 years ago | (#5631256)

Wait wait wait you have a wife? Dude, this is Slashdot; are you sure you're not a misdirected user???

Re:The Mars fossil IS made by life; my wife is not (0)

Anonymous Coward | about 11 years ago | (#5631315)

I think I speak for all of /. when I say
  • Your wife is hot.
  • Can I have her number?

The.. (2, Funny)

saqmaster (522261) | about 11 years ago | (#5631126)

.. thought of being gzipped is quite disturbing.

Mad Scientist: "Fire up the GZip Continuum Transfunctioner!"
Operator: "Okay, Boss"


am i reading this right? (0)

Anonymous Coward | about 11 years ago | (#5631128)

i'm dropping pictures in from stileprojects webcam site versus pictures of various cars. i consistently get results showing that the anime is more biogenic than a car.

new pr0n detectors (1)

neverpsyked (578012) | about 11 years ago | (#5631129)

Great, I think I just figured out a new method for pr0n detection. Unless we're talking anime, of course.

In other news... (1)

MoeMoe (659154) | about 11 years ago | (#5631139)

The creators of WinZip filed suit stating that they have a better assembled compression utility and will use it not only to distinguish between living and non-living, but make the living encased in a tiny plastic cell on a keychain that kids can take with them, feed, and keep healthy.

Uhhhhh... No.....Does It Detect Red Herring Too? (-1, Troll)

Bowie J. Poag (16898) | about 11 years ago | (#5631153)

Like any compression algorithm, gzip seeks to summarize regions of similar, predictable data while preserving aspects of dissimilar, unpredictable detail.. That applies for everything, from text to white noise. Saying that gzip is especially good at detecting fossils is like saying fishing poles are really good at telling the difference between a biscuit and a tractor.

You can safely determine a number of different things based on the compression yield of a file. I'll give you a practical, wonderfully self-serving example. Propaganda's random-transitive tile generator [ibiblio.org]. Given a pile of several thousand source images, the engine will combine several of them according to random gamma values. When the final image data is made, the generator cranks out a GIF equivalent of the tile, and a bytecount is performed. If the bytecount of the GIF file is unusually low, that means the tile came out flat and monotone in appearance.. and is subsequently scrapped.

Net result: I can weed out featureless & shitty images without physically inspecting them by hand. Since I know ahead of time that only the good tiles come out of the process as 318KB-or-larger GIF equivalents, I can safely assume that anything that comes out to be lower than 318KB isn't worth keeping.
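The size-threshold filter described above could look roughly like this. It is only a sketch: DEFLATE via `zlib` stands in for GIF's LZW coding, the tiles are synthetic 64x64 greyscale buffers, and both `MIN_COMPRESSED_BYTES` and `is_interesting` are hypothetical names with a made-up cut-off (the comment's real one was 318KB for GIF output).

```python
import random
import zlib

MIN_COMPRESSED_BYTES = 2_000  # hypothetical cut-off for a 64x64 greyscale tile


def is_interesting(pixels: bytes, threshold: int = MIN_COMPRESSED_BYTES) -> bool:
    """Keep a tile only if its compressed size suggests it has visible detail."""
    return len(zlib.compress(pixels, 9)) >= threshold


rng = random.Random(3)
tiles = {
    "flat": bytes([128]) * (64 * 64),                              # uniform grey
    "busy": bytes(rng.randrange(256) for _ in range(64 * 64)),     # lots of detail
}

kept = [name for name, pixels in tiles.items() if is_interesting(pixels)]
print(kept)
```

Note the direction of the test: here a small compressed size means "boring", which is the opposite reading from the fossil study, where a small size is taken as a hint of biology.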

Net result: A constantly purified source of additional seamless images [ibiblio.org] for subsequent passes of the engine.

The Live test that doesn't work... (0)

Omicron32 (646469) | about 11 years ago | (#5631155)

Using that live test, I gave it one image of my face, and another of some rocks.

Apparently, the rocks are "more probably biogenic" than I am. Bastards.

Information vs. Meaning (2, Interesting)

16977 (525687) | about 11 years ago | (#5631157)

One of the posters brings up an interesting point. Although meaningful data has more information than pure noise, it also has less than a blank signal. When you download pictures, regardless of the "meaning" they have to you, their compression can vary a considerable amount. And you've probably heard the statistic that the English language is 50 percent redundant. That figure may vary a bit too, but the point is that English's meaning to us is independent of its information content. And the probability that an image of a life form with more information will also have more "meaning" is probably just as uncertain.

Kolmogorov Complexity (4, Interesting)

MarkWatson (189759) | about 11 years ago | (#5631158)

This seems like a "sort of" restatement of Kolmogorov Complexity.

Roughly, Kolmogorov Complexity is a measure of randomness - the measure is how long a computer program needs to be to reproduce data (pardon an oversimplification).
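As a rough illustration of the connection: the length of any compressed encoding gives an upper bound (up to a constant for the decompressor) on Kolmogorov complexity, but the bound can be very loose. In the sketch below, `complexity_upper_bound` is a hypothetical helper, and the seeded generator output is a case where DEFLATE sees no structure even though a few lines of code reproduce the data exactly.

```python
import random
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Compressed length in bytes: a crude upper-bound estimate of complexity."""
    return len(zlib.compress(data, 9))


rng = random.Random(4)
# Looks random to DEFLATE, yet "Random(4) plus a loop" is a very short description.
pseudo_random = bytes(rng.randrange(256) for _ in range(10_000))
# Obviously regular: a two-byte motif repeated.
regular = b"ab" * 5_000

print("regular:      ", complexity_upper_bound(regular))
print("pseudo-random:", complexity_upper_bound(pseudo_random))
```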


suck my nutsack you wankers! (-1, Troll)

Anonymous Coward | about 11 years ago | (#5631166)

I'll bet $5 (I hacked into VA Linux's bank and stole all their funds!) that requiring logging in won't cut back on the crap posted!

Filtering Images (2)

CommieBozo (617132) | about 11 years ago | (#5631170)

While slightly different, this reminds me of the way I filtered a bunch of images from a video camera. I was taking many frames per second of a thunderstorm and I wanted to find which frames out of thousands contained lightning strikes.

It was pretty simple... Images over a certain size contained lightning, the others were mostly black, therefore smaller. Once I filtered it that way, manually filtering out the better images was easy.

Doesn't seem to work... (1)

Kadagan AU (638260) | about 11 years ago | (#5631171)

Ok, I might very well be confused, and not using the tool right, but I plopped two of the only pictures I could find into this thing; an aerial shot of a bunch of houses, and a picture of a really good looking woman. The thing told me that the houses were much more likely biological. Strange. Unless it can see implants?


Operating Principle? Kolmogorov Complexity (3, Informative)

fygment (444210) | about 11 years ago | (#5631179)

Read about it in _the_ book (http://www.cwi.nl/~paulv/kolmogorov.html) or check out the web site here (http://www.hutter1.de/kolmo.htm). For a more succinct idea of the approach, these articles by one of the gurus on the topic (http://www.cs.ucsb.edu/~mli/focs.ps and http://www.cwi.nl/~paulv/papers/ecml97.ps).

Can we use this to defeat spam? (1)

tetrode (32267) | about 11 years ago | (#5631181)

Couldn't we use a similar system to see if an e-mail is spam or not?


Re:Can we use this to defeat spam? (1)

spot35 (644375) | about 11 years ago | (#5631223)

Did you not read the summary of the report -
  • This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters...

Biological clocks in unicorns... (4, Interesting)

dpbsmith (263124) | about 11 years ago | (#5631186)

zip is a fine thing, but it's not a pattern-recognition program!

This is the loopiest thing I've heard of since Rosenblatt reported that his Perceptrons could distinguish between music composed by Bach and music composed in imitation of Bach.

Good heavens, any picture that's slightly out of focus will now be declared to be evidence of "biological processes."

I'm guessing that the researchers are not as nutty as they sound and that they've done more than is being reported, but still...

Reminds me of the researchers in the sixties who were publishing analyses of data that supposedly showed "biological clocks." It turned out that they were using smoothing algorithms that, basically, were filters that had a 24-hour peak in the frequency domain--so their analysis was creating the patterns they claimed to be detecting. A debunking article was published in Science in which another researcher used data from a random number table (the "unicorn" data) and showed that the same analysis techniques showed that the unicorn had a biological clock.

Re:Biological clocks in unicorns... (2, Insightful)

archeopterix (594938) | about 11 years ago | (#5631247)

Similar thoughts here. From the article:
So how does one separate the wheat from the chaff, the true stromatolites from the fakes?
One method is to examine the suspect rock with a microscope, looking for visual evidence of microorganisms. But as researchers who study ancient terrestrial rocks- and one notorious Martian meteorite - have discovered, it isn't all that easy to tell, just by looking at shapes, whether or not a microscopic blob in a rock was once alive.
So, what do they verify the gzip method against? Their guesses about the image origins? Does not look great from the methodology standpoint, eh?

lameness filter? (1)

Speare (84249) | about 11 years ago | (#5631192)

Isn't that conclusion the opposite of CmdrTaco's use of compression to weed out "lame" postings? More noise is apparently more valuable discussion, while less noise is somehow considered likely spam? How many good postings have you seen with a line "this has been added to get past the lameness filter"?

GZip solves the Fly! (1)

paulhar (652995) | about 11 years ago | (#5631202)

This must have been the solution in the second Fly film. The researchers kept using lame when they should have used gzip!

and language detection. (0)

Anonymous Coward | about 11 years ago | (#5631224)

gzip seems to be good for every sort of pattern detection. There was an article, but I forget where I read it, a couple of months ago, on how gzip was used to detect the language used in a few written words. I know it's OT, but could somebody who remembers please answer me with a link ?

gzip - the swiss army knife utility (5, Funny)

kinnell (607819) | about 11 years ago | (#5631227)

I myself have successfully used gzip for factoring large prime numbers, sorting the men from the boys, unblocking the kitchen sink and cracking safes. I'm currently trying to locate Osama Bin Laden by compressing Al Jazeera footage, but all I come up with are reports of Elvis sightings.

Slightly Dodgy (5, Interesting)

jolyonr (560227) | about 11 years ago | (#5631238)

This whole thing is slightly dodgy, and I begin to wonder whether it was released a day early by mistake.

The big problem is the use of JPEG source images. Unless you've stuck it up to the maximum size on quality, then the jpeg artifacting (which is in effect repeating blocks of image data after transitions) will probably mask any hidden level of complexity in the images - the human brain is a much better tool at pattern recognition than most computer algorithms (especially those algorithms not designed for the task!).

Throw high-resolution bitmap files at it, and I'd be more persuaded that there is a genuine effect. Until then, I suspect it's more of a happy coincidence that the files they've thrown at it give results they are excited about.


Was this the April issue? (0)

Anonymous Coward | about 11 years ago | (#5631254)

April Fools isn't until tomorrow.
Did someone perhaps get caught a day early?

I know what you all need to do (0)

Anonymous Coward | about 11 years ago | (#5631260)

BZZZIIIPPP2 it, haha.

This isn't funny.

Simphile Seems to do something similar (1)

spot35 (644375) | about 11 years ago | (#5631261)

But not quite. It detects patterns but it does use gzip in a similar manner.

Simphile [allosx.com] uses the gzip program to detect patterns in two files. Used to determine things from whether two sonnets were written by Shakespeare or whether certain sound files came from the same source.
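One well-known way to compare two files with a compressor is the normalized compression distance (NCD). Whether Simphile uses exactly this formula is an assumption; the sketch below only shows the general idea, with synthetic word salad standing in for sonnets and `ncd` as an illustrative helper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


rng = random.Random(5)
words_a = ["thee", "thou", "summer", "rose", "love", "fair", "eyes", "time"]
words_b = ["kernel", "buffer", "socket", "mutex", "thread", "patch", "inode", "cache"]

original = " ".join(rng.choice(words_a) for _ in range(300)).encode()
edited = original[:900] + b" lily " + original[900:]   # same text, one small insertion
unrelated = " ".join(rng.choice(words_b) for _ in range(300)).encode()

print(f"original vs edited:    {ncd(original, edited):.2f}")
print(f"original vs unrelated: {ncd(original, unrelated):.2f}")
```

Because DEFLATE only looks back 32 KiB, this form of comparison is only meaningful for inputs small enough that both fit in the window together.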

gzip == measure of information content (2, Informative)

firecode (119868) | about 11 years ago | (#5631271)

This is not surprising at all really. Gzip and other compression utilities can be used to get upper bound for real/nonredundant information content.

I'm not sure if above is public knowledge, but I have used it as a one additional feature for certain pattern recognition tasks for a while.

Compression to measure semantic content (3, Interesting)

KingRamsis (595828) | about 11 years ago | (#5631279)

It was an interesting coffee break discussion with one of my professors. We were arguing about whether there is a neat way to estimate the semantic content of a neural network after training it. I recall suggesting that we compress the values of the weights of all layers: the less compressible they are, the more the network has been trained.

April Fools.... (0)

Anonymous Coward | about 11 years ago | (#5631289)

...is not until tomorrow.

mmm... (0)

Anonymous Coward | about 11 years ago | (#5631324)

now imagine a beowulf cluster of these....

All else being equal.... literally. (1)

Captain_Stupendous (473242) | about 11 years ago | (#5631326)

In order for this to work, the tester would need to eliminate a whole spectrum of other variables that would affect the outcome of the test. Image format (JPEG compresses less than BMP), image size, JPEG "resolution" (pixels per inch), color depth, etc. Of course, I assume they have some way of standardizing their input images, but it's unlikely to become an automated process....
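The image-format concern can be illustrated with a sketch. Here a first DEFLATE pass stands in for "the file is already stored in a compressed format", which is only an analogy: JPEG is lossy and works quite differently. The point is just that already-compressed bytes leave little redundancy for gzip to measure. The "bitmap" is synthetic 16-level noise.

```python
import random
import zlib

rng = random.Random(6)
# Stand-in for raw bitmap data: bytes drawn from only 16 grey levels.
raw = bytes(rng.randrange(16) * 16 for _ in range(100_000))

once = zlib.compress(raw, 9)    # roughly what a compressed image format would store
twice = zlib.compress(once, 9)  # gzipping the already-compressed file

print(f"raw={len(raw)} once={len(once)} twice={len(twice)}")
```

Comparing ratios across images is therefore only fair if every image starts from the same uncompressed representation, size and colour depth.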

Pattern Recognition (2, Interesting)

cyber_rigger (527103) | about 11 years ago | (#5631330)

I envision a whole array of compression algorithms.

Each algorithm could be fine tuned for a particular type of pattern.

Is that an elephant or a giraffe?
Does it compress better with the elephant algorithm or the giraffe algorithm?
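One cheap way to approximate "an algorithm per pattern" is a single compressor primed with a different preset dictionary per class, which zlib supports through the `zdict` argument of `zlib.compressobj`. This is a toy sketch on text rather than images; the `PROFILES` contents and helper names are invented for illustration.

```python
import zlib

# Hypothetical per-class "training" text, used as DEFLATE preset dictionaries.
PROFILES = {
    "elephant": b"grey trunk tusks large ears savanna herd ",
    "giraffe": b"long neck spotted coat tall legs acacia leaves ",
}


def size_with_profile(sample: bytes, profile: bytes) -> int:
    """Compressed size of `sample` when the compressor is primed with `profile`."""
    comp = zlib.compressobj(level=9, zdict=profile)
    return len(comp.compress(sample) + comp.flush())


def classify(sample: bytes) -> str:
    """Pick the class whose dictionary makes the sample compress smallest."""
    return min(PROFILES, key=lambda name: size_with_profile(sample, PROFILES[name]))


print(classify(b"a tall animal with a long neck and a spotted coat"))
print(classify(b"a large herd with grey trunk and tusks"))
```

The sample is assigned to whichever class shares the most substrings with it, so this is closer to nearest-neighbour matching than to real recognition.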
