Welcome to the Slashdot Beta site -- learn more here. Use the link in the footer or click here to return to the Classic version of Slashdot.

Thank you!

Before you choose to head back to the Classic look of the site, we'd appreciate it if you share your thoughts on the Beta; your feedback is what drives our ongoing development.

Beta is different and we value you taking the time to try it out. Please take a look at the changes we've made in Beta and  learn more about it. Thanks for reading, and for making the site better!

MD5 To Be Considered Harmful Someday

michael posted more than 9 years ago | from the house-built-on-sand dept.

Encryption 401

Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."

cancel ×
This is a preview of your comment

No Comment Title Entered

Anonymous Coward 1 minute ago

No Comment Entered


damn (1, Interesting)

YetAnotherDave (159442) | more than 9 years ago | (#11023624)

I guess I know what I'll be coding now - SHA-1 just got a lot more important in my priority list... :(

Re:damn (4, Insightful)

networkBoy (774728) | more than 9 years ago | (#11023668)

Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially higher.

Re:damn (1)

frission (676318) | more than 9 years ago | (#11023870)

this will make file checking just take that much longer. have you done md5 sums on iso files...for say, a linux distro? it takes awhile, even on a p4 3ghz with 1 gb of ram. but now you want to do it twice (2 diff algos)?

Re:damn (1)

MatanZ (4571) | more than 9 years ago | (#11023970)

The time is usually wasted in reading the file from disk, so calculating two hashes concurrently will take about the same time.

Re:damn (2, Insightful)

TCM (130219) | more than 9 years ago | (#11024116)

A P4 at 2.6GHz does >300MB/s md5 according to openssl speed md5. As noted, you probably wait for the data to be fetched from disk rather than the checksum to be computed.

Re:damn (1)

bigbadbob0 (726480) | more than 9 years ago | (#11023873)

>>Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially hig Is it? Can you prove it? You shouldn't be able to because it simply isn't true. Collisions will exist.

Re:damn (2, Informative)

networkBoy (774728) | more than 9 years ago | (#11023982)

"You shouldn't be able to because it simply isn't true."

I have to disagree with you here.
If I have algo A and algo B:
I hash with algo A and get a value which I store.
I hash with algo B and get a value which I store.
While the security does not add up to A^B it does ammount to > A+B, which is still better than A or B only. (I really wish I had my Crypto reference books handy)

Other posters mentioned it was more work and equiv to one secure algo, both those statements are true; as I pointed out this was an alternative to writing a new SHA-1 algo.

Re:damn (1)

Meostro (788797) | more than 9 years ago | (#11024181)

It depends on the two hash algos and how much/little they inter-relate.

For algorithms that are provably unrelated, it should approach A*B.

However, if they're closely tied to each other (not sure, but probably MD5 and MD4?), it could be as little as A+B.

Re:damn (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11024058)

Wow, a contrary opinion on slashdot with no supporting evidence. Never seen that before..

Re:damn (2, Insightful)

The Milkman (214539) | more than 9 years ago | (#11023888)

Which is the same as just one secure algorithm.

Very true (1)

jd (1658) | more than 9 years ago | (#11024112)

The only exception being where the two algorithms have a coincidentally overlapping weakness. At which point, you're no better off.

However, there is no real need to use weak algorithms (unless you're in the Government and need to use the FIPS standard). Unless you're doing something time-critical, just use SHA-1 and Whirlpool.

Re:damn (2, Informative)

LiquidCoooled (634315) | more than 9 years ago | (#11023710)

There will ALWAYS be collisions with any kind of hashing algorythm.

Thats the nature of the game we play.

The only hashing system without collisions is sending the original file itself.

Re:damn (5, Interesting)

WolfWithoutAClause (162946) | more than 9 years ago | (#11023849)

There will ALWAYS be collisions with any kind of hashing algorythm.

Yes, but a good hash makes it *extremely* difficult to find them. MD5 is looking pretty mediocre right now.

Re:damn (1, Informative)

networkBoy (774728) | more than 9 years ago | (#11023886)

"The only hashing system without collisions is sending the original file itself."

That's not hashing:
Producing hash values for accessing data or for security. A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.

RIAA (0)

Anonymous Coward | more than 9 years ago | (#11023731)

So the RIAA can corrupt Kazaa and Gnutella music downloads by sharing garbage files with matching checksums.

Re:damn (1)

user9918277462 (834092) | more than 9 years ago | (#11023829)

Gentoo has already considered [gentoo.org] this problem as it affects the Portage system (two months ago, no less). Portage distfiles are only validated by md5 sums (no gpg signing yet, though it's being worked on) so this could be a serious problem at some point.

break it down (0, Interesting)

Anonymous Coward | more than 9 years ago | (#11023914)

Slicing and dicing the file into many different overlaping blocks and then computing a md5 sum for each block would be much much stronger.

a. MD5 sum entire file
b. MD5 sum each 10K block in the file (overlap blocks by ZZZ bytes)
c. break file into different blocks such as every 40th byte in the file ....

Two files with the same md5 hash? (5, Funny)

Anonymous Coward | more than 9 years ago | (#11023635)

I can only hope I live that long.

fdsafdsaf (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11023643)


Exploit? (4, Interesting)

Limburgher (523006) | more than 9 years ago | (#11023669)

So does this mean that it's possible to find a useful MD5-equivalent file for any file? Just because someone alters a file does not mean they have done anything destructive. Would one be able to take a binary, make a change of some sort, and then run a tool to determine the block of data to add to the binary to both allow the change to take effect and cancel out the MD5 change? How complex would it be to construct this tool?

Re:Exploit? (0)

Anonymous Coward | more than 9 years ago | (#11023717)

Read the fscking blurb, it's not even possible to find a non-useful MD5-equivalent file for any file. You have to have control over both files for a collision to happen. ...for the moment.

Re:Exploit? (3, Insightful)

Ayaress (662020) | more than 9 years ago | (#11023775)

It doesn't have to be harmful to break a ptp system. There's a pretty common exploit on Kazaa where people have a file just containing random junk that registers as a match to a popular file. If you download taht file, and get any portion of it from the fake, the file is corrupted and useless. Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.

clarification: (1)

Ayaress (662020) | more than 9 years ago | (#11023856)

I say this under the assumtion that "someday" somebody will figure out a way to take a good file and make an MD5 equivalent file to poison a torrent with. The way I read the blurb, it sounds like both files were specifically crafted to create a collision.

Re:Exploit? (1)

Pxtl (151020) | more than 9 years ago | (#11023797)

The blurb is unclear, but it sounds like you have to have created the original file with the original intent that a portion would be "swapped". So, a content creator could create a legitimate piece of content and get that reviewed, and then create a malicious piece of content with the same md5. Not quite as nasty as hacking an existing md5'd file, but still nasty.

Of course, I could be completely misunderstanding the information at hand.

Re:Exploit? (2, Insightful)

Pxtl (151020) | more than 9 years ago | (#11023866)

Wait - no - I'm completely fscking wrong.

I think the point was this.

file1: xxxxxxxxxx
file2: xxxxxyyyyy

%file1 = %file2

now, user downloads the file from p2p, and one user gives him file1 and one user gives him file2.

you get:
file3: xxxxxyyxxx
which has the same has file1 and file2. That's the problem - not only can two files have the same hash, the combination of those files (as would occur with a corrupted P2P file) also has the same hash.

Or I still could be wrong.

MC5 (0)

Anonymous Coward | more than 9 years ago | (#11023671)

mc5 was harmful to your ears...

In english (4, Funny)

ValuJet (587148) | more than 9 years ago | (#11023712)

Is there a translator from ultra-nerd to english?

Re:In english (1)

LiquidCoooled (634315) | more than 9 years ago | (#11023759)

Yes, when you run the md5 hashing algorythm over certain files, its possible that you can change the file and still get the same hash.

Its a small blindspot that isn't catered for in the mixing process.

Re:In english (1, Funny)

Anonymous Coward | more than 9 years ago | (#11023760)

Is there a translator from ultra-nerd to english?

A computer thingy has an owie.

Re:In english (0)

Anonymous Coward | more than 9 years ago | (#11023810)

I'm sorry - I don't understand the question.

Re:In english (0)

Anonymous Coward | more than 9 years ago | (#11023812)

Seriously, I tried it against babelfish and couldn't come up with squat.

Re:In english (0)

Anonymous Coward | more than 9 years ago | (#11023868)

You're sure you're visiting the correct site ? :)

Only 340282366920938463463374607431768211456 ? (1)

wsanders (114993) | more than 9 years ago | (#11023936)

Well, an md5sum has only 16^32 possible values, possibly less (I don't know the algorithm exactly), so it was only a matter of time.

Re:In english (4, Informative)

Anonymous Brave Guy (457657) | more than 9 years ago | (#11023990)

Short version: A common technology for verifying that a file you've downloaded is legitimate and untampered-with, known as MD5, isn't as secure as people thought.

Slightly longer version: MD5 is a way of generating a checksum -- a single, comparable value -- from a file. Ideally it is supposed to give you different numbers for different files, so if a web site advertises the checksum a file should have, you can compare that with one generated from the file you actually got to see whether the file you've downloaded has been modified, potentially maliciously.

The research shows that it is possible for someone to construct a drop-in replacement for the file you thought you had that generates the same MD5 checksum as the original, so anyone attempting to validate the file this way would think they had the real thing. If it turns out that you can construct a damaging replacement for a common file -- perhaps an installer for a popular application like Firefox or OpenOffice that's usually downloaded from a public server -- then this could open a loophole for viruses, worms, etc. that would slip through the security net often used by cautious people when downloading such programs.

Re:In english (1)

supersmike (563905) | more than 9 years ago | (#11024140)

...so if a web site advertises the checksum a file should have, you can compare that with one generated from the file you actually got to see....

If someone can push a hacked file up to some site for download, couldn't they just hack the advertised MD5 too? What sort of precautions are in-place to defeat this?

What does this mean (1)

Jonny_eh (765306) | more than 9 years ago | (#11023718)

Does this mean that I can take a given binary file, like kde.rpm, and change it so that its' contents are different, and harmful, yet the MD5 hash is the same? This has never been done before? Sounds like an interesting 4th-year engineering project to me. I'd think it'd be really easy to mess up a file and keep the same MD5 hash (like screwing up an avi file shared ona p2p network). But to actually change the file so that it behaves a certain way, wow.

Re:What does this mean (1)

PatrickThomson (712694) | more than 9 years ago | (#11023796)

It means that you can create two files with the same hash but different contents. An example would be, you release your own (specially tailored) rpm for a completely harmless thing, then a few weeks later silently swap it for the malicious one with the same md5sum.

Re:What does this mean (1)

Kunnis (756642) | more than 9 years ago | (#11023944)

The first thing I thought about is something like Redhat is distributed via Bittorrent. What happens if someone replaces - say gcc and login to do the old "my special login has a backdoor, and my gcc detects when I'm compiling login, and puts the exploit it, and my gcc detects when gcc is compiling, and puts the exploit into gcc" trick, and then hops onto the torrent network, (perhaps with some special mods so they will distribute the "hacked" sections more often) and then start distributing the exploit. Thoughts? You would even know the IP address of the guy that downloaded the chunk with the exploit, and then he would start distributing the hacked version. Now I'll admit it's a bit far fetched, but I also never underestimate the power of REALLY determined people. I also realize it would be tricky keeping it small enough, and getting it to recompress and... But still, someone might figure it out. And with how big the Linux community is, if it's possible, someone will do it. Kunnis

Good analysis (5, Funny)

overbyj (696078) | more than 9 years ago | (#11023722)

By examining the MD5 hash using a sophisticated Fourier schema followed by deconvolution with a bit binary-inequal collision analysis, it is quite obvious I have no freaking clue what this stuff is about.

I am glad somebody does.

Re:Good analysis (1)

Quixote (154172) | more than 9 years ago | (#11023930)

Here's a summary for ya:
Fire burns. Ice remains frozen. Fire and Ice have the same MD5 hash.

(From page 3 of the paper)

Re:Good analysis (1)

ArsonSmith (13997) | more than 9 years ago | (#11024022)

hmm, could you dumb it down a bit?

Let's face it (2, Insightful)

mordors9 (665662) | more than 9 years ago | (#11023724)

Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".

Already Happening (0)

mxmasster (118546) | more than 9 years ago | (#11023772)

I've seen this exact thing in the wild already. One of my customer is storing large amounts of mpeg movies. They were using md5sum to verify file contents for network transfers. This worked great until there was a power failure on of the remote systems. Everything passwd the md5sum perfect, except halfway through the clip you would lose video. I haven't relied on md5sum for over a year now as a result.

Re:Already Happening (0)

Anonymous Coward | more than 9 years ago | (#11023912)


Re:Already Happening (3, Informative)

menscher (597856) | more than 9 years ago | (#11024001)


I'll give you $50 if you can back that claim up. I want to see two video files. They must start out the same, but have a difference about half-way through. And they have to have the same md5 sum. Just post where I can download the two files, and your paypal address.

The way I see it, you've got a 1/2^64 chance of being right. So I'm risking $50/184467440737095516, which isn't a whole lot.

md5sum ne verify (0)

Anonymous Coward | more than 9 years ago | (#11024149)

verify eq cmp or verify eq diff --speed-large-files -q

This is almost appropriate... (3, Funny)

freeze128 (544774) | more than 9 years ago | (#11023773)

If your cursor finds a menu item followed by a dash,
And your double-clicking icon puts your window in the trash,
And your data is corrupted 'cause the index doesn't hash,
Then your situation's hopeless and your system's gonna crash!

Re:This is almost appropriate... (0)

Anonymous Coward | more than 9 years ago | (#11023842)

And your screen is all distorted by the side effects of gauss,
So your icons in the window are as wavy as a souse,
Then you may as well reboot and go out with a bang,
'Cause sure as I'm a poet, the sucker's gonna hang!

Correct me if I'm wrong, but... (5, Insightful)

Sheetrock (152993) | more than 9 years ago | (#11023799)

If I'm translating this properly, a malicious person can do two things with this knowledge:

He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).

Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.

Re:Correct me if I'm wrong, but... (1)

jd (1658) | more than 9 years ago | (#11024035)

Many passwords are MD5 hashes. You may be able to use this to attack password files, by finding some string that hashes to the same password entry, in a reasonable length of time.

It's not much of a threat to signature schemes, but there are enough places where MD5 is used as a one-way encryption of an access token that this attack poses a potential security hazard.

OT yet informative: Spock and Try (1, Informative)

Anonymous Coward | more than 9 years ago | (#11024066)

"Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822.3."

I think you mean Yoda.
Besides, Dr. Spock is the baby doctor.

Re:Correct me if I'm wrong, but... (1)

CatsupBoy (825578) | more than 9 years ago | (#11024100)

Lets say I create two contracts with the same md5 sum, we agree on one of the contracts, and I send you the md5 sum for future proof of its authenticity. Afterwards, I maintain the contract and allow you to check it at any time with your md5 sum for verification.

Later down the line, lets say I replace the original contract with the second one. You can still validate its authenticity via the same means, however in it would be new terms we did not agree to originally.

Unless you can produce a copy of the doppleganger, your out of luck in court because you cant prove that i've created an alternative.

Now, this sounds mathematically improbable that I could come up with two contracts that make sense and have the same md5sum, but none the less possible. And when dealing with exact mathematics, we prefer exact results.

Re:Correct me if I'm wrong, but... (1)

nautical9 (469723) | more than 9 years ago | (#11024107)

Couldn't a (newer) P2P app just use two different algorithms to do the checksum? Assuming both use very different schemas, even if one is compromised (like MD5), the other would not be? Not sure how computationally expensive the checksums are, but if this kind of sabotage becomes commonplace, it would make for an easy fix.

Re:Correct me if I'm wrong, but... (4, Insightful)

chialea (8009) | more than 9 years ago | (#11024117)

When you're dealing with cryptography, it should be very, very, very hard to find collisions. If you find enough of them, you can proabably find something bad with the same hash value. For example, if you sign a digital document that says you're going to pay me $1 for my pencil, and I find a suitable hash collision, I could make it look like you signed a promise to pay me $3,000 for some used tissue. I wouldn't rule out that someone could find a harmful collision for a program distributed online, and substitute a trojan. If the prize gives enough reward, people will throw a lot of computational power at it, and will likely hit pay dirt.

Secondly, this is quite a signifigant break. Once a hash function has had an attack like this discovered, it often becomes completely useless not long down the road. I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.

The upshot of this is that this hash function should NOT be considered secure any more. For now, if you are not protecting anything of high value, you're probably fine. Tomorrow? Possbily. But soon, you're not going to be protected at all, and so you should start worrying about that now, instead of when you're already in trouble.


Dirt (2, Insightful)

mfh (56) | more than 9 years ago | (#11024175)

I've always called this dirt. Some folks call it padding, but it's essentially the same. However up until now, no one has known how to do it with md5 and come up with the same sum. This is nothing more than a hack to me.

The bottom line is that if you care enough about security this will only be a problem for a day or so when md5 is finally solved completely and the hashes don't take five days to brute force -- they could be solved in a matter of microseconds. As soon as that happens, the next hash system goes up and it would not likely be so easy to break, unless the whole hashing process is somehow cracked wide open (which would be very bad for a number of reasons).

What we'll likely see is that with the greater increases in CPU speeds available to end users, the hash breaking systems can be tested easier until a final reverse algorhythm is available for md5. There already is a site that will break your hash and give you *something* with the same hash, and it takes a couple days.

I have a novel solution (1)

quamaretto (666270) | more than 9 years ago | (#11023800)

Buy it instead.

I'm kidding. I download Linux distro's via bittorrent all the time and willfully ignore the MD5 sum, and now I come to find that it's compromisable. OH SNAP!

I can envision something being added on to bittorrent to prevent bad blocks being passed around via ... magic. Okay. I can't really envision it, but I'm sure it can be done.

On to the next topic I'm not qualified to comment on...

Not just MD5 (2, Interesting)

PureFiction (10256) | more than 9 years ago | (#11023806)

Other weaknesses were reported in various other secure digests, including MD4, RIPEMD, HAVAL-128, SHA-0.

SHA-256 is a good replacement. SHA-1 should be fine but if you are going through the trouble of an upgrade, why not make it sufficiently future proof?

If I Had A Million Terabytes... (5, Funny)

Tackhead (54550) | more than 9 years ago | (#11023826)

If I had a million terabytes of storage, y'know what I'd do?

Two files with the same MD5 hash at once. Aaw yeah.

Re:If I Had A Million Terabytes... (0)

Anonymous Coward | more than 9 years ago | (#11024156)

y'know what I'd do?

Not a girl that's for sure =P

FYI (1)

vigyanik (781631) | more than 9 years ago | (#11023843)

The MD5 algorithm is like a hash function and the MD5 sum is the hash key of the data in the file. If the hash key is smaller in size than the data item then collision (i.e. two data items having the same key) is always possible.

The exploit has used the properties of the MD5 algorithm to create two executable files with the same MD5 sum.

MD5 is obviously less secure (0)

Anonymous Coward | more than 9 years ago | (#11023851)

Anytime you use a hash especially for passwords you are only in fact giving the intruder additional password combinations that will work.


md5(x1) = some md5 string
md5(x2) = same md5 string

where x1 thru x100000... could be any number of possible passwords that will equal the same hash.

You have a point, except... (0)

Anonymous Coward | more than 9 years ago | (#11024027)

With passwords, there is usually a length constraint and a character constraint (Ctrl-C, Enter, Backspace). If it came down to the choice of having only one correct possibility for a password and keeping the passwords in plaintext or having in some cases two and keeping the hashes in MD5 (mind you, on a system that limits users to five guesses per fifteen minutes and a ten-guess lockout), which would you prefer?

Not that it comes down to that, of course, because we've got SHA-1 and other longer hashing algorithms to choose from, but in the scenario above I believe there is less risk with MD5.

The "Detailed Summary" (5, Informative)

Effugas (2378) | more than 9 years ago | (#11023859)

[This is the author]

I've been doing some analysis on MD5 collision announced by Wang et al. Short version: Yes, Virginia, there is no such thing as a safe hash collision -- at least in a function that's specified to be cryptographically secure. The full details may be acquired at the following link:


A tool, Stripwire, has been assembled to demonstrate some of the attacks described in the paper. It may be acquired at the following address:


Incidentally, the expectations management is by no means accidental -- the paper's titled "MD5 To Be Considered Harmful Someday" for a reason. Some people have said there's no applied implications to Joux and Wang's research. They're wrong; arbitrary payloads can be successfully integrated into a hash collision. But the attacks are not wildly practical, and in most cases exposure remains thankfully limited, for now. But the risks are real enough that responsible engineers should take note: This is not merely an academic threat, systems designed with MD5 now need to take far more care than they would if they were employing an unbroken hashing algorithm, and the problems are only going to get worse.

Some highlights from the paper:

* The attack itself is pretty limited -- essentially, we can create "doppelganger" blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum.

* MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. This leads to...

* Attacks are possible using only the proof of concept test vectors released by Wang -- the actual attack is not necessary.

* Stripwire emits two binary packages. They both contain an arbitrary payload, but the payload is encrypted with AES. Only one of the packages ("Fire") is decryptable and thus dangerous; the other ("Ice") shields its data behind AES. Both files share the same MD5 hash.

* Digital Signature systems are vulnerable, as they almost always sign a hashed representation of data rather than the data itself.

* This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.

* HMAC's invulnerability has been slightly overstated. It's definitely possible, given the key, to create two datasets with the same HMAC. Attacker possession of the key violates MAC presumptions, so the impact of this is particularly questionable.

* Very interesting possibilities open up once the full attack is made available -- among other things, we can create self-decrypting executables (fire.exe and ice.exe) that exhibit differential behavior based on their internal colliding payloads. They'll still have the same MD5 hash.

* Several doppelgangers may (relatively quickly, as per Joux) be computed within a single multicollision-friendly block. As such, the particular selection of doppelganger sets within a file can itself be made to represent data. It's relatively straightforward to embed a 128 bit signature inside an arbitrary file, in such a way that no matter the value of the signature, a constant MD5 hash is maintained. This is curiously steganographic.

* Many popular P2P networks (and innumerable distributed content databases) use MD5 hashes as both a reliable search handle and a mechanism to ensure file integrity. This makes them blind to any signature embedded within MD5 collisions. We can use this blindness to track MP3 audio data as it propagates from a custom P2P node. "Strikeback" capacity against executable trafficking is even more pronounced -- it's possible to create application installers that self-modify with host identifying characteristics but still successfully retransmit on P2P networks under the global search hash.

I hope this paper proves useful to the security community at large, and I welcome feedback.

--Dan Kaminsky

Re:The "Detailed Summary" (2, Insightful)

BranMan (29917) | more than 9 years ago | (#11024138)

It seems to me that at least the malicious nature of this vulnerability can be limited by using the size of the file as an additional check besides the MD5 hash. In other words, if you know how large the file is supposed to be in bytes, then there is less likelihood that something malicious can be passed off as the original - even if it has the same MD5.
Is that right? I'm no expert by any means, but could this reduce the potential for a real attack to pretty much nill?

Re:The "Detailed Summary" (1)

javax (598925) | more than 9 years ago | (#11024144)

The FreeBSD ports and the DarwinPorts collection both use md5 verification for the source files they download (and probably many other source distrubution systems also do). If someone hacks one of the mirrors the source is distributed via, he could break into all clients...

Cash Money? (1)

Grendel Drago (41496) | more than 9 years ago | (#11023894)

Was there a cash money prize for someone who could produce a pair of strings with the same MD5 sum? I remember reading or possibly hearing that No One had Ever discovered a hash collision with MD5. Is this the first ever hash collision found?

--grendel drago

Re:Cash Money? (2, Informative)

Meostro (788797) | more than 9 years ago | (#11024037)

No, but I believe this one [iacr.org] is:

XiaoyunWang, Dengguo Feng, Xuejia Lai, and Hongbo Yu
"Collisions for hash functions md4, md5,haval-128 and ripemd"

I studied different hashes.. (1, Funny)

Anonymous Coward | more than 9 years ago | (#11023902)

I found out the darker and moister it was, the more powerful it was. Of course any form of hash was much stronger then regular leaf pot and you could get far more reusable resin when you were done with the pipe.

Re:I studied different hashes.. (1)

gentleolas (609359) | more than 9 years ago | (#11024028)

Actually the strongest (Moonshine, CCup winner 2002/3 in Amst.) is quite dark, but seems dry as a stone....

______ with be harmful/obsolete in the future.. (2, Insightful)

l4m3z0r (799504) | more than 9 years ago | (#11023908)

Really no big surprise, all currently implemented algorithms for security and digital signatures will be rendered useless or easily hackable in the future. The fact that a compromise seems to be soon is the part thats interesting.. however the parent fails to address how serious it is.

For the most part, big deal, all of current security procedures will be harmful ie compromised in the future...

Solution: Use more than one hash algorithm (2, Informative)

NZheretic (23872) | more than 9 years ago | (#11023910)

This kind of attack can be mitigated into non-existance by just using two dissimilar hash algorithms.

Using MD5 with SHA1, or even the older MD2 or MD4 will reduce the probability of creating a compatable binary with the same checksum to virtually zero.

If only one checksum is required then just XOR the resulting checksums from each algorithm.

birthday attack (1)

kippy (416183) | more than 9 years ago | (#11023933)

Can someone explain to me how this is something more complicated than the birthday paradox?

(heh, Wang)

Re:birthday attack (0)

Anonymous Coward | more than 9 years ago | (#11024139)

1) anything that uses MD5 assumes two people cannot have the same birthday.

2) Up till now, they haven't found two people with the same birthday.

3) DRM is built on the assumption that you can trust all people with the same birthday.

4) replace "people" with "files", and "birthday" with "MD5 hash"

Very simple solution - record the size (0)

Anonymous Coward | more than 9 years ago | (#11023938)

Whenever you use a hash, you should always record the size along with the file. That way, any "doppleganger blocks" will stick out like a sore thumb.

Switching to SHA1 (1)

AGTiny (104967) | more than 9 years ago | (#11023952)

I've made the switch to SHA1 for all my website work where I'm storing passwords and session IDs and the like. It's just a simple change from 32 to 40 characters, and a search/replace for:
md5_hex to sha1_hex

Truncated SHA-1 Hash Also Secure (1)

Effugas (2378) | more than 9 years ago | (#11024096)

It's also safe to truncate 40 byte SHA-1 down to 32 byte if need be.

It actually easy to see this (2, Interesting)

Anonymous Coward | more than 9 years ago | (#11023983)

I was a bit suprised recently when I tried to
sort 15,000 jpg image files to remove duplicates.
Since the names varied, and it is not uncommon
to have many images with the same length, it
seemed like a good idea to use md5 hashes.
So I coded it up in python to do the md5 hash,
and then stuffed the file name into a table
keyed by the md5 hash. Big suprise when multiple
different files landed in the same hash.
Some property of jpgs probably reduces the
randomness of the files and increases the
probability of hash collisions.

Re:It actually easy to see this (0)

Anonymous Coward | more than 9 years ago | (#11024042)

You should post the images with hash collisions if you want anyone to believe your story.

Would you consider posting samples? (1)

davidwr (791652) | more than 9 years ago | (#11024128)

If you have the legal right to do so, please some samples, or at least submit them to the cryptographic community for analysis.

Time is of the essence (1)

eneville (745111) | more than 9 years ago | (#11023991)

It's all about the famous IMPS (Infinite Monkey Protocol Suite, RFC2795), if you give the CPU enough data, and enough time, it will eventually crack what ever encryption you are using. To be safe, deploy a stegography technique to hide your data, such as in the meta field of a picture, or within some MIME data on a harmless looking email, but be clever about it (theres a lot of junk in harmless looking SWAP file).

Inherent to any hashing mechanism (0)

FearUncertaintyDoubt (578295) | more than 9 years ago | (#11024005)

Of course. I've run into this type of thing with all sorts of data hash/checksum type of algorithms. If there are a numerically smaller amount of hash values than source data values, it is a mathematical certainty that multiple source data from generating the same hash value.

If you have a 4-byte hash value of a 1000-byte character string, you have 4+ billion possible hash values. But the number of possible 1000-byte strings? What's 256^1000? So, on average, there would be 256^4 / 256^1000 source values that would generate each individual hash value. The only way to avoid this is to have a hash value that is as long as the source value, which eliminates much of the benefits of a hash.

I like the idea someone else posted about using multiple algorithms because it would dramatically reduce the number of possible values that would generate the same hash value using 2 different methods.

Look at Whirlpool (0)

Anonymous Coward | more than 9 years ago | (#11024014)

Look at the Whirlpool hash. Its fast and amenable to hardware implementation.

hash, equals (1)

freality (324306) | more than 9 years ago | (#11024038)

Testing the equality of hash codes is not a replacement for testing equality. It should be used as an optimization to limit the set of items for which equals need be applied. e.g.

Thing search(Thing thingToFind) {
thingList = hashSearch(hash(thingToFind));
for each thing in thingList
if (thing == myThing) // IMPORTANT!
return thing;
return null;

I am holding out for MD6 (0)

Anonymous Coward | more than 9 years ago | (#11024074)

When will it be out?

Any plans for MD64?

MD5 To Be Considered Harmful Someday (1)

roman_mir (125474) | more than 9 years ago | (#11024094)

Think of the children! We cannot allow some MD5 to put our children in the harm's way! Initiate operation AntiMD5, deploy the bombers.

Maybe start consider adopting magnet links (1)

Jugalator (259273) | more than 9 years ago | (#11024155)

Magnet links [sourceforge.net] is an open standard (specification draft [sourceforge.net] ) and an alternative to primarly eDonkey2000, eMule, and Overnet hashes that are based on a precursor to the MD5 algorithm, and I doubt very safe anymore?

I personally think any clients that don't support magnet links should really start consider adopting those -- DC++, most Gnutella clients, and Shareaza already do.

A magnet link can look like something like this:


... see "sha1" above -- the standard doesn't define which hash algorithm to use.

what about stuff that's harmful NOW? (1)

ChipMonk (711367) | more than 9 years ago | (#11024160)

Like, say, Windows?

Yeah, yeah, Obligatory Microsoft Slam. But I don't see any great stampede to isolate and neutralize the threat that this Typhoid Mary of OS's presents to communication stability. Why are we preparing for a threat that barely exists, when we refuse to deal with the present, much greater threat?
Load More Comments
Slashdot Account

Need an Account?

Forgot your password?

Don't worry, we never post anything without your permission.

Submission Text Formatting Tips

We support a small subset of HTML, namely these tags:

  • b
  • i
  • p
  • br
  • a
  • ol
  • ul
  • li
  • dl
  • dt
  • dd
  • em
  • strong
  • tt
  • blockquote
  • div
  • quote
  • ecode

"ecode" can be used for code snippets, for example:

<ecode>    while(1) { do_something(); } </ecode>
Sign up for Slashdot Newsletters
Create a Slashdot Account