What constitutes a good hash anyway?

Journal jd's Journal: What constitutes a good hash anyway? 3

Journal by jd on Friday November 07, 2008 @05:37PM

In light of the NIST complaint that there are so many applicants for their cryptographic hash challenge that a good evaluation cannot be given, I am curious as to whether they have adequately defined the challenge in the first place. If the criteria are too loose, then of course they will get entries that are unsuitable. However, the number of hashes entered do not seem to be significantly more than the number of encryption modes entered in the encryption mode challenge. If this is impossible for them to evaluate well, then maybe that was also, in which case maybe we should take their recommendations over encryption modes with a pinch of salt. If, however, they are confident in the security and performance of their encryption mode selections, what is their real objection in the hashing challenge case?

But another question one must ask is why there are so many applicants for this, when NESSIE (the European version of this challenge) managed just one? Has the mathematics become suddenly easier? Was this challenge better-promoted? (In which case, why did Slashdot only mention it on the day it closed?) Were the Europeans' criteria that much tougher to meet? If so, why did NIST loosen the requirements so much that they were overwhelmed?

These questions, and others, look doomed to not be seriously answered. However, we can take a stab at the criteria and evaluation problem. A strong cryptographic hash must have certain mathematical properties. For example, the distance between any two distinct inputs must be unconnected to the distance between the corresponding outputs. Otherwise, knowing the output for a known input and the output for an unknown input will tell you something about the unknown input, which you don't want. If you have a large enough number of inputs and plot the distance of inputs in relation to the distance in outputs, you should get a completely random scatter-plot. Also, if you take a large enough number of inputs at fixed intervals, the distance between the corresponding outputs should be a uniform distribution. Since you can't reasonably test 2^512 inputs, you can only apply statistical tests on a reasonable subset and see if the probability that you have the expected patterns is within your desired limits. These two tests can be done automatically. Any hash that exhibits a skew that could expose information can then be rejected equally automatically.

This is a trivial example. There will be other tests that can also be applied automatically that can weed out the more obviously flawed hashing algorithms. But this raises an important question. If you can filter out the more problematic entries automatically, why does NIST have a problem with the number of entries per-se? They might legitimately have a problem with the number of GOOD entries, but even then all they need to do is have multiple levels of acceptance and an additional round or two. eg: At the end of human analysis round 2, NIST might qualify all hashes that are successful at that level as "sensitive-grade" with respect to FIPS compliance, so that people can actually start using them, then have a round 3 which produces a pool of 3-4 hashes that are "classified-grade" and a final round to produce the "definitive SHA-3". By adding more rounds, it takes longer, but by producing lower-grade certifications, the extra time needed to perform a thorough cryptanalysis isn't going to impede those who actually use such functions.

(Yes, it means vendors will need to support more functions. Cry me a river. At the current scale of ICs, you can put one hell of a lot of hash functions onto one chip, and have one hell of a lot of instances of each. Software implementations are just as flexible, with many libraries supporting a huge range. Yes, validating will be more expensive, but it won't take any longer if the implementations are orthogonal, as they won't interact. If you can prove that, then one function or a hundred will take about the same time to validate to accepted standards. If the implementations are correctly designed and documented, then proving the design against the theory and then the implementation against the design should be relatively cheap. It's crappy programming styles that make validation expensive, and if you make crappy programming too expensive for commercial vendors, I can't see there being any problems for anyone other than cheap-minded PHBs - and they deserve to have problems.)

This discussion has been archived. No new comments can be posted.