top
### Variable length hashes

bogado writes | more than 9 years ago

I propose a variable length hash, this would have an advantage over the current fixed sizes hash functions used today. As I see it the problem with fixed sized hashes is that the larger the size of the file the larger is the probability of a collision. Getting a lager hash for larger files reduces the possibility on this happening by making the number of possible hashes bigger for larger sets. Also since the growth of the hash size is variable, this makes it even harder to find a collision, since the data set would have to match not only the data but also the size of the hash.

*How can we do it?*

I am sure there is a number of ways of creating those variable length hashes, but I will describe here a simple idea I have, that can have holes but for a start I will shoot it here so the cryptographic community could check it out and scrutinize it, and maybe come with new ideas.

The principle is simple, I think that simplicity is very important since the most people that can understand something the more people can question and find problems with it. We start with a known secure hash function, after a predefined number of bits we would freeze a random number of bits of the internal state of the function, all of this would be in function of the internal state it self, so it could be repeated easily. The final hash would be simple all of the frozen bits collected during all stops appended with the usual hash. This approach would guaranty at least the same level of security of the original hash.

Other ways could be arranged so the methd could get somewhat stronger, for instance one could start two diferent hash functions one for the "extra bits" and the other for the finalization. But this would add some computation time.

*Possible problems*

The extra bits taken from the hash could help a attacker figure out the original message, in a password like environment for instance. For this hash to be really secure for cryptography uses the way the extra bits are chosen and how many of them must be carefully studied so that those do not give any tips on how they were chosen.

For this reason I would suggest that the internal state of the hashing function should be divided in two sets of bits, both with the same amount of information attached to them. So if some of the bits are more random then others in this particular hash, those bit should be divided by the two sets so the two final sets have the same amount of information in them. One of this set generates a number and the other generates the bits that would be frozen.

The other problem I can see is that the current protocols can have a fixed space for signatures and hashes. Those protocols should have to be adapted for using those hashes (if this proves to be actually secure). Also one could think in a way to summarize the extra bits and the size in a fixed sized, but this could make the hashes once again weak for larger data.

Also for this to be really secure at a higher size data, the amount of growth of the hash should be approximately linear. But this clearly would be too much since the hash would become too large and would hinder the transfer. So a rate of growth should be carefully chosen so the hashes don't get too big too soon, but they would also give enough security.