Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

These are 32-bit hashes. 160-bit hashes with similar characteristics are outrageously less likely to generate collisions. Something like 3.4 X 10^38 less likely if I'm running ColinWright's formula properly.

That difference is computationally extremely significant. It took the SO poster ~9ms to find a collision with, e.g., Murmur. If you mapped those results to the 160-bit hash, finding a collision, even ignoring the added time to compute the larger hash, would take 97 octillion years.



This is not the right way to look at it. Finding a collision on CRC256 (a 256-bit CRC code) is nowhere near as hard as finding a collision on SHA256 (assuming SHA256 is secure). A CRC code is just the remainder modulo a polynomial, so adding any multiple of that polynomial to the string will give you a collision. The probability of a pair of independently and uniformly sampled random strings having the same CRC256 hash is certainly small, but cryptographic hashes make a strong guarantee: even if the pair is not sampled independently or uniformly a collision is unlikely.

It is also worth pointing out that the hash size is not necessarily a measure of security. Very Smooth Hash is a good example of this: VSH has security that depends on the hardness of a problem that is closely related to integer factorization, and produces hashes that are as long as its parameters. You might need 3072 bit parameters for VSH to be secure, and will thus have 3072 bit hashes; but the hardness of finding a collision will be about as hard as brute-forcing a 128 bit keyspace (estimating these things is something of a dark art, and I am not an expert; it might be that VSH requires much larger parameters than RSA for equivalent security).


Of course an application like Bitcoin needs a cryptographic hash. But the parent comment was concerned that a single-digit number of collisions showed up in a couple hundred thousand test cases of a 32-bit hash. A similar count of collisions has a good chance of showing up with a 32-bit cryptographic hash, too. It's the cryptographic properties in concert with the size of the range that make the function secure for its intended use.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: