
Even for data structures, you generally want to use a cryptographic hash function, to prevent collision attacks and similar exploits from degrading the structure's performance.

For example, if I can sufficiently reverse your hashing process, I can degrade a hash table into a linked list for a particular set of keys - something your firewalls and other security measures have virtually no chance of stopping, but which greatly increases the time it takes to process (malicious) requests against that data.
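A minimal Python sketch of the effect, assuming the attacker has already found many inputs with identical hashes (`CollidingKey` here is a stand-in for such attacker-chosen keys, not a real attack on Python's hash):

```python
# Hypothetical key type standing in for attacker-chosen colliding inputs.
class CollidingKey:
    def __init__(self, v):
        self.v = v

    def __hash__(self):
        # Every key lands in the same bucket, so lookups in that bucket
        # degenerate into a linear scan - effectively a linked list.
        return 42

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.v == other.v


# Each insert now compares against every previously inserted colliding
# key, so n inserts cost O(n^2) instead of the expected O(n).
table = {}
for i in range(1000):
    table[CollidingKey(i)] = i
```

The table still behaves correctly - it just gets pathologically slow, which is exactly why this is a denial-of-service vector rather than a correctness bug.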

A common way to do this is to insert maliciously chosen new items, so that the structure degenerates into a list as it is built up.



Well, there are various applications where you don't necessarily need the cryptographic property. Hashing is widely used for large-scale back-end machine learning tasks (LSH, linear models, ...) where speed and randomness are the only properties that matter.


Until recently I don't think any major hash table implementation (Java, the STL, Python, Ruby, etc.) used a cryptographically secure hashing algorithm. They're just too slow - slow enough that you might as well use a tree-based data structure, which doesn't suffer from hash-collision attacks.

This is changing a bit now that SipHash is available. Unfortunately, SipHash is still a bit too slow and forces an uncomfortable trade-off.


I would be really surprised to find that a tree is faster than a fast cryptographic hash function like SipHash. A cache miss to main memory costs tens of instructions' worth of time, and that number is going up in many cases. Hash functions can be quite efficient at letting the processor run multiple instructions in parallel.

If you know the tree will be cached, and/or the keys you have to hash are large, then sure, a tree might be a win - but hashing a small key might cost only as much as two or three cache misses. Comparing tree nodes is not instruction-free either.

I think it is really a case-by-case sort of thing, but there is no substitute for measuring (and nailing down the comparison to specific hash functions).


"Unfortunately SipHash still is a bit too slow and forces an uncomfortable trade-off."

2.6x slower than DJB2 in my recent Rust trial. (SipHash is the default hasher in Rust's standard library.)


This is why hashes for hashtables and similar data structures are seeded with a random value at interpreter startup. This throws off the attack well enough to make it barely feasible, provided the hashing algorithm distributes sufficiently randomly. There was a round of DoS vulnerabilities against PHP, Ruby, Java, and others a while ago, all based on that attack vector, but those have all been fixed.
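A minimal sketch of the idea in Python - `seeded_djb2` is a hypothetical seeded variant of the DJB2 string hash for illustration, not any interpreter's actual function:

```python
import os

# The interpreter would pick SEED once at startup. An attacker who
# precomputed colliding inputs against the unseeded hash no longer
# knows how keys map to buckets in this particular process.
SEED = int.from_bytes(os.urandom(8), "little")


def seeded_djb2(s: str, seed: int = SEED) -> int:
    # Classic DJB2 (h = h * 33 + c), with the seed mixed into the
    # initial state and the result truncated to 64 bits.
    h = 5381 ^ seed
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return h
```

The hash is still deterministic within one process run, so the table works normally; only the cross-process predictability the attacker relied on is gone.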


A secret seed is not sufficient to prevent DoS vulnerabilities; that is one of the things the creators of SipHash demonstrated. You can create collisions with MurmurHash3 that work regardless of the seed.
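A toy illustration of why seed-independent collisions are possible - this is not MurmurHash, just a deliberately weak hash that mixes the seed too shallowly, which is the essence of the problem:

```python
M = 1024  # number of buckets


def weak_hash(key: int, seed: int) -> int:
    # The seed only shifts every output uniformly, so any two keys
    # that collide at one seed collide at every seed - the
    # randomization buys nothing against a prepared attacker.
    return (key + seed) % M


# Keys congruent mod M collide under any seed whatsoever.
k1, k2 = 7, 7 + M
```

Real seed-independent collisions against MurmurHash exploit internal structure rather than anything this simple, but the failure mode is the same: the seed never reaches the part of the state the attacker controls.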


They've also released code to generate an arbitrarily large number of colliding keys for MurmurHash2 and CityHash64, as well as code to recover the secret key of Python's randomized hash function. At some point here you're doing cryptography whether you want to or not -- and most of these randomized hash functions are just lousy cryptography.


This is right - my wording was incorrect, excuse me. I should have said, more correctly, "This is why the hash functions used nowadays incorporate randomization with a seed determined at interpreter startup." JRuby, for example, now uses Perl's hashing algorithm; it used to rely on MurmurHash2, just as CRuby did.



