Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

No, the cryptographic hash functions are not slow by design. This is a nasty misunderstanding which leads to stuff like MD5(password) in PHP [not the PHK MD5 crypt() algorithm used in Unix systems in the 1990s, this was literally MD5(password) and it's a bad idea, do not do]

The cryptographic hash functions such as the SHA-2 family resist construction of a second pre-image, given N and H(N) there's no way for me to make M such that H(M) == H(N) but M != N that'll be faster than brute force trying all possible values for M. So far everything we have which achieves this is markedly slower than say FNV-1 which would once have been the most likely hashing algorithm for a hash table.

The one way password hashes, some of which are based on the cryptographic hashes, are slow by design. You should not use passwords, but on HN this seems like a lost cause so, assuming you insist on using passwords these algorithms are specifically what you need. Fighting about which one to use ("oh no, PBKDF2 isn't memory hard, blah blah blah") is also a bottomless hole of HN nonsnse, so, fine, pick whichever of them you're certain is the only correct one.

But we do ideally want some cryptographic properties for a hash to be used in a hash table and your instinct that just using a random factor and hoping won't work was correct. You need a correctly designed function, which is what SipHash is. What you want is some keyed function with a key K such that if I know N and H(K,N) but not K, I can't guess an M such that H(K,M) = H(K,N) but M != N except by brute force.

> Many programming languages do not use open addressing and prefer chaining (Java, C#, Golang, std::unordered_map in C++) for simplicity

This was true in Golang but today the map in Go is a modified Swiss Table which of course delivers improved performance. We've seen why Java is stuck where it is. The C++ std::unordered_map is known to have poor performance, C++ programmers have many alternatives [including of course Swiss Tables], such as a really nice modern offering from Boost.

I'm actually not sure about the guts of the C# hash tables at all, it's a complicated beast full of conditionally defined elements because it's reused by CLR internals as well as provided as a type for us users. It's not a conventional design of any sort, there seems to have been (which is common for Microsoft) some degree of Not Invented Here - just make it up as we go along. It's storing pre-calculated hashes with each key->value pair, it's tracking how many collisions it has, I dunno, I wouldn't do this, don't copy it without seeing hard numbers for why this is good.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: