Recently I added rde::hash_map, which implements an open-addressing hash table with linear probing, to my benchmark. It is surprisingly fast. Studying its source code reveals that it achieves this speed by caching the hash values and by reducing cache misses. Both techniques increase the memory footprint.
In most hash libraries, what is effectively kept is hash(key)%n_buckets, which contains only log2(n_buckets) bits of information. In contrast, rde::hash_map stores hash(key) itself. Key comparison is performed first on the cached hashes and only then on the keys. As a hash carries 30 bits of information in rdestl, distinct keys rarely share an identical hash, which avoids most full key comparisons and saves recomputing hashes when the table grows, gaining a big speedup on complex keys such as strings. However, storing the hash requires an additional 4 bytes per bucket, and can cost even more once memory alignment is taken into account.
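Here is a minimal sketch of the cached-hash comparison, assuming a simple bucket struct; the names are mine for illustration, not rdestl's actual API:

```cpp
#include <cstdint>
#include <string>

// Hypothetical bucket holding the full cached hash next to the key.
struct Entry {
    std::uint32_t hash; // full hash(key), cached when the entry was inserted
    std::string key;
};

// During probing, compare the cached hashes first; the expensive string
// comparison only runs on the rare buckets whose hashes already match.
inline bool entry_matches(const Entry& e, std::uint32_t h, const std::string& key) {
    return e.hash == h && e.key == key; // short-circuit skips most key compares
}
```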
Because rde::hash_map already spends an additional 4 bytes on each bucket, it can mark a bucket as empty or deleted with two bits of those 4 bytes, instead of using a flag array as in my khash or reserving special keys as in the Google dense hash table. As a result, rde::hash_map avoids both the extra cache miss in khash and the expensive key comparisons in the Google dense hash table, which makes it faster than both libraries on integer and string keys alike. Also, like the Google dense hash map, rde::hash_map packs the key and the value in a struct instead of keeping keys and values in two separate arrays as in khash. This saves another cache miss on data retrieval. However, packing the key and the value may bring memory overhead when the two types have different alignment requirements and padding is inserted between them. Please see my old post for further discussion.
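The following sketch shows how the cached hash can double as the bucket-state marker. The sentinel values and bit layout are my guesses, not the actual rdestl encoding; the point is that no separate flag array and no reserved user keys are needed:

```cpp
#include <cstdint>

const std::uint32_t kUnusedHash  = 0xFFFFFFFFu; // bucket has never been used
const std::uint32_t kDeletedHash = 0xFFFFFFFEu; // bucket held a now-deleted entry
const std::uint32_t kHashMask    = 0x3FFFFFFFu; // real hashes keep only 30 bits

// Key and value are packed in one struct, so finding a key and then reading
// its value touches a single cache line, unlike two parallel arrays.
template <typename K, typename V>
struct Bucket {
    std::uint32_t hash; // 30-bit cached hash, or one of the two sentinels above
    K key;
    V value;
};

// Masking to 30 bits guarantees a real hash can never equal a sentinel.
inline std::uint32_t fold_hash(std::uint32_t h) { return h & kHashMask; }
```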
Interestingly, rde::hash_map chooses linear probing. I once tried linear probing in khash, and it was slower than double hashing even on random input. Maybe I should try harder?
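For reference, here is a sketch of the two probe sequences being compared, assuming a power-of-two table size n so that (n - 1) works as a bit mask; the stride derivation in double_hash_probe is my own choice, not khash's:

```cpp
#include <cstddef>
#include <cstdint>

inline std::size_t linear_probe(std::uint32_t h, std::size_t i, std::size_t n) {
    return (h + i) & (n - 1); // consecutive buckets: cache friendly, but clusters
}

inline std::size_t double_hash_probe(std::uint32_t h, std::size_t i, std::size_t n) {
    std::size_t step = (h >> 15) | 1; // odd per-key stride from high hash bits;
    return (h + i * step) & (n - 1);  // odd step + power-of-two n hits every bucket
}
```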
In conclusion, rde::hash_map achieves high speed at the cost of higher memory consumption. It is a good candidate if you do not care too much about memory. My khash library aims first at a small memory footprint and only then at speed. I would not be surprised if it is not the fastest hash table library.
Do you have an estimate of how much more memory the hash implementation uses?
Caching the hash value takes 4 bytes in each bucket. In many practical cases it takes 8 bytes due to memory alignment.
The benchmark gives the practical results.
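To make the arithmetic concrete, here is a hypothetical bucket layout (the struct names are mine, not rdestl's) with an 8-byte key, where the 4-byte cached hash drags in 4 more bytes of padding to keep the key aligned:

```cpp
#include <cstdint>
#include <cstdio>

struct BucketPlain      { std::uint64_t key; std::uint64_t value; };
struct BucketCachedHash { std::uint32_t hash; std::uint64_t key; std::uint64_t value; };

int main() {
    // On a typical 64-bit platform this prints 16 and 24: the cached hash
    // costs 8 bytes per bucket here, not 4.
    std::printf("plain: %zu, cached hash: %zu\n",
                sizeof(BucketPlain), sizeof(BucketCachedHash));
    return 0;
}
```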