Update on 2018-09-29: updated ska::flat_hash_map and tsl::hopscotch_map to the latest versions. Added absl::flat_hash_map, ska::bytell_hash_map and tsl::robin_map. Updated texts accordingly.
I evaluated multiple hash table libraries nearly 10 years ago. A lot has changed since then: hash tables are officially part of C++, my khash library is about twice as fast, and more advanced algorithms and heuristics are being applied to hash table implementations. Where are we now? Is unordered_map in C++11 the preferred choice? What hash table library should we use? This post aims to find partial answers to these questions.
In this micro-benchmark (source code here), we have N 32-bit integers, ~25% of which are distinct. The task is to count the occurrences of each distinct integer with a hash table. It is inspired by real applications in my routine work. I will show the result first and then discuss all the caveats.
In the figure above, each connected line represents a library. Each line has six dots, corresponding to N=10, 18, 26, 34, 42 and 50 million, respectively. I used multiple sizes to show the effect of rehashing. The X-axis measures CPU time and the Y-axis measures peak memory, including the temporary space used during rehashing.
10 years ago, Google’s dense_hash_map was significantly faster than all its peers. It is still among the fastest in this benchmark. When we consider the speed-memory balance, the more sophisticated probing algorithms such as Hopscotch hashing (used by tsl::hopscotch_map), Robin Hood hashing (ska::flat_hash_map and tsl::robin_map) and the Swiss table (absl::flat_hash_map) are not better. I speculate this is partly because they need to store extra data in each bucket, which cancels some of their advantage under high load. In addition, these advanced hashing methods are better at queries, while my benchmark always invokes insertion, even though 75% of the time no new element is added.
It bugs me that the official unordered_map implementation in GCC 6.3 is so inefficient. In fact, it is slower and uses more memory than SGI’s ancient ext/hash_map and tr1/unordered_map, both of which are still available in GCC. All these libraries use chaining to resolve collisions, which is effectively required by the C++11 spec. It is unfortunate that the C++ standard committee ruled out open addressing: nearly every hash table benchmark indicates open addressing is significantly faster on small keys. As for C libraries, uthash is the most popular, but its performance lags far behind the others. When you need a large hash table, ska::*_hash_map and tsl::*_map are the better choices if you prefer C++11 APIs; Google dense_hash_map and khash remain top options after 10 years.
Additional notes:
- All implementations use the same integer hash function. Switching to the identity hash function has little effect on performance.
- I haven’t tuned the maximum load factor and the growth factor. They may affect the balance between time and space.
- Libraries in the benchmark use different memory allocators. For example, khash uses glibc’s malloc, which supports realloc; unordered_map naturally uses std::allocator; and Google dense_hash_map and sparsepp use their own allocators. I suspect that memory allocators play a role in performance. More testing is needed.
- There are several other hash table benchmarks covering tsl::*_map, ska::flat_hash_map and ska::bytell_hash_map. These are all good. TommyDS shows a benchmark where it performs the best, but that one is flawed because it doesn’t put data into the table (TommyDS can’t do that). The Opic hashtable also has a benchmark, which seems to ignore the effect of rehashing. I meant to evaluate it but couldn’t get it working.
- The source code repo evaluates several more libraries. Their results can be found in “__logs/*.tgz”.
- For demonstration purposes, I have translated khash.h into a single C++ header. khash implements a fairly naive algorithm, which may not work as well on other types of data.
- Benchmark programs were run on a fresh “standard-1” machine from Google Cloud. The results on a local Linux server are a little different:
Hello. I continue to enjoy your hash table updates! I use TommyDS with statically allocated memory. Do the TommyDS performance benchmarks hold true when not doing mallocs?
I don’t have a benchmark designed for your use case, so I don’t know. Generally, if the hash table only holds pointers to data stored elsewhere, rather than the data itself, the performance difference between various hash table libraries becomes smaller.
As to TommyDS, it uses chaining. In all the other hash table benchmarks I have seen (none of which evaluated TommyDS), chaining is generally slower than open addressing, so I don’t know why TommyDS is different. In addition, TommyDS’s benchmark uses old versions of third-party libraries; khash, for example, has been ~2x as fast since 2013. Looking at the source code, the benchmark also looks a bit odd in that it inserts keys as values at the same time, probably because TommyDS has no concept of separate keys and values, so it does that for every library. If I were to implement this benchmark, I would do it differently for khash and Google dense_hash_map.
Did you have a look at SharedHashFile on github? Do you have a multi process benchmark to take advantage of SharedHashFile?
It seems to me that SharedHashFile is designed for a very different purpose. It is better compared to other key-value stores.
Do you have some recommended books about C/C++ programming, hash tables, data structures, algorithms and related fields?
Thanks in advance.
Sorry, I don’t have good suggestions. The programming books I read were in a foreign language and were very old. For advanced topics, I mostly learned from others’ blog posts and sometimes published papers in computer science.
Good info. Thanks for the post.
I’m wondering if a radix tree would be better for detecting unique integers of a maximum bit width, given that directly indexed look-up tables are cheaper and faster. I suppose it will depend on how the input stream of N integers is distributed.
With a hash table, you can usually find the key in a couple of jumps with one cache miss. I doubt you can achieve the same with a radix tree unless the input is sorted in the right order. In addition, a hash table is dynamic; a dynamic radix tree will probably be even slower.
Using realloc there is kind of a cheat; I’d love to show you how much so.
I made a hash map too; while it is around 2x slower than khash, it uses about 25% less memory. I could easily fit a realloc into my implementation, but not all data is relocatable and C++ allocators don’t support relocation, so it would be kind of cheating. That’s why, sadly, it’s not there, but let me know if you’re interested!
https://github.com/1ykos/ordered_patch_map
Did you realize this is exactly a C implementation’s advantage? You are always able to use the most appropriate interface from the operating system and integrate it easily.
You can use realloc in C++ code as well, but you’ll face the exact same problem: not all data is trivially copyable, whether in C or C++.
Very nifty! One suggestion: if you’re comparing single-threaded / non-concurrency-safe implementations against libcuckoo, we recommend using the locked_table variant, which avoids some of the overhead the table adds to make concurrent modification safe.
I’m unclear under what circumstances open addressing can be used (both in theory and in practice for implementations like your khash). Wikipedia says:
A drawback of all these open addressing schemes is that the number of stored entries cannot exceed the number of slots in the bucket array. In fact, even with good hash functions, their performance dramatically degrades when the load factor grows beyond 0.7 or so. For many applications, these restrictions mandate the use of dynamic resizing, with its attendant costs.
I’m interested in keeping an existence-only style table of pointer keys (i.e. I only care whether the pointer is in the table, not what its value is). I want to implement a clone()-type function generator using this. Are khash/open addressing suitable?
If you store pointers in a hash table, open addressing still works. Open addressing doesn’t work when you store entire objects in the hash table and reference the objects with pointers elsewhere. This is because the addresses of the objects in the table may change.
Open addressing is great for 3D grid-based calculations where memory usage is a big concern and each cell’s ID is easily translated into a single integer: pathfinding, advanced fluid simulation, visibility culling in 3D graphics, etc.
Nice benchmarks, although I would’ve loved to have GLib’s GHashTable in the lineup too. It’s used extensively in the GNOME desktop stack and system-level services like NetworkManager, so it’s relevant to a whole lot of Linux users.
I performed a slightly different series of hash table benchmarks including khash — maybe those will interest you: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/
The lessons from those allowed me to improve GHashTable to a point where it’s quite competitive for a general-use hash table: https://hpjansson.org/blag/2018/08/29/what-ails-ghashtable/
Good work on khash — it’s simple, compact, frugal and very fast.
See: https://github.com/attractivechaos/udb2/tree/master/_glib. The timings are available in udb2/__logs/*.tgz.
Nice benchmarks, does KHash work well with the key as a string?
khash has string macros too.
It works well, but other libraries are likely to work equally well.
I looked at ska::flat_hash_map. They use prime numbers for the hash table size. I think using prime numbers is good, but khash doesn’t use them. Can you tell me the reason, please? Thanks.