This is a follow-up of my previous post. Here I change the table to several charts. Hope it seems more friendly to readers. You can find the links to these libraries in that table. Their source codes, including my testing code, are available here. You may also want to see my previous posts in the last few days for my interpretation to the results.
On C string (char*) keys, I fail to use JE_rb_old and JE_rb_new to get the correct result on Mac and so they are not showed in the charts. I would really appreciate if someone may give me the correct implementation using these libraries. In addition, tr1_unordered_map uses a lot of memory according to my program. The memory for string keys are faked.
For conveniece, here are some brief descriptions of these libraries (with no order):
- google_dense and google_sparse: google’s sparsehash library. Google_dense is fast but memory hungery while google_sparse is the opposite.
- sgi_hash_map and sgi_map: SGI’s STL that comes with g++-4. The backend of sgi_map is a three-pointer red-black tree.
- tr1::unordered_map: GCC’s TR1 library that comes with g++-4. It implements a hash table.
- rdestl::hash_map: from RDESTL, another implementation of STL.
- uthash: a hash library in C
- JG_btree: John-Mark Gurney’s btree library.
- JE_rb_new, JE_rb_old, JE_trp_hash and JE_trp_prng: Jason Evans’ binary search tree libraries. JE_rb_new implements a left-leaning red-black tree; JE_rb_old a three-pointer red-black tree; both JE_trp_hash and JE_trp_prng implement treaps but with different strategies on randomness.
- libavl_rb, libavl_prb, libavl_avl and libavl_bst: from GNU libavl. They implment a two-pointer red-black tree, a three-pointer red-black tree, an AVL tree and a unbalanced binary search tree, respectively.
- NP_rbtree and NP_splaytree: Niels Provos’ tree library for FreeBSD. A three-pointer red-black tree and a splay tree.
- TN_rbtree: Thomas Niemann’s red-black tree. I ported it to C++.
- sglib_rbtree: from SGLIB. It implements a two-pointer recursive red-black tree (all the other binary search trees are implemented without recursion).
- libavl_avl_cpp, libavl_rb_cpp and libavl_rb_cpp2: incomplete C++ version of libavl (no iterator), ported by me. Libavl_rb_cpp2 further uses the same technique in JE_rb_new to save the color bit. Source codes available in the package.
- khash and kbtree: my hash table and B-tree implementation. kbtree is based on JG_rbtree.




Question to provide clarity:
By putting both mac-intel and linux-intel64 in the same column are you attempting to show that the MI are all lower (memory or time) than the LI or showing total of (MI + LI)? Ie, is the blue line in front of the orange, or is the orange standing on top of the blue?
In the above figures, the orange is standing on top of the blue. In the first figure, for example, the second bar means google_sparse takes ~6 CPU seconds on Mac while ~4 seconds on Linux. And therefore, each bar represents the overall performance on the two systems with similar CPUs (Mac: 2GHz; Linux: 1.86 GHz).
Actually, it would be clearer to put the results in eight, rather than four, charts, but then there would be too many figures.
Perhaps I’m blind, but cannot find your mail anywhere. I’ve modified your benchmark to include my hash_map class, would be glad to send my modified version to you.
Ah, it is hidden in the source codes. You can send the code to attractivechaos at aol dot co dot uk. Thank you.
[...] hash_map included in Attractive Chaos benchmark (+bug fixes). It did quite well. One problem is it consumes memory like crazy, but benchmark works [...]
The code is pretty fast but not very legible. Single letter variables are a killer.
Your focus on these data structures is good, trying to balance speed and size.
If something like ustl were actively maintained I do seriously believe kbtree & khash could be worked into it to fill in some of the missing algorithms.
Thanks for the comment, Brian. I like the philosophy of uSTL. However, what I am aiming at is to make a even smaller library than uSTL. For example, if we want to use a hash table library, just put a single header khash.h in our code. Then we do not need to worry too much about dependency or portability because all the code is there. To me, it is a bit worrying to ask other programmers to have SGI STL/g++-tr1 or even to install a giant boost/qt for a basic container as simple as hash table. khash.h only has ~300 lines of code.
You may get away with that with khash since the standard stl doesn’t have hash_map.
However to make the code most accessible by more people it might be nice to have some similar api…it makes it easier to change out libraries if need be.
I personally haven’t used hashes in ages since I got burnt so badly by QDict back in the old days. It seems I should perhaps re-evaluate the use of hashes, however I definitely need the ordered behavior of a map class still.
It might be interesting to see how the collections you have there fare on an embedded platform like ARM.
Here another c generic library for you test:
http://home.gna.org/gdsl/
Thanks, Guillermo. Currently I only include libraries that do not require to install. I will try to include QT/Boost/glib/gdsl and so on when I get time.
-ac
@Guillermo
I have tried gdsl’s red-black tree. It is quite a standard three-pointer implementation using void* for generic programming. Not surprisingly, it has quite similar performance to libavl_prb. For red-black tree library in C, I would recommend NP_rbtree, which is by all means superior to gdsl_rbtree. NP_rbtree is used in FreeBSD’s kernel. It must be very stable, too.
GDSL’s hash table only allows char* keys. I did not test that as it is not a generic container.
Very interesting article, just what I was looking for, thanks! In the end I decided to go with sgi_hash_map for now, since it already comes with g++-4 and can easily be switched out for a standard STL map (almost identical interface).
By the way, have you considered including boost/unordered_map.hpp in your benchmark as well?
@lars
Thanks for your interest. I have tested QT4 QHash and boost::unordered_map. I commented these two libraries in another post.
If I did everything correctly, boost::unordered_map is a little bit slower than SGI’s hash_map, with almost identical memory footprint. QT4’s QHash is a little bit faster than SGI hash_map but at the cost of more memory.
For a C++ user, I would highly recommend SGI’s hash_map. SGI’s STL is widely used with several leading compilers. It has been in gcc since 2.95.3. And therefore its hash_map is highly portable. On performance, it is close to the best. Although in this benchmark sgi_hash_map is slightly slower than my own implementation, I can (carefully) design another benchmark to make my implementation slower. In terms of stability and easiness, hash_map is also better.
Boost, gcc’s TR1 and QHash are also good candidates, but then you need users to have gcc4 or to install hundreds MB of libraries for a simple thing as hash table. I am always a bit worrying about this.
In the uthash string test, I believe what you want is to change the HASH_ADD_STR line to
HASH_ADD_KEYPTR(hh, h, r->key, strlen(r->key), r);
Also for this data set you should try adding this to the CFLAGS:
-DHASH_FUNCTION=HASH_BER
-Troy
@Troy
Thank you for the hint and sorry that I have not read documentation carefully. Uthash now works properly. I will update the post later.
I just noticed a (new?) first column “boost_hash”? Which library is that, boost.unordered? It seems to give good performance (with little space) for char* keys, but is only plotted for 64bit?
It is boost unordered class. I did not install it on my Mac laptop (it is huge and I am short of disk space) and so the mac result is unavilable.
From what I can tell, g++4::tr1/sgi_hash/boost_unordered all use very similar algorithms. It is no surprising that they achive very similar performance.
Thanks for the clarification.
I think it’s interesting to note that the __gnu_cxx::hash_map (from sgi) is considered deprecated as of gcc v4.3 — the gcc tr1 hash containers are preferred replacements. However, their memory performance seems to be terrible in your tests…!
@lars
I have tested g++::tr1 several times before and only this time it shows this weird memory footprint I could not exlain. I have not looked into the details, though.
In general, in all my tests, g++::tr1 is slightly faster than __gnu_cxx::hash_map. It would be good to use g++::tr1 when we make sure this weird memory is not caused by a bug.
Could you also include the following in your testing?
http://idlebox.net/2007/stx-btree/
wow.. thank you. !
@attractivechaos
I am trying to use the GDSL library for different data structures such as @D array and lists. I have comiled it and it went well but when i tried to buid it then gives me error. I have given the command line arguments for compiler but when i gave command line arguments to linker it says the following:
================================================== ================
Running “/usr/bin/make -f Makefile CONF=Debug” in /home/usman/NetBeansProjects/test
/usr/bin/make -f nbproject/Makefile-Debug.mk SUBPROJECTS= .build-conf
make[1]: Entering directory `/home/usman/NetBeansProjects/test’
/usr/bin/make -f nbproject/Makefile-Debug.mk dist/Debug/GNU-Linux-x86/test
make[2]: Entering directory `/home/usman/NetBeansProjects/test’
mkdir -p dist/Debug/GNU-Linux-x86
gcc -o dist/Debug/GNU-Linux-x86/test build/Debug/GNU-Linux-x86/test.o gdsl-config –libs
gcc: gdsl-config: No such file or directory
make[2]: *** [dist/Debug/GNU-Linux-x86/test] Error 1
make[2]: Leaving directory `/home/usman/NetBeansProjects/test’
make[1]: *** [.build-conf] Error 2
make[1]: Leaving directory `/home/usman/NetBeansProjects/test’
make: *** [.build-impl] Error 2
Build failed. Exit value 2.
===================================
In the usage guidelines for the GDSL the following is stated:
—————————————————————————————–
Usage
=====
Simply include ‘#include ‘ into your source C file.
Include ‘gdsl-config –flags’ to your compilation command.
Include ‘gdsl-config –libs’ to your link command.
Then, you can use any of the functions described in the main GDSL header file:
‘gdsl.h’.
————————————————————————————–
I am using netbeans 6.5 for the development. Please tell me that what could be the problem as i am really stucked in it
also if you could please suggest me some other good IDE for development as the one i am using is good for java but not for C.
Looking forward for your kind response.
regards,
For unix/linux, you should put gdsl-config on the $PATH and compile with something like:
gcc -c `gdsl-config –flags` -g -Wall -O2 -o code.o code.c
and link with:
gcc `gdsl-config –libs` code.o -o code
Or you explicitly put the output of gdsl-config –flags to replace `gdsl-config –flags`. Maybe you have to do so for windows as windows shell does not recognize this “ marks.
What to use is largely subjected to your appetite. I simply use Emacs, but this might be difficult for you. I know people use NetBeans (Java), Eclipse (Java), code::blocks (cross-platform), Dev-C++ (windows), Kdevelopment (Linux/KDE) and Xcode (mac os x). These are all free. I think others rather than me may give you more useful advice.
@Brian
Thank you for the link. STX B+-tree is a very good library. It slightly outperforms my kbtree in terms of speed and comparable in memory. I will show the results later.