When talking about in-memory search tree, we usually think of various binary search trees: red-black tree, AVL tree, treap, splay tree and so on. We do not often think of B-tree, as B-tree is commonly introduced as an on-disk data structure rather than in-memory one. Is B-tree also a good data structure for in-memory ordered dictionary? I used to search for the performance comparison between B-tree and binary search trees, but ended up with nothing useful. It seems that only I am interested in such comparison and so I have to do it by myself.
I found John-Mark Gurney’s B-tree via google search. It is well coded and full of clever ideas. The original version has small memory footprint, but it is not as fast as STL’s red-black tree. I studied this source codes and think I should be able to further optimize it. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats STL set while using even smaller memory than the original version. I think I am now at the position to say: at least for some applications, B-tree is a better ordered data structure than most of binary search trees.
The most attractive feature of B-tree is its small memory usage. A binary tree needs at least two pointers for each record, which amounts to 16N on a modern 64-bit systems. A B-tree only needs one pointer. Although in a B-tree each node may not be full, a sufficiently large B-tree should be at least 50% full by definition and in average around 75% full. On a 64-bit system, the extra memory is only 8N/0.75+KN(1/0.75-1)=(10+0.3K)N, where K is the size of a key. In fact we can do even better as we do not need to allocate the null pointers in leaves. The practical memory overhead can be reduced to below (5+0.3K)N (in fact, as the majority of nodes in a B-tree are leaves, the factor 5 should be smaller in practice), far better than a binary search tree. On speed, no binary search tree with just two additional pointers (splay tree and hash treap) can achieve the best performance. We usually need additional information at each node (AVL tree and standard red-black tree) or a random number (treap) to get good performance. B-tree is different. It is even faster than the standard red-black tree while still using (5+0.3K)N extra memory! People should definitely pay more attention to B-tree.
Update: The modified B-tree is available here (HTML) as a single C header file. Example is here. Currently, the APIs are not very friendly but are ready to use. In case you want to give a try. Note that you must make sure the key is absent in the tree before kb_put() and make sure the key is present in the tree before calling kb_del().
Someone has corrected me. STL is a specification, not an implementation. By STL in my blog, I always mean SGI STL, the default STL that comes with GCC.
Over the weekend I have done a more complete benchmark of various libraries on search trees and hash tables. Please read this post if you are interested.
I realize that a lot of red-black tree implementations do not need a parent pointer, although SGI STL’s one uses. My comment below is somewhat wrong.
cool find. I had been a bit shy of red black trees due to their apparent code complexity, but I did notice a couple of recent articles by Sedgewick where he proposes a much simpler implementation which should improve that.
Its a good read in any case -
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
gord
Thank you for directing me to the left-leaning red-black tree. I first know LLRB from this website:
http://www.canonware.com/~ttt/
Here Jason gives an implementation. In his implementation (very lengthy!), LLRB is more memory efficient (using two pointers only rather than three plus a color) but is slower than the standard RB on ins/del operations.
However, in the paper you showed to me, LLBR is much easier to implement. Perhaps I will port his Java codes to C/C++ someday to see how it is compared with other RB implementations. Many thanks. This is really helpful.
-ac
Hi,
Where is your benchmark result?
Thanks.
Why does no one talk about AVL trees?
Is B-tree also a good data structure for in-memory ordered dictionary? Thats precisely what I have to ask…
Wow, that code is the most horrible thing I’ve ever seen. Do you think you have enough macros in there?
I don’t think you need parent pointers for red-black or splay trees, really. Splay trees turn out to be dirt simple to implement in practice, so you should play with those, too.
I wonder if the improved performance of B-trees vs. standard BST’s has to do with better cache behavior, because you’re iterating more things in a sequence. I never thought of it as a cache-oblivious data structure, but that kind of makes sense.
http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
A strongly-typed API constructed exclusively using the preprocessor? Ugh! You know C++ templates exist for this exact purpose, right?
@AI
The benchmark is here, showed with hash table comparisons. I will separate them out when I have time.
@josh
People have done a lot of comparisons between AVL tree and red-black tree. My impression is they have quite similar practical performance. And AVL is a binary tree. It has to use two pointers at least. So my argument largely stands for AVL tree, too. I will also pick up some good AVL tree implementations in my benchmark in future.
@Sebastain
I admit that the the code itself looks terrible at the first sight, but to the best of my knowledge, this might be the best way to achieve generic programming in C while maintaining the best performance. If you have better ideas, I am more than happy to learn. In addition, when I am used to such macros, I do not feel too much difficulty in reading, writing, debugging or maintaining them.
@Reid
I know the standard splay tree only use two pointers, but in practice it is not as fast as the standard red-black tree. I have a library with splay tree and RB tree, written by the same programmer. RB tree is obviously faster for random input. As for red-black tree, I was saying we need parent pointer for “the standard red-black tree”. I am aware that some variants (such as left-leaning red-black tree) do not need a parent pointer, but the red-black tree in the majority of textbooks requires one additional pointer. I call them standard. Furthermore, from what I learn these days, no red-black tree using two pointers is as fast as the red-black tree implemented in STL which use three pointers.
I also believe B-tree tends to be more cache efficient. Another possible reason that B-tree is efficient is that you do not need to re-balance a B-tree so frequently as other binary trees, as is stated in wiki.
@Stuart
I used to write such things (a hash table for example) in C++ and I think I am not bad at that. However, I found the only piece of C++ code in my C project is always related to C++ template headers, which makes me feel uneasy.
What’s more important, C++ programmers have STL. but most C programmers have to achieve generic programming at the cost of efficiency. I want to change this. Although the .h source code is hard to read, the API does not look so bad. What do you think? Do you have better ideas? I am still exploring the methods.
What about modifying a tree data structure in such a manner that it becomes cache oblivious? That should perform well!
Thanks person and Reid for directing me to the cache-oblivious algorithms. I have just spent a few hours on what it is and what resources are available.
I could vaguely see what cache-oblivious algorithms are and the advantages of them, but I do not how to implement one really. I a bit doubt B-tree is not cache-oblivious, though.
Cache-oblivioius algorithms are pretty limited to the academic world. And my understanding of this world is people here care more about theoretical performance rather than practical. They try to reduce the number of theoretical cache misses at cost of adding more operations. The resultant program might not be faster in practice. I was also trying to find actual code on cache-oblivious search tree and the only code I can find claims that their cache-oblivious red-black tree is far slower than SGI STL’s implementation. I think I will stop here unless someone could give me source code and so I can benchmark it. It is really a pain to implement one by myself while getting a worse result.
how does this perform vs a judy array. http://judy.sourceforge.net/
I meant to try Judy, but I did not know how to use Judy in 5 minutes and I gave up. Well, I should have tried further.
Although I have not tried, I tend to believe it may have a better performance than my B-tree implementatin. You can have a look at this page, where a Judy hash table is compared with a traditional hash table. It seems to me that conclusion is on random input, Judy is comparable to hash table. Sean’s hash table has similar speed to my khash.h and in my hash table benchmark, khash.h is times faster than kbtree.h with slightly larger memory footprint. Putting all these together, I think Judy is probably faster than my kbtree.h.
I will try to do an experiment.