When talking about in-memory search trees, we usually think of various binary search trees: the red-black tree, AVL tree, treap, splay tree and so on. We do not often think of the B-tree, as it is commonly introduced as an on-disk data structure rather than an in-memory one. Is the B-tree also a good data structure for an in-memory ordered dictionary? I used to search for performance comparisons between B-trees and binary search trees, but ended up with nothing useful. It seems that few people are interested in such a comparison, so I have to do it myself.
I found John-Mark Gurney’s B-tree via a Google search. It is well coded and full of clever ideas. The original version has a small memory footprint, but it is not as fast as STL’s red-black tree. I studied the source code and thought I should be able to optimize it further. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats the STL set while using even less memory than the original version. I think I am now in a position to say: at least for some applications, the B-tree is a better ordered data structure than most binary search trees.
The most attractive feature of the B-tree is its small memory usage. A binary tree needs at least two pointers per record, which amounts to 16N bytes on a modern 64-bit system. A B-tree needs only one pointer per record. Although each B-tree node may not be full, a sufficiently large B-tree is at least 50% full by definition and around 75% full on average. On a 64-bit system, the extra memory is only 8N/0.75 + KN(1/0.75-1) ≈ (10.7+0.33K)N bytes, where K is the size of a key. In fact we can do even better, as we do not need to allocate the null pointers in leaves. The practical memory overhead can be reduced to below (5+0.3K)N (in fact, as the majority of nodes in a B-tree are leaves, the factor 5 should be smaller in practice), far better than a binary search tree. On speed, no binary search tree with just two additional pointers (splay tree and hash treap) achieves the best performance. We usually need extra balance information at each node (AVL tree and standard red-black tree) or a random number (treap) to get good performance. The B-tree is different: it is even faster than the standard red-black tree while still using only (5+0.3K)N extra memory! People should definitely pay more attention to B-trees.
Update: The modified B-tree is available here (HTML) as a single C header file. An example is here. Currently, the APIs are not very friendly, but they are ready to use in case you want to give it a try. Note that you must make sure the key is absent from the tree before calling kb_put() and that the key is present in the tree before calling kb_del().
Someone has corrected me. STL is a specification, not an implementation. By STL in my blog, I always mean SGI STL, the default STL that comes with GCC.
Over the weekend I have done a more complete benchmark of various libraries on search trees and hash tables. Please read this post if you are interested.
I realize that a lot of red-black tree implementations do not need a parent pointer, although SGI STL’s does. My comment below is somewhat wrong.
Update 2: kbtree.h has long been moved here, along with khash.h and my other 1- or 2-file libraries.
Cool find. I had been a bit shy of red-black trees due to their apparent code complexity, but I did notice a couple of recent articles by Sedgewick where he proposes a much simpler implementation which should improve that.
It’s a good read in any case –
Click to access LLRB.pdf
gord
Thank you for directing me to the left-leaning red-black tree. I first learned of LLRB from this website:
http://www.canonware.com/~ttt/
Here Jason gives an implementation. In his implementation (very lengthy!), LLRB is more memory efficient (using only two pointers rather than three plus a color) but is slower than the standard RB tree on insert/delete operations.
However, in the paper you pointed me to, LLRB is much easier to implement. Perhaps I will port the Java code to C/C++ someday to see how it compares with other RB implementations. Many thanks. This is really helpful.
-ac
Hi,
Where is your benchmark result?
Thanks.
Why does no one talk about AVL trees?
no one uses AVL tree
Is the B-tree also a good data structure for an in-memory ordered dictionary? That’s precisely what I have to ask…
Wow, that code is the most horrible thing I’ve ever seen. Do you think you have enough macros in there?
I don’t think you need parent pointers for red-black or splay trees, really. Splay trees turn out to be dirt simple to implement in practice, so you should play with those, too.
I wonder if the improved performance of B-trees vs. standard BST’s has to do with better cache behavior, because you’re iterating more things in a sequence. I never thought of it as a cache-oblivious data structure, but that kind of makes sense.
http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
A strongly-typed API constructed exclusively using the preprocessor? Ugh! You know C++ templates exist for this exact purpose, right?
@AI
The benchmark is here, shown together with the hash table comparisons. I will separate them out when I have time.
@josh
People have done a lot of comparisons between the AVL tree and the red-black tree. My impression is that they have quite similar practical performance. And the AVL tree is a binary tree: it has to use at least two pointers. So my argument largely stands for the AVL tree, too. I will also pick some good AVL tree implementations for my benchmark in the future.
@Sebastain
I admit that the code itself looks terrible at first sight, but to the best of my knowledge, this might be the best way to achieve generic programming in C while maintaining the best performance. If you have better ideas, I am more than happy to learn. In addition, now that I am used to such macros, I do not have too much difficulty reading, writing, debugging or maintaining them.
@Reid
I know the standard splay tree only uses two pointers, but in practice it is not as fast as the standard red-black tree. I have a library with a splay tree and an RB tree written by the same programmer; the RB tree is clearly faster on random input. As for the red-black tree, I was saying we need a parent pointer for “the standard red-black tree”. I am aware that some variants (such as the left-leaning red-black tree) do not need a parent pointer, but the red-black tree in the majority of textbooks requires one additional pointer; those are what I call standard. Furthermore, from what I have learned these days, no red-black tree using two pointers is as fast as the red-black tree implemented in STL, which uses three.
I also believe the B-tree tends to be more cache-efficient. Another possible reason the B-tree is efficient is that it does not need to be re-balanced as frequently as binary trees, as stated on Wikipedia.
@Stuart
I used to write such things (a hash table, for example) in C++ and I think I am not bad at it. However, I found that the only piece of C++ code in my C projects was always the C++ template headers, which made me feel uneasy.
What’s more important, C++ programmers have STL, but most C programmers have to achieve generic programming at the cost of efficiency. I want to change this. Although the .h source code is hard to read, the API does not look so bad. What do you think? Do you have better ideas? I am still exploring the methods.
What about modifying a tree data structure in such a manner that it becomes cache oblivious? That should perform well!
Thanks person and Reid for directing me to cache-oblivious algorithms. I have just spent a few hours looking into what they are and what resources are available.
I can vaguely see what cache-oblivious algorithms are and what their advantages are, but I do not really know how to implement one. I somewhat doubt that the B-tree is not cache-oblivious, though.
Cache-oblivious algorithms are pretty much limited to the academic world, and my understanding of that world is that people there care more about theoretical performance than practical performance. They try to reduce the number of theoretical cache misses at the cost of adding more operations; the resulting program may not be faster in practice. I was also trying to find actual code for a cache-oblivious search tree, and the only code I could find claims that its cache-oblivious red-black tree is far slower than SGI STL’s implementation. I think I will stop here unless someone can give me source code that I can benchmark. It is really a pain to implement one myself only to get a worse result.
How does this perform vs. a Judy array? http://judy.sourceforge.net/
I meant to try Judy, but I could not figure out how to use it in 5 minutes and gave up. Well, I should have tried harder.
Although I have not tried it, I tend to believe it may perform better than my B-tree implementation. You can have a look at this page, where a Judy hash table is compared with a traditional hash table. The conclusion seems to be that on random input, Judy is comparable to a hash table. Sean’s hash table has similar speed to my khash.h, and in my hash table benchmark, khash.h is several times faster than kbtree.h with a slightly larger memory footprint. Putting all this together, I think Judy is probably faster than my kbtree.h.
I will try to do an experiment.
In kbtree.h, where does one associate a value with a key, as with your hash? For example, you have a string key and data (a struct pointer) to associate with that key. I don’t see any place to associate data with keys.
@MikeJ
You can put the key and value in one struct:
typedef struct { int key, val; } cell_t;
and
#define cell_cmp(a, b) (((a).key > (b).key) - ((a).key < (b).key))
thus the B-tree is only built for cell_t::key and you can save other information in val.
In fact, STL hash_map is also implemented in a similar way.
So, for a container keeping tags (strings or longs) and a pointer to a real struct of data, with the ability to find, get-next, add and delete items in the container: khash or kbtree? I realize my mileage may vary; I’m just curious about your thoughts on the two.
> What’s more important, C++ programmers have STL. but most C programmers have to achieve generic programming at the cost of efficiency. I want to change this. Although the .h source code is hard to read, the API does not look so bad. What do you think? Do you have better ideas? I am still exploring the methods.
@alohaeverafter
You may read my other posts about efficient generic programming in C. For generic C libraries, GDSL/Glib use void*, which may have efficiency issues. Sglib and Sean’s Tool Box use macros, although sglib does not use the best algorithm, at least for its rb-tree. For a generic RB-tree, Niels Provos’ tree library is recommended for its performance and stability. For generic hash table libraries in C, I recommend my own implementation.
STL is the work of several phenomenal programmers who have considerable influence over C++ programming. It is difficult for a common/unknown programmer to write something widely accepted by the C community. In addition, good C programmers, like me, love to reinvent the wheel; they do not rely on existing libraries as much as C++ programmers do.
Hi attractivechaos – I liked the picture in your blog – where is it from?
Here is a small change that one may make and get a lot for it: replace that division by 2 with a right shift.
Yeah, yeah, I know, the compiler will optimize that for you… only it doesn’t, and rightfully so, since the expression is signed.
Which suggests an alternative: replace the signed expression with an unsigned one.
I have tested both ideas and the results were the same: the GCC-compiled kbtest runs 10-15% faster (I only did the integer data test).
Thanks for the suggestion!
> The most attractive feature of B-tree is its small memory usage
> (…)
> The practical memory overhead can be reduced to below (5+0.3K)N
That’s an affirmation I’m still trying to understand.
A B-tree typically costs much more than a BST in terms of memory, due to multiple pointers per element.
Or maybe you imply that an additional pointer is necessary to point to the key, and it is therefore excluded from the (5+0.3K)N formula?
I’m using BSTs in “implicit” structures, where the location of the node is enough to retrieve the key. In this case, the BST cost is strictly 8N (with 32-bit pointers).
Please see this post of mine. In practice, my implementation uses much less memory, especially on 64-bit machines. I am not sure why you say the B-tree uses more pointers. It uses just one pointer per element (including empty elements). A B-tree is about 75% full on average (50% full in the worst case), which is already better than BSTs, which always require two pointers. Furthermore, in a B-tree you only need pointers in internal nodes, thus less than one pointer per element. For typical uses, BSTs cannot beat the B-tree on memory. At least I am not aware of a BST implementation that is more memory efficient.
Thanks for the answer.
I believe I understand where the differences lie, and will try to explain them, just to make sure I make no false statements:
I’m using BSTs in a way which ensures that no data is moved or pointed to. The position of the node is enough to determine the object (within which lies the key).
This situation happens with string searches, for example, but also with tables of objects, or with dedicated objects with a BST node appended.
As a consequence, the cost of the BST is *strictly* 2 pointers per node, which happens to be 8 bytes in my implementations.
The drawback is pretty clear: I cannot “position” my node where I want to. In particular, I cannot ensure that two consecutive keys lie next to each other.
To do that, I would need a 3rd pointer, which points to the object (within which lies the key). Then this would cost 12 bytes per node. As I’m talking about millions of nodes, this is pretty significant, and whichever method allows for memory optimisation is welcome.
Now getting to the B-tree:
With the assumption that the key can now be accessed through an object pointer, it is possible to move nodes around and, for example, store them in a contiguous area to benefit from the cache effect.
It also allows simply “checking the next node element”, without using a pointer between elements. Here are some memory savings.
For example, with an internal node holding 4 redirections:
! ptr ! key ! ptr ! key ! ptr ! key ! ptr !
Now, here I’m using 3+4=7 pointer-sized slots in order to store 3 keys and access 4 sorted buckets if what I’m looking for is not present at this level.
A leaf node, on the other hand, could use the full space to store 7 keys (assuming I’m keeping the first position for some kind of header, for example to distinguish internal nodes from leaves).
Therefore:
=> Since the number of tree nodes grows with N (the number of elements), the vast majority of nodes should be leaves, which only cost one pointer-sized slot per key (4 bytes) plus some inefficiency (including the header).
With N big enough that the header is negligible, and with simple construction rules to ensure leaves are well populated (such as guaranteeing a minimum 50% fill rate), the inefficiency can only amount to a small portion of the allocated area.
I don’t know how much it would be, but if we say something like 30% inefficiency, it still means an average of 5.2 bytes per key, which is still good.
==> Internal nodes are less efficient: they store about half as many keys per node, due to the pointers to other nodes.
Inefficiency also applies, maybe at an equivalent rate; I do not know. Intuitively, it seems easier to ensure internal nodes are properly filled.
So let’s say that the internal node cost is in the range of 10 bytes per key.
==> So what is the final cost? As said earlier, the majority of nodes should be leaves; the bigger N, the larger the share of leaves.
So obviously the final trade-off depends on N, but it should be closer to 5.2 (leaves) than to 10 (internal).
So let’s settle for something like 6 bytes per key.
This is still better than 8 bytes per key with the usual BST, even though four of these 6 bytes are used by a pointer to the key, which does not exist in my BST implementation.
The key to this result seems to be the differentiation between internal and leaf nodes, which allows getting rid of the unused pointers in leaves. There is no such distinction in a classical BST.
It seems like a winner. But with figures just off the top of my head I could be using the wrong orders of magnitude and end up with results that are too optimistic or too pessimistic, so nothing beats measuring a practical implementation.
Something you did and provided here.
Don’t hesitate to correct me if I’m making wrong statements; I’m in a learning phase right now and cannot claim to fully understand all tree-related techniques.
Best Regards
Hmm, apparently my example got destroyed by the parser.
Nevertheless, I hope the text explanation around it was good enough to understand.
[…] Blog post arguing B-tree may be a better data structure than a binary search tree. […]
The links to the header and example generate “this site is frozen …”!
see https://code.google.com/archive/p/cpp-btree/ and comparisons