Feeds:
Posts

Comparison of Binary Search Trees (BSTs)

Again, this post is a follow-up of this page. Source code is available here.

AVL Tree vs. Red-Black Tree

If you google “avl vs. red-black”, the first returned page will largely answer your question. In summary, the height of an AVL tree is at most ~1.44log(N), lower than the maximum height of a red-black tree, ~2log(N). However, upon insertion, AVL tree may need O(log(N)) operations to rebalance the tree, but red-black tree requires O(1). This means AVL tree is probably faster on searching but slower on insertion. In addition, red-black tree is easier to be implemented as a persistent data structure.

Practical analysis is not so clear as theoretical analysis. I think AVL and red-black may have quite similar overall performance. Firstly, although the maximum possible height of a AVL tree is better, both AVL and red-black trees are probably well balanced without reaching the maximum height frequently. Secondly, although AVL tree has an O(log(N)) loop upon insertion, it is a very tight loop with a small constant, while red-black tree may have a loop of small size with big constant (amortized O(1), though). I am not sure whether red-black tree really wins on insertion.

In Daniel’s benchmark, AVL tree and red-black tree have quite similar speed (in consideration that Daniel implemented AVL tree). In my benchmark, libavl’s red-black tree and AVL tree also deliver similar speed.

On memory, a standard AVL tree requires two pointers and one balancing factor at each node, while a red-black tree can be implemented using two or three pointers plus a color bit (either the balancing factor or the color bit will cost the size of a pointer due to memory alignment). A red-black tree with two pointers is usually slower than a red-black tree with three pointers (libavl_rb_cpp vs. NP_rbtree), but the difference is marginal. In my benchmark, a red-black tree with two pointers has pretty similar performance in comparison to an AVL tree (libavl_rb_cpp vs. libavl_avl_cpp).

In all, my opinion is AVL and red-black are very similar in many aspects. Which to use does not matter too much if you are not implementing a persistent data structure.

Splay Tree vs. AVL/RB Tree

Search on a splay tree may be O(N) if elements are inserted in order. This “feature” alone pushes me away. Although wiki says it is possible to avoid the worst case, I am not aware of a practical implementation. On random input, splay tree is also slower than AVL and red-black trees.

One of the advantages of a splay tree is it only requires two pointers with no additional information. This is helpful when memory is critical, but I would rather use a modified red-black to achieve the same memory footprint. Another advantage of splay tree is it is self-adjusting. I have not tested this in my benchmark.

Treap v. AVL/RB Tree

I do not know much about treap. It seems to me treap does not guarantee O(log(N)) search. Fortunately, in practice the worst O(N) case should almost never happen due to the random behaviour of a treap. In my benchmark, treap is slower than splay tree on random input. Probably it is not the best choice.

Implementations

As to red-black tree, SGI STL’s set/map is really a good one. You can find other C++ implementations (TN_rbtree libavl_rb_cpp) in my benchmark. They are also efficient, but incomplete. Libavl_rb_cpp is adapted from libavl_rb which only uses two pointers. NP_rbtree (FreeBSD’s tree.h) is the best choice for a C programmer.

As to AVL tree, I am not aware of good AVL implementations in either C or C++. Stlavlmap is supposed to be efficient, but I cannot compile it. WK_avl may be a good library, judged from its source code, but it is not easy to use. GM_avl uses recursion and so cannot be efficient. Libavl_avl_cpp is my ported version of libavl_avl. It is efficient but incomplete. You may finish it if you really like to try an AVL tree. BTW, libavl is in fact a very high-quality library, but using void* makes it inferior to others. Someone should improve its API using C macros or C++ template.

Concluding Remarks

For general purpose, red-black tree might still be the best choice. It is among the fastest binary search trees and more easily adapted to a persistent data structure. AVL tree is equally good when persistence is not needed (well, all my applications do not require persistence). At last, do not forget my previous post: in-memory B-tree can be both faster and more light-weighted than red-black/AVL trees.

Update

Sambal told me that on insertion, red-black only guarantees amortized O(1). The worst case can be O(log(N)) with large constant. Then the advantage of red-black tree on insertion is less obvious.

Update the claim about the memory usage of AVL and red-black tree.

20 Responses

1. Why you no prove SGLIB?. It’s a good library
and very fast in my tests.
http://sglib.sourceforge.net/

2. on October 4, 2008 at 3:29 pm | Reply attractivechaos

Thanks a lot, Guillermo. I did not know this library. Now I have added sglib_rbtree to the table. SGLIB is a very good library. It also saves the parent pointer and so has smaller memory footprint. However, sglib_rbtree uses recursion to insert/erase an element, which causes a little overhead on speed.

I did not put sglib’s hash table in my benchmark as apparently it only allows fixed-sized hash table, while all the hash table libraries in the benchmark are dynamic.

3. Why would a RB-tree make a better persistent data structure than an AVL-tree?

As I understand it, any time you modify any node in a persistent tree, you must replace it. This means you must modify its parent to point to the new version. Recursively, you end up replacing the entire ancestry. Any change, therefore will require that O(h) new nodes are created, where h is the height where the change occurred. The only advantage of a RB-tree is that inserts are local and do not work their way up most of the tree. In persistent trees, however, every change always goes all the way to the top.

It seems, with a smaller height, persistent AVL-trees will have faster inserts, use less memory and have faster searches than persistent RB-trees.

4. I think that an AVL is better than a red black in search,insertion or deletion, takes less memory than Rb, so i don’t see the advantages of RB trees and why we continue to use them?

5. on February 22, 2009 at 10:07 am | Reply attractivechaos

@mikou

If you read these figures, you will find AVL/RB has quite similar performance in terms of speed and memory (and actually also in implementation difficulty). I do not think either of them outperforms the other. People perfer RB maybe because theorists and textbooks like RB. In addition, as I pointed out, it is possible to implement RB (libavl_rb_cpp) to use only two pointers with the color bit encoded in one of the pointers, but we can never do this for AVL.

6. i implemented deletion and adjonction for RB AVL and hashcoding to see and make compare between them, i saw that AVL are better than Rb in memory capacity(we don’t use the sentinel) in time because the height is at most =|1| (for rb height >1) so it takes less time to do the 3 operations…

7. on February 27, 2009 at 1:06 pm | Reply attractivechaos

@mikou

Thanks for your interest. Nonetheless, I would suggest you compare your implementation with other implementations such as STL/libavl and so on. You can also find other libraries in the source code package I provide in this page. Note that the fact that your RB is slower than your AVL does not necessarily mean RB is in general worse than AVL; maybe you have not got the best way to implement RB. Of course benchmarking algorithms is actually benchmarking implementations, but you should at least make sure your implementation is comparable to the best before coming to a final conclusion. It would be interesting to see an AVL implementation that could outperform the best RB library. If this is the case, I will surely add your library to the benchmark.

8. Can we say tha hashcoding and avl or rb are independant i mean we can’t compare hashcoding with the two others

9. Not that I’m impressed a lot, but this is a lot more than I expected when I stumpled upon a link on SU telling that the info is quite decent. Thanks.

10. on September 9, 2009 at 11:00 pm | Reply attractivechaos

@mikou

Hash table, avl and rb are different, but they share something in common. They are all dictionary data structures: we put things in it and get them back efficiently. In this sense, they are comparable.

11. [...] Comparison of Binary Search Trees(BSTs) [...]

12. AFAIK insertion in rbtrees takes O(lg n) time, as it first uses a regular BST-Insert which per se is O(lg n) and after that it fixes the rb proprties which is again O(lg n)
source: CLRS p 322

• on December 7, 2011 at 10:05 pm | Reply attractivechaos

See my last edit. I am aware of the mistake. Red-black tree only guarantees amortized O(1) once the insertion point is found. The worse-case time complexity is O(log(N)).

13. I’m not sure where you got the idea that red-black trees are easier to implement as persistent data structures than AVL trees. I think AVL trees are considerably easier to implement as persistent, especially if you need to support deletion. Deletion in a persistent red-black tree is hard.

14. I actually did a similar comparison in a different language: Python. I found that, depending on workload, Splay, AVL or Treap was in first place, and Treap was always either first place or second place. Workloads included mostly read, mostly write, and balanced read/write. Red-Black trees turned out so slow compared to the others, that I eliminated them from the final graphs to keep the graphs readable for the datastructures that were performing better. The comparison with commentary, code and graphs is here: http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ What I’d heard in the past is that Red-Black trees are nice because they don’t vary much in performance from one operation to the next (low standard deviation, OK average), but for good average performance with a high (poor) standard deviation you’re better off with a treap. My tests bore this out. HTH

• on April 30, 2013 at 6:47 pm | Reply attractivechaos

You may have a look at my other post on the evaluation of trees, including RB, AVL, splay and treap. RB and AVL have similar performance and both are clearly better than splay tree and treap. These are random input. I have seen another article, I forgot where, that RB is generally faster than treap.

I would attribute the bad performance of RB in your evaluation to python. Treap is simpler than RB and requires fewer actions. I believe RB has better tree height than treap. In C, the additional actions in RB will not take much more time as long as they do not add more cache misses. The lower tree height will lead to better performance. In python, these additional actions may add more time, which outweighs the tree balance. The simpler treap becomes better.

I have seen a couple of comparisons between RB and AVL trees (I forgot the links again). My impression is they have very similar practical performance in C. On the tree data structure, my long favorite is B-tree, but probably implementing B-tree in python efficiently will be more difficult.

• I’m not sure it’s “just python” this time. ^_^ FYI: https://en.wikipedia.org/wiki/Binary_search_tree#Performance_comparisons

Also, please note that some of my graphs are on PyPy, which JIT’s similar to Java, while others are on Jython; Python != CPython anymore. Furthermore, the Red-Black tree seemed to do reasonably well up to about 500,000 elements – it was beyond that point that things got bad – here’s the graph with Red Black Trees still in it: http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/red-black-tree/Datastructure%20comparison.pdf

Are you empirically comparing large collections of elements in your trees? Say, with a million elements and more? Or are you saying “This is what /should/ happen…”

This kind of supports what you’re saying though: AA Trees were faster than Red-Black Trees in my comparison, which kind of surprised me, since an AA Tree is a simplified Red-Black Tree. But the difference doesn’t appear to be just a small constant.

If anything, I’d guess that the Red-Black Tree implementation I obtained had a problem; it’s hard to argue that 3 or 4 very different Python runtimes (CPython [23], PyPy, Jython) all had the same problem.

• on April 30, 2013 at 9:04 pm attractivechaos

I would recommend you to read my blog post (the link in my previous comment is not obvious: http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/). I am inserting 5 million random numbers generated by “(unsigned)(drand48()*5000000/4)*271828183u”. If a number is not in the tree, add it; if it is present, remove it. There are 625k numbers left in the tree in the end.

I think your results are unexpected. I have never seen AVL and RB have such huge difference. In addition, even if RB is inefficient, it should not be more than 2 orders of magnitude slower. Have you checked the maximum height of your RB tree? You said the python version was modified from Thomas Niemann’s well known code. Have you checked if the max tree height from TN’s and your implementations are similar (in principle they should be identical if you literally ported the code)? Have you tried another RB implementation in python?

The paper linked from the wiki looks more convincing, though it disagrees with my evaluation. One possibility is that the treap implementation I was using is not optimal. I did not implement it myself. I took libraries from this page: http://t-t-travails.blogspot.com/2008/07/treaps-versus-red-black-trees.html, where you can also find a comparison showing treap has worse performance. Although I didn’t implement RB, either, I know what a proper implementation in C should look like. And the fact that all RB implementations had similar performance in my benchmark suggests that they are alright.

• As I said before, it could be that the RB Tree I obtained (and ported to Python 3 from Python 2 – it runs on both now) has problems. I may look into that at some point.

15. I’ve redone my comparison in Python, this time with a new Red-Black tree implementation. Red-Black trees do look comparable to AVL trees now, but they’re still pretty lacklustre: http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/just-trees/ It’s still the case that AVL trees, Splay trees and Treaps took all the 1st places, and it’s still the case that treaps were consistently either 1st place or 2nd place. The comparison is still for varied workloads (3: mostly get, 50-50 get and put, mostly put) and varied python implementations (4: CPython 2.x, CPython 3.x, PyPy and Jython). I only went to 2 million operations this time (previously went to 4 million), because my 64 bit system is giving CPU errors.