**Sorting algorithm**

Given an array of size N, sorting can be done in O(N log(N)) time on average. The most frequently used sorting algorithms that achieve this time complexity are quicksort, heapsort and mergesort. They usually require O(log(N)), O(1) and O(N) working space, respectively (the space complexity of mergesort can be improved at the cost of speed). Most people believe quicksort is the fastest sorting algorithm. However, quicksort is only fast in terms of the number of swaps. When comparison is expensive, mergesort is faster than quicksort because mergesort uses fewer comparisons. GNU sort uses mergesort; replacing it with a quicksort makes it slower on typical text input. In addition, of the three algorithms, only mergesort is a stable sort. Stability is sometimes useful for a general tool like GNU sort.
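To make the notion of stability concrete, here is a minimal sketch using `std::stable_sort`. The `Record` type, the `by_key` comparator and the `sorted_stably` helper are hypothetical names made up for this illustration; they do not come from GNU sort or from any implementation discussed in this post.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type: sorted by `key` only, `tag` marks the input order.
struct Record {
    int key;
    std::string tag;
};

// Compares keys only, so records with equal keys are "equivalent" to the sort.
bool by_key(const Record& a, const Record& b) { return a.key < b.key; }

// Returns a copy sorted by key. A stable sort keeps equivalent records
// in the same relative order as in the input.
std::vector<Record> sorted_stably(std::vector<Record> v) {
    std::stable_sort(v.begin(), v.end(), by_key);
    return v;
}
```

With the input (2,a), (1,b), (2,c), (1,d), a stable sort must return (1,b), (1,d), (2,a), (2,c). An unstable sort is free to return the records with equal keys in either order.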

The worst-case time complexity of quicksort is O(N^2). In practice, we combine quicksort and heapsort to avoid worst-case performance while retaining the fast average speed. The resulting algorithm is called introsort (introspective sort).
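The idea can be sketched in a few dozen lines. The code below is only an illustration under my own assumptions, not the STL code or the implementation benchmarked later: the names `introsort`, `introsort_range` and `insertion_sort_range`, the cutoff of 16 elements and the depth budget of 2*floor(log2(N)) are choices made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort on a[lo, hi); cheap for the short ranges quicksort leaves behind.
template <typename T>
void insertion_sort_range(std::vector<T>& a, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo + 1; i < hi; ++i) {
        T x = a[i];
        std::size_t j = i;
        for (; j > lo && x < a[j - 1]; --j) a[j] = a[j - 1];
        a[j] = x;
    }
}

// Quicksort on a[lo, hi) that falls back to heapsort once the depth budget is spent.
template <typename T>
void introsort_range(std::vector<T>& a, std::size_t lo, std::size_t hi, int depth_budget) {
    typedef typename std::vector<T>::difference_type diff_t;
    while (hi - lo > 16) {  // assumed cutoff for small ranges
        if (depth_budget == 0) {
            // Too many unbalanced partitions: heapsort bounds the worst case.
            std::make_heap(a.begin() + static_cast<diff_t>(lo), a.begin() + static_cast<diff_t>(hi));
            std::sort_heap(a.begin() + static_cast<diff_t>(lo), a.begin() + static_cast<diff_t>(hi));
            return;
        }
        --depth_budget;
        // Median of three: afterwards a[lo] <= a[mid] <= a[hi - 1].
        const std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) std::swap(a[mid], a[lo]);
        if (a[hi - 1] < a[lo]) std::swap(a[hi - 1], a[lo]);
        if (a[hi - 1] < a[mid]) std::swap(a[hi - 1], a[mid]);
        const T pivot = a[mid];
        // Hoare partition: a[lo..j] <= pivot and a[j+1..hi-1] >= pivot at the end.
        std::size_t i = lo, j = hi - 1;
        for (;;) {
            while (a[i] < pivot) ++i;
            while (pivot < a[j]) --j;
            if (i >= j) break;
            std::swap(a[i], a[j]);
            ++i;
            --j;
        }
        introsort_range(a, j + 1, hi, depth_budget);  // recurse on the right part
        hi = j + 1;                                   // loop on the left part
    }
    insertion_sort_range(a, lo, hi);
}

template <typename T>
void introsort(std::vector<T>& a) {
    int depth_budget = 0;
    for (std::size_t n = a.size(); n > 1; n >>= 1) depth_budget += 2;  // 2 * floor(log2(N))
    introsort_range(a, 0, a.size(), depth_budget);
}
```

On random input the heapsort branch is almost never taken, so the average speed is that of quicksort; the branch only matters for inputs that keep producing lopsided partitions.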

**Implementation**

The two most widely used implementations are glibc qsort and STL (unstable) introsort. Libc qsort calls a function through a pointer for every comparison. For simple comparisons, the function call costs far more than the comparison itself, which may greatly hurt the efficiency of qsort. STL sort does not have this problem because the comparison is part of the template instantiation and can be inlined by the compiler. It is one of the fastest implementations I am aware of. My own implementation of introsort is similar but not as fast as STL introsort.
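The difference between the two calling conventions can be seen in a small sketch. The helper names `cmp_int`, `IntLess`, `sort_with_qsort` and `sort_with_stl` are hypothetical and exist only for this illustration; `std::qsort` and `std::sort` are the standard library functions.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Callback in the form qsort expects. It is reached through a function
// pointer, so the compiler usually cannot inline it into the sorting loop.
int cmp_int(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids the overflow of x - y
}

// Comparator as a type. std::sort is instantiated for this exact type,
// so the call to operator() is a candidate for inlining.
struct IntLess {
    bool operator()(int x, int y) const { return x < y; }
};

void sort_with_qsort(std::vector<int>& v) {
    if (!v.empty()) std::qsort(&v[0], v.size(), sizeof(int), cmp_int);
}

void sort_with_stl(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), IntLess());
}
```

Both functions produce the same result; the difference is only in how the comparison reaches the inner loop.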

GNU sort implements a top-down recursive mergesort. On integer sorting, it is twice as slow as introsort (see below). Iterative top-down mergesort is hard to implement. Iterative bottom-up mergesort is much easier. My implementation is a bottom-up one.
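A bottom-up mergesort can be sketched as follows. This is a minimal illustration under my own assumptions rather than the benchmarked implementation: the name `mergesort_bottom_up` is hypothetical, and the sketch simply ping-pongs between the input array and one O(N) buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Iterative bottom-up mergesort: merge runs of width 1, 2, 4, ... until
// a single run covers the whole array. Uses one buffer of size N.
template <typename T>
void mergesort_bottom_up(std::vector<T>& a) {
    const std::size_t n = a.size();
    std::vector<T> buf(n);
    std::vector<T>* src = &a;    // holds the sorted runs of the current width
    std::vector<T>* dst = &buf;  // receives the merged runs
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            std::size_t i = lo, j = mid, k = lo;
            while (i < mid && j < hi) {
                // Take from the right run only when strictly smaller,
                // which keeps equal elements in input order (stability).
                if ((*src)[j] < (*src)[i]) (*dst)[k++] = (*src)[j++];
                else (*dst)[k++] = (*src)[i++];
            }
            while (i < mid) (*dst)[k++] = (*src)[i++];
            while (j < hi) (*dst)[k++] = (*src)[j++];
        }
        std::swap(src, dst);  // the merged output becomes the next input
    }
    if (src != &a) a.swap(buf);  // the final pass may have ended in the buffer
}
```

No recursion and no explicit stack are needed, which is what makes the bottom-up form easier to write iteratively than the top-down one.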

Paul Hsieh has also implemented quicksort, heapsort and mergesort. His implementation should be very efficient from what I can tell. To see whether my implementation is good enough, I copied and pasted his code into my program, and applied “inline” where necessary.

**Comparison**

I designed a small benchmark that sorts 50 million random integers. As comparison is cheap in this case, the number of swaps dominates the performance. I compiled and ran the program on three machines: MacIntel (Core2-2G/Mac/g++-4.2.1), LinuxIntel (Xeon-1.86G/Linux/g++-4.1.2) and LinuxAMD (Opteron-2G/Linux/g++-3.4.4). On all three platforms, the program was compiled with “-O2 -fomit-frame-pointer”. The time (in seconds) spent on sorting is shown in the following table:

| Algorithm | MacIntel | LinuxIntel | LinuxAMD | Linux_icc |
|---|---|---|---|---|
| STL sort | 7.749 | 8.260 | 7.170 | 8.400 |
| STL stable_sort | 9.684 | 11.990 | 10.270 | 10.770 |
| libc qsort | 16.579 | 81.190 | 30.490 | 81.120 |
| introsort | 7.887 | 8.880 | 7.670 | 9.320 |
| iterative mergesort | 10.371 | 12.480 | 10.110 | 10.130 |
| binary heapsort | 36.651 | 45.710 | 42.460 | 40.820 |
| combsort11 | 18.131 | 19.290 | 19.370 | 19.490 |
| isort (func call) | 16.760 | 17.380 | 13.390 | 16.740 |
| isort (template func) | 7.902 | 8.800 | 7.690 | 9.010 |
| Paul’s heapsort | 34.790 | 43.680 | 40.740 | 39.060 |
| Paul’s quicksort | 8.410 | 8.940 | 7.810 | 9.450 |
| Paul’s mergesort | 11.103 | 13.390 | 10.680 | 13.030 |

As for the algorithm itself, we can see that introsort is the fastest and heapsort is the slowest. Mergesort is also very fast. Combsort11 is claimed to approach quicksort in speed, but I do not see this when sorting large integer arrays. As for the implementation of quicksort/introsort, STL is the best, with my implementation following very closely. Paul’s implementation is also very efficient. Libc qsort is clearly slower, which cannot be attributed simply to the use of function calls: my implementation with function calls, although slower than the version without them, outperforms libc qsort on both Linux machines. As for the implementation of mergesort, my version has similar performance to STL stable_sort. Note that stable_sort uses buffered recursive mergesort when a temporary array can be allocated. When memory is insufficient, it uses in-place mergesort, which is not evaluated here.

**Availability and alternative benchmarks**

My implementation is available here as a single C++ template header file. The benchmark program is also available. Plain-text versions can be obtained by removing .html from the two links.

Paul Hsieh’s benchmark is here, including the original source code. He also discusses how the algorithms perform when the initial array is not completely random (I am one of the “naive people” by his standard). Please note that in his benchmark he sorted an array of size 60,000, repeated 10,000 times, while my benchmark focuses on very large arrays. Notably, heapsort approaches introsort on small arrays, but is far slower on large arrays. Presumably this is because of the poor cache performance of heapsort. Both quicksort and mergesort are very cache efficient.

In addition to Paul’s benchmark, you can also find alternative ones here and here. They seem to be more interested in theoretical issues than in efficient practical implementations.

If you search for “benchmark sorting algorithms” on Google, the first result is this page, a benchmark implemented in D by Stewart Gordon. This benchmark aims to evaluate the performance on small arrays. It also tests the speed when the array is sorted or reverse sorted. However, the implementation is not optimized enough, at least for quicksort. Using insertion sort when the array is nearly sorted is always preferred. You can also find this report through Google search, but its implementation of quicksort is too naive to be efficient.
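The remark about insertion sort on nearly sorted input can be checked with a small sketch. The function `insertion_sort_count` is a hypothetical helper written for this illustration: it sorts the array and returns the number of element shifts, which equals the number of inversions in the input, so the work beyond one linear scan is small when few elements are out of place.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort that also reports how many element shifts it performed.
// The running time is proportional to N plus the returned count.
template <typename T>
std::size_t insertion_sort_count(std::vector<T>& a) {
    std::size_t shifts = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        T x = a[i];
        std::size_t j = i;
        // Move larger elements one slot to the right to open a gap for x.
        while (j > 0 && x < a[j - 1]) {
            a[j] = a[j - 1];
            --j;
            ++shifts;
        }
        a[j] = x;
    }
    return shifts;
}
```

An already sorted array costs zero shifts, an array with a single adjacent pair swapped costs one, and a reverse-sorted array of N elements costs N(N-1)/2.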

**Concluding remarks**

Although introsort performs best in the table, we may prefer mergesort if we need a stable sort or if comparison is very expensive. Mergesort is also faster than introsort if the array is nearly sorted. STL sort seems to take particular care in this case, which keeps it fast when the array is already sorted.

In common cases where comparison is cheap, introsort is the best choice. Of the various implementations, STL is the fastest. If you do not use STL or you just want to use C, you can use or adapt my implementation, which is very close to STL sort in speed. Do not use libc qsort, especially on Linux. It is not well implemented.

**Update**

- This website gives several good implementations of sorting algorithms. I also believe the programmer behind it is very capable. Highly recommended.

on April 22, 2009 at 7:01 am | Liza: I noticed that this is not the first time you mention the topic. Why have you decided to write about it again?

on July 4, 2009 at 4:24 am | ganesh: What do you think of the syncsort sort utilities? They claim they are the fastest sort on the face of the earth. Any validations?

thx.

ganesh

on February 3, 2011 at 9:18 pm | Joris: Hi A.C. Thanks for making your code available. I recently used it to test compiler optimizations. It was very helpful.

on June 7, 2012 at 2:23 pm | Arun: An informative post, again. Thanks and keep up the good work.