With the completion of the Sudoku solving benchmark (my last post), my programming language benchmark is essentially complete (though a few implementations are still missing). This post gives more context and analysis of the benchmark.
Design
This benchmark comprises four tasks:
- solving 1000 Sudoku puzzles
- multiplying two 1000×1000 matrices
- matching URI or URI|Email in a concatenated Linux HowTo file
- counting the occurrences of words using a dictionary
The first two tasks focus on how well the language implementation translates source code into machine code: most of the CPU time is spent in the benchmarking programs themselves. The last two tasks focus on the efficiency of the companion libraries: most of the CPU time is spent in library routines. All four tasks are relatively simple and cannot easily be hand-optimized for better performance.
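To make the split concrete, the dictionary task is essentially a tight loop over a hash table. The following is a minimal Python sketch of that task (illustrative only; the actual benchmark programs live in the plb repository and differ in detail):

```python
import sys
from collections import defaultdict

def count_words(stream):
    """Count word occurrences using a hash table (dict)."""
    counts = defaultdict(int)
    for line in stream:
        for word in line.split():
            counts[word] += 1
    return counts

# e.g. count_words(sys.stdin) over the input text
```

Virtually all of the time here goes into the hash-table operations, so the measurement reflects the quality of the dictionary implementation rather than the interpreter loop.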
Results and discussions
The complete results are available here. The following figure shows the CPU time for Sudoku solving and matrix multiplication, both evaluating the language implementation itself (click for a larger figure):
In the plots, a number in red indicates that the corresponding implementation requires explicit compilation; blue indicates that the implementation uses Just-In-Time (JIT) compilation; black indicates that the implementation interprets the program without a JIT.
The overall message is the following. Languages compiled into machine code (C and D) are slightly faster than languages compiled into bytecode (Java and C#); compilers tend to be faster than JIT-based implementations (LuaJIT, PyPy and V8); JIT-based implementations are much faster than conventional interpreters (Perl, CPython and Ruby). Among compilers, C is still the winner, by a thin margin. Among interpreters, LuaJIT and V8 pull ahead. There are few surprises for most language implementations, except perhaps the few with very poor performance.
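For reference, the matrix-multiplication task is essentially the textbook triple loop, so nearly all time is spent in code generated from the program itself. A simplified Python sketch (not the exact benchmark source; transposing `b` for sequential access is a detail I add here for illustration):

```python
def matmul(a, b):
    """Naive multiplication of two square matrices given as
    lists of row lists."""
    n = len(a)
    # Transpose b so the inner loop walks memory sequentially.
    bt = [[b[k][j] for k in range(n)] for j in range(n)]
    return [[sum(a[i][k] * bt[j][k] for k in range(n))
             for j in range(n)]
            for i in range(n)]
```

Because there are no library calls in the hot loop, the measured time is dominated by how well the implementation compiles or interprets plain arithmetic and indexing.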
On the other hand, the comparison of the library performance yields a vastly different picture (again, click to enlarge):
This time, even conventional interpreters may approach or even surpass the optimized C implementation (e.g. Perl vs. C for simple regex matching). Some compiled languages, still in their early days, may perform poorly.
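To illustrate why an interpreter can keep up here: the regex task is little more than the loop below, so nearly all time is spent inside the regex library. A Python sketch with a deliberately abbreviated URI pattern (the benchmark's actual patterns are longer; both the pattern and the function name are my own illustration):

```python
import re

# Simplified URI pattern for illustration only; the benchmark
# uses a more complete one.
URI = re.compile(r'([a-zA-Z][a-zA-Z0-9+.-]*)://([^ /]+)(/[^ ]*)?')

def count_matches(lines):
    """Count URI-like matches over an iterable of lines; the cost
    is dominated by the regex engine, not the host language."""
    n = 0
    for line in lines:
        n += sum(1 for _ in URI.finditer(line))
    return n
```

Whether the surrounding loop runs in C, Perl or Python matters little; what is measured is the regex engine each language ships with.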
Conclusions
The quality of its libraries is a critical part of a programming language. This benchmark is one of the few that clearly separates the performance of the language implementation itself from that of its companion libraries. While compiled languages are typically one to two orders of magnitude faster than interpreted languages, library performance may be very similar across languages. For algorithms that rely heavily on library routines, the choice of programming language does not matter much. It is quite possible to construct a benchmark in which another language beats C/C++ for a certain application.
All the benchmarking programs are distributed under the MIT/X11 license. Please follow the links below for the source code and the complete results:
- Source code: https://github.com/attractivechaos/plb
- Complete results: http://attractivechaos.github.com/plb/
There is actually more to say about each specific language implementation, but perhaps I’d better leave the controversial part to readers.
d8 recently got access to typed arrays so do you think you can use them instead of normal arrays in matmul benchmark?
https://gist.github.com/961337 was roughly 6 times faster on V8 when I checked last time.
Details are available in my blog post on “benchmark games”: http://blog.mrale.ph/post/5436474765/dangers-of-cross-language-benchmark-games
Thank you very much. I have updated to the latest V8-r8384 (the version two months ago did not support float64). The numbers for V8 have all been updated. It seems that the hash table is a little slower in the latest version, but sudoku solving and matrix multiplication are both faster.
Also, thank you for the interesting blog post. I appreciate it.
Thanks for updating the numbers!
> It seems that the hash table is a little slower in the latest version
Interesting. I will take a look.
I must speak up about the java benchmark for Dict (didn’t have time to look at the others unfortunately).
I think the autoboxing and unboxing of Integer in the HashMap is killing Java’s performance. If you set a larger cache by passing -Djava.lang.Integer.IntegerCache.high=10000 on the java command line, or use an integer-specific class, like fastutil’s Object2IntMap (http://fastutil.dsi.unimi.it/docs/it/unimi/dsi/fastutil/objects/Object2IntMap.html), I’m sure better performance can be achieved. After all, that benchmark is about library performance (not language performance).
I’ve put the modified code here: https://gist.github.com/1039732
I am guessing you ran jaegermonkey with just “js”? That’s just the interpreter, which explains why it is as fast as interpreted languages here…
Try it with “js -m -j -p -n” (which are: method jit, trace jit, jit profiling, and type inference). That should be similar in performance to v8.
A good catch. Thanks! The JaegerMonkey numbers have all been updated. Note that JaegerMonkey is still slower than V8. You can try the sudoku benchmark in different browsers.
I still can’t replicate the results posted here with the JS shells. I get JM being about 2x slower than V8 on sudoku (run as js -m -j -p sudoku_v2.js < sudoku.txt) and faster on matmul (run as js -m -j -p matmul_v1.js). Is there a different way you run it to make the graphs here?
Perhaps I should use a more recent version of JaegerMonkey? Which versions of V8 and JaegerMonkey are you using? Could you also try the implementation from another language (e.g. C or LuaJIT) and report the speed for V8, JaegerMonkey and that third implementation? We need a reference implementation. Thanks.
EDIT: I have just tried to solve the 1000 Sudoku puzzles using Chrome 12.0.742.112 and the shining new Firefox 5. Chrome took 3.8 seconds, while Firefox took 17.7 seconds, similar to the numbers in the bar chart. Is JaegerMonkey from the recent source code tree much faster than the Firefox Javascript engine?
For sudoku_v1, I get (user time from ‘time’):
C: 52 ms
js/V8 (SVN rev 8424): 110 ms
js/SpiderMonkey (hg.mozilla.org/tracemonkey rev d1cdf4295626): 312 ms
Are you using the “JaegerMonkey” repo, by any chance?
Could you try something like this:
seq 50 | xargs -i cat sudoku.txt | d8 sudoku_v1.js
Perhaps V8 takes longer to warm up? Thanks for your time.
I’m testing on MacOS, where apparently I have neither seq nor xargs -I, but IIUC the point is to concatenate 50 copies of sudoku.txt to form the input. If I do that, I get results more similar to yours: 3.4s for V8 and 15.8s for SpiderMonkey.
What about matmul? Do you run that more times as well, or something like that?
It’s maybe worth noting that Shedskin is not Python; it’s a subset of Python (in the same way that Cython or RPython are not Python).
If the code compiled by Shedskin runs in, say, CPython or PyPy then it’s Python. Sure, Shedskin doesn’t support “full Python” but if people can compile their Python programs with it, it’s a practical tool worth considering. When people are actually rewriting parts of their projects in Python and dropping the C stuff (since the “common wisdom” was that you had to use C for the high performance bits) to get the benefit of compilation, it’s a gain for Python. It’s arguably no worse if there are some restrictions than some of the bizarre and non-portable tricks some people use to get CPython to run code faster.
Personally, I think people make too much of a fuss about “full Python”. That’s why Python has languished for going on two decades as “that prototyping language” with a bad reputation for performance in some circles.
Would you be happy to have a faster C compiler that sometimes compiles your programs wrong? Or one that does not support pointer arithmetic?
I think there is quite a big difference between “full Python” and not-quite-full Python, and for some reason people would rather stick with full, even if the crippled one is faster.
Anyway, things like “opinions in some circles” are not that easy to argue with. I can probably show some benchmarks or discuss why some parts of language are harder to optimize, but I’m not sure if those circles would listen anyway.
I guess I have to reply to myself to reply to fijal, but anyway…
As far as “full” versus “crippled” (as you call it) Python is concerned, as long as the debate never progresses past the point of complete compatibility with whatever CPython is today, “warts” and all, people will keep jumping through hoops implementing every last little quirk and misfeature of the language while users wait for tools that manage to do things effectively like proper code analysis, occasionally sneaking off to use those nasty non-Pythons because the alternative is to use languages like C, C++, Fortran and even Java.
I had the benefit of being in a seminar for scientific users of Python recently, and I even promoted PyPy there, where the options were more or less laid out as being Python, Cython, NumPy and C, much to the amusement of a colleague who wondered what the benefit was in writing stuff in Python in the first place if you end up rewriting it in C. These people could write in Python and not even touch the exotic stuff, and yes, it’d still be Python as far as they and the compiler are concerned.
The “compiles your programs wrong” argument is a distraction more often than not. If the compiler just said that it couldn’t preserve compatibility with Python semantics for some particular piece of code (and perhaps exited with an error), at least we’d be able to determine the limits of compatibility and agree on what kind of Python is supported. Remember that Python isn’t a static language but has evolved over the years – the stuff I was writing fifteen years ago was still Python – but it’s the essence of the language I’d like to see explored. As far as I’m concerned, stuff like Shedskin helps do just that, and that is more interesting than saying “it’s not full Python, just ignore it”.
The reply feature is indeed weird…
For one, I would be happy to drop some warts of the Python language; however, most of the stuff that’s unsupported by Shedskin (or RPython, for that matter) is genuinely useful, like debugging support, and people use it even if they don’t think they need it.
Other cool features include, say, nice tracebacks – this is also a feature most people *do* use.
For one, I claim PyPy can do better than Shedskin (it certainly does better than Cython in many places) without crippling the language, leaving it really useful. NumPy support is also on the way.
Also, I didn’t say the benchmark should ignore shedskin – merely mention it’s not really python.
Cheers,
fijal
> however, most of the stuff that’s unsupported by shedskin (or RPython for that matter) is genuinely useful, like debugging support and people use it even if they don’t think they need it.
> Other cool features include say nice tracebacks – this is also a feature most people *do* use.
Because they are legal Python, Shedskin-compatible programs can be developed and debugged using any other Python implementation.
Shedskin can also generate extension modules for you, so you are not restricted to ‘complete’ programs.
mark (main author of shedskin).
Also, another thing: would you mind providing a version using array from the array module for PyPy? It would be faster; I can provide one if needed.
It already uses array:
“””
1) LDC, PyPy, CPython2/3, JS, LuaJIT and IronPython use “v2” and the rest use “matmul_v1.*” in the source code directory.
“””
https://github.com/attractivechaos/plb/blob/master/matmul/matmul_v2.py#L10
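For readers wondering what the `array`-based variant buys: keeping the matrix in a flat, typed buffer avoids nested lists of boxed floats, which JITs like PyPy can exploit. A minimal sketch of that layout (my own illustration, not the code from the linked file):

```python
from array import array

def matmul_flat(n, a, b):
    """Multiply two n-by-n matrices stored as flat, row-major
    array('d') buffers; returns a new flat array('d')."""
    c = array('d', [0.0]) * (n * n)
    for i in range(n):
        ai = i * n
        for k in range(n):
            aik = a[ai + k]
            bk = k * n
            for j in range(n):
                c[ai + j] += aik * b[bk + j]
    return c
```

The loop order (i, k, j) keeps both the `a` read and the `c`/`b` accesses sequential, which is the usual cache-friendly arrangement for row-major storage.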
Oh. You were probably talking about Sudoku. Sorry for the confusion 🙂
Oh and btw, PyPy 1.5 is faster than PyPy 1.4.1 and PyPy nightly is even faster: http://buildbot.pypy.org/nightly/trunk/
Here are some of the problems solved in Scala, if you want to try a comparison:
I am busy with some other things. I will surely come back and add Scala. Thanks a lot.
[…] https://attractivechaos.wordpress.com/2011/…benchmark-analyses/ – a very interesting performance comparison of C, Java, C#, D, Go, Perl, Python, Lua, R, JavaScript and Ruby; […]
Please make the 4 figures easier to read – order the data from least CPU sec to most CPU sec.
“… by the principle of informative changes a simple progression of increasing heights from left to right [widths from top to bottom] is to be preferred over a jagged series if the ordering along the X axis [Y axis] is arbitrary… order the levels along the X axis [Y axis] to produce the visually simplest pattern.”
p. 76, “Elements of Graph Design”, Stephen M. Kosslyn
It depends on what you are looking for. Currently, implementations are grouped by language. If you sort by the numbers, comparisons between implementations of the same language become less obvious. The benchmark page has a sortable table, so people can still get that information there. Also, when you sort by the numbers, the position of an implementation will differ between plots, so you cannot easily compare across figures. Of course, the best solution would be a dynamic page with JavaScript to meet the needs of different readers.
“It depends on what you are looking for”
No – it depends what you want to show – you have to choose.
“…between implementations of the same language.”
If that’s what you want to show then
– for each language order the data from least CPU sec to most CPU sec
– make a gap between each language, or use alternating background colour, or use a grid line to separate different languages
“… the position of a implementation will be different in different plots. You cannot easily compare across figures.”
You cannot easily compare across figures now!
The x-scales are different. There are no grid lines linking the plots.
I would have done those in the first version if not for this stupid Excel.
If you really want to make general claims about compiled language implementations and jit’d language implementations and interpreted language implementations (and languages versus libraries) then please
– normalize the CPU sec measurements from one task relative to one of the measurements for that task
– group the normalized measurements into – compiled, jit’d, interpreted (whatever)
– present a box plot showing the median and interquartile range of normalized measurements for compiled, jit’d, interpreted (whatever) for “languages”
– present a box plot showing the median and interquartile range of normalized measurements for compiled, jit’d, interpreted (whatever) for “libraries”
That way maybe we’ll be able to see if your data supports your conclusions.
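For what it’s worth, the proposed normalization and summary statistics are only a few lines of Python (a sketch under the assumption that per-task timings sit in a plain dict keyed by implementation, which is my own framing, not the benchmark’s data format):

```python
from statistics import median

def normalize(times):
    """Divide each CPU-time measurement for one task by the fastest
    measurement for that task, so tasks become comparable."""
    best = min(times.values())
    return {impl: t / best for impl, t in times.items()}

def summarize(values):
    """Median and interquartile range of normalized times."""
    vals = sorted(values)
    n = len(vals)
    q1 = median(vals[: n // 2])          # lower half
    q3 = median(vals[(n + 1) // 2 :])    # upper half
    return median(vals), q3 - q1
```

Grouping the normalized numbers by category (compiled, JIT, interpreted) and feeding each group to `summarize` gives exactly the box-plot inputs the comment asks for.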
You are right that it is not clear from the plots which implementations require compilation and which use a JIT. I wish I could change the label colors to reflect this, but that seems quite complicated with the stupid Excel. I have now changed the color of the numbers instead.
There are only four plots. It is not hard for readers to reach my conclusions just by eyeballing them. Doing complex statistics is a plus for your benchmark, but overkill in my case.
It is very very hard to see if your conclusions are supported by your data or not supported by your data – using those figures.
“There are only four plots” but the x-scale is different on each plot !
There are formulas in Excel to compute the median and interquartile range – these are not complex statistics, they are basic descriptive statistics.
Excel quartile function
http://office.microsoft.com/en-us/excel-help/quartile-inc-function-HP010335696.aspx?CTT=5&origin=HP010335658
http://office.microsoft.com/en-us/excel-help/quartile-exc-function-HA010345346.aspx?CTT=5&origin=HP010335658
Only you are complaining. If more readers join you, probably I will make some changes.
My guess is that 0 readers asked you to make those charts like they are now.
I’m a bit disappointed that you didn’t test dmd, the official compiler for D.
Which version of LuaJIT exactly did you test?
I have explained elsewhere about dmd: I am using a Linux server over which I have no control. The server is several years old, while dmd requires a more recent glibc.
For LuaJIT, it is beta7, as shown on the data page. Beta6 gives nearly identical results.
You’re wrong in your classification of languages. For example, CPython and Java both compile to bytecode, but only Java has a JIT. Most languages with a JIT also compile to bytecode, although there are exceptions (most notably V8).
So you could have three categories:
– statically compiled: C, D, go
– jit-ed: java, C#, pypy, luajit
– interpreted: CPython, Perl, Ruby
Very informative site — keep up the good work.
Any plans to include any of the following languages: Common Lisp, Scheme, Clojure, Ocaml, Haskell?
Fx9 introduces Mozilla’s type inference project to the JavaScript engine.
I now get in-browser sudoku solver results only 0.5 seconds slower than Chrome.
These are amazing results considering Crankshaft supports much more advanced optimizations than JaegerMonkey + type inference. It just shows how big a win Mozilla’s type inference project brings. I can’t wait for IonMonkey to be finished, which will bring a JIT just as advanced as Crankshaft while also incorporating type inference.
BTW, Fx9 with TI seems to be faster than Chrome on array benchmarks, so I would expect the matrix multiplication benchmark to be really fast now.
Type inference is -n in the shell.
So -m -n
Can you do the benchmarks for Cython (with static types) too?
[…] and other evidence (one and two) of Go’s clear performance superiority did not […]
Awesome. Too bad it lacks PHP benchmark, but still awesome 🙂