With the completion of the Sudoku solving benchmark (my last post), my programming language benchmark is essentially complete (though a few implementations are still missing). This post gives more context and analysis of the benchmark.
Design
This benchmark comprises four tasks:
- solving 1000 Sudoku puzzles
- multiplying two 1000×1000 matrices
- matching URI or URI|Email in a concatenated Linux HowTo file
- counting the occurrences of words using a dictionary
The first two tasks focus on how well the language implementation translates source code into machine code: most of the CPU time is spent in the benchmarking programs themselves. The last two tasks focus on the efficiency of the companion libraries: most of the CPU time is spent in library routines. All four tasks are relatively simple and cannot easily be hand-optimized for better performance.
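To make the split concrete, the dictionary task is essentially a tight loop over a hash table. The following is a minimal Python sketch of that task (illustrative only; the actual benchmark programs live in the plb repository and differ in detail):

```python
import sys
from collections import defaultdict

def count_words(stream):
    """Count word occurrences using a hash table (dict)."""
    counts = defaultdict(int)
    for line in stream:
        for word in line.split():
            counts[word] += 1
    return counts

# e.g. count_words(sys.stdin) over the input text
```

Virtually all of the time here goes into the hash-table operations, so the measurement reflects the quality of the dictionary implementation rather than the interpreter loop.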
Results and discussions
The complete results are available here. The following figure shows the CPU time for Sudoku solving and matrix multiplication, both evaluating the language implementation itself (click for a larger figure):
In the plots, a number in red indicates that the corresponding implementation requires explicit compilation; blue indicates that the implementation uses Just-In-Time (JIT) compilation; black indicates that the implementation interprets the program without a JIT.
The overall message is the following. Languages compiled into machine code (C and D) are slightly faster than languages compiled into bytecode (Java and C#); compilers tend to be faster than JIT-based implementations (LuaJIT, PyPy and V8); JIT-based implementations are much faster than conventional interpreters (Perl, CPython and Ruby). Among compilers, C is still the winner, by a thin margin. Among interpreters, LuaJIT and V8 pull ahead. There are few surprises for most language implementations, except perhaps the few with very poor performance.
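For reference, the matrix-multiplication task is essentially the textbook triple loop, so nearly all time is spent in code generated from the program itself. A simplified Python sketch (not the exact benchmark source; transposing `b` for sequential access is a detail I add here for illustration):

```python
def matmul(a, b):
    """Naive multiplication of two square matrices given as
    lists of row lists."""
    n = len(a)
    # Transpose b so the inner loop walks memory sequentially.
    bt = [[b[k][j] for k in range(n)] for j in range(n)]
    return [[sum(a[i][k] * bt[j][k] for k in range(n))
             for j in range(n)]
            for i in range(n)]
```

Because there are no library calls in the hot loop, the measured time is dominated by how well the implementation compiles or interprets plain arithmetic and indexing.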
On the other hand, the comparison of the library performance yields a vastly different picture (again, click to enlarge):
This time, even conventional interpreters may approach or even surpass the optimized C implementation (e.g. Perl vs. C for simple regex matching). Some compiled languages, still in their early days, may perform poorly.
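To illustrate why an interpreter can keep up here: the regex task is little more than the loop below, so nearly all time is spent inside the regex library. A Python sketch with a deliberately abbreviated URI pattern (the benchmark's actual patterns are longer; both the pattern and the function name are my own illustration):

```python
import re

# Simplified URI pattern for illustration only; the benchmark
# uses a more complete one.
URI = re.compile(r'([a-zA-Z][a-zA-Z0-9+.-]*)://([^ /]+)(/[^ ]*)?')

def count_matches(lines):
    """Count URI-like matches over an iterable of lines; the cost
    is dominated by the regex engine, not the host language."""
    n = 0
    for line in lines:
        n += sum(1 for _ in URI.finditer(line))
    return n
```

Whether the surrounding loop runs in C, Perl or Python matters little; what is measured is the regex engine each language ships with.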
Conclusions
The quality of its libraries is a critical part of a programming language. This benchmark is one of the few that clearly separates the performance of the language implementation itself from that of its companion libraries. While compiled languages are typically one to two orders of magnitude faster than interpreted languages, library performance may be very similar across languages. For algorithms that rely heavily on library routines, the choice of programming language does not matter much. It is quite possible to construct a benchmark in which another language beats C/C++ for a certain application.
All the benchmarking programs are distributed under the MIT/X11 license. Please follow the links below for the source code and the complete results:
- Source code: https://github.com/attractivechaos/plb
- Complete results: http://attractivechaos.github.com/plb/
There is actually more to say about each specific language implementation, but perhaps I’d better leave the controversial part to readers.
d8 recently got access to typed arrays so do you think you can use them instead of normal arrays in matmul benchmark?
https://gist.github.com/961337 was roughly 6 times faster on V8 when I checked last time.
Details are available in my blog post on “benchmark games”: http://blog.mrale.ph/post/5436474765/dangers-of-cross-language-benchmark-games
Thank you very much. I have updated to the latest V8-r8384 (the version two months ago did not support float64). The numbers for V8 have all been updated. It seems that the hash table is a little slower in the latest version, but sudoku solving and matrix multiplication are both faster.
Also, thank you for the interesting blog post. I appreciate it.
Thanks for updating the numbers!
> It seems that the hash table is a little slower in the latest version
Interesting. I will take a look.
I must speak up about the java benchmark for Dict (didn’t have time to look at the others unfortunately).
I think the autoboxing and unboxing of Integer in the HashMap is killing Java’s performance. If you set a larger cache by passing -Djava.lang.Integer.IntegerCache.high=10000 on the java command line, or use an integer-specific class, like fastutil’s Object2IntMap (http://fastutil.dsi.unimi.it/docs/it/unimi/dsi/fastutil/objects/Object2IntMap.html), I’m sure better performance can be achieved. After all, that benchmark is about library performance (not language performance).
I’ve put the modified code here: https://gist.github.com/1039732
I am guessing you ran jaegermonkey with just “js”? That’s just the interpreter, which explains why it is as fast as interpreted languages here…
Try it with “js -m -j -p -n” (which are: method jit, trace jit, jit profiling, and type inference). That should be similar in performance to v8.
A good catch. Thanks! The JaegerMonkey numbers have all been updated. Note that JaegerMonkey is still slower than V8. You can try the sudoku benchmark in different browsers.
I still can’t replicate the results posted here with the JS shells. I get JM being about 2x slower than V8 on sudoku (run as js -m -j -p sudoku_v2.js < sudoku.txt) and faster on matmul (run as js -m -j -p matmul_v1.js). Is there a different way you run it to make the graphs here?
Perhaps I should use a more recent version of JaegerMonkey? Which versions of V8 and JaegerMonkey are you using? Could you also try the implementation from another language (e.g. C or LuaJIT) and report the speed for V8, JaegerMonkey and that third implementation? We need a reference implementation. Thanks.
EDIT: I have just tried to solve the 1000 Sudoku puzzles using Chrome 12.0.742.112 and the shining new Firefox 5. Chrome took 3.8 seconds, while Firefox took 17.7 seconds, similar to the numbers in the bar chart. Is JaegerMonkey from the recent source code tree much faster than the Firefox Javascript engine?
For sudoku_v1, I get (user time from ‘time’):
C: 52 ms
js/V8 (SVN rev 8424): 110 ms
js/SpiderMonkey (hg.mozilla.org/tracemonkey rev d1cdf4295626): 312 ms
Are you using the “JaegerMonkey” repo, by any chance?
Could you try something like this:
seq 50 | xargs -i cat sudoku.txt | d8 sudoku_v1.js
Perhaps V8 takes longer to warm up? Thanks for your time.
I’m testing on MacOS, where apparently I have neither seq nor xargs -I, but IIUC the point is to concatenate 50 copies of sudoku.txt to form the input. If I do that, I get results more similar to yours: 3.4s for V8 and 15.8s for SpiderMonkey.
What about matmul? Do you run that more times as well, or something like that?
It’s maybe worth noting that Shedskin is not Python; it’s a subset of Python (in the same way that Cython or RPython are not Python).
If the code compiled by Shedskin runs in, say, CPython or PyPy then it’s Python. Sure, Shedskin doesn’t support “full Python” but if people can compile their Python programs with it, it’s a practical tool worth considering. When people are actually rewriting parts of their projects in Python and dropping the C stuff (since the “common wisdom” was that you had to use C for the high performance bits) to get the benefit of compilation, it’s a gain for Python. It’s arguably no worse if there are some restrictions than some of the bizarre and non-portable tricks some people use to get CPython to run code faster.
Personally, I think people make too much of a fuss about “full Python”. That’s why Python has languished for going on two decades as “that prototyping language” with a bad reputation for performance in some circles.
Would you be happy to have a faster C compiler that sometimes compiles your programs wrong? Or one that does not support pointer arithmetic?
I think there is quite a big difference between “full Python” and not-quite-full Python, and for some reason people would rather stick with full, even if the crippled one is faster.
Anyway, things like “opinions in some circles” are not that easy to argue with. I can probably show some benchmarks or discuss why some parts of language are harder to optimize, but I’m not sure if those circles would listen anyway.
I guess I have to reply to myself to reply to fijal, but anyway…
As far as “full” versus “crippled” (as you call it) Python is concerned, as long as the debate never progresses past the point of complete compatibility with whatever CPython is today, “warts” and all, people will keep jumping through hoops implementing every last little quirk and misfeature of the language while users wait for tools that manage to do things effectively like proper code analysis, occasionally sneaking off to use those nasty non-Pythons because the alternative is to use languages like C, C++, Fortran and even Java.
I had the benefit of being in a seminar for scientific users of Python recently, and I even promoted PyPy there, where the options were more or less laid out as being Python, Cython, NumPy and C, much to the amusement of a colleague who wondered what the benefit was in writing stuff in Python in the first place if you end up rewriting it in C. These people could write in Python and not even touch the exotic stuff, and yes, it’d still be Python as far as they and the compiler are concerned.
The “compiles your programs wrong” argument is a distraction more often than not. If the compiler just said that it couldn’t preserve compatibility with Python semantics for some particular piece of code (and perhaps exited with an error), at least we’d be able to determine the limits of compatibility and agree on what kind of Python is supported. Remember that Python isn’t a static language but has evolved over the years – the stuff I was writing fifteen years ago was still Python – but it’s the essence of the language I’d like to see explored. As far as I’m concerned, stuff like Shedskin helps do just that, and that is more interesting than saying “it’s not full Python, just ignore it”.
The reply feature is indeed weird…
For one, I would be happy to drop some warts of the Python language; however, most of the stuff that’s unsupported by Shedskin (or RPython, for that matter) is genuinely useful, like debugging support, and people use it even if they don’t think they need it.
Other cool features include, say, nice tracebacks – this is also a feature most people *do* use.
For one, I claim PyPy can do better than Shedskin (it certainly does better than Cython in many places) without crippling the language, leaving it really useful. NumPy support is also on the way.
Also, I didn’t say the benchmark should ignore shedskin – merely mention it’s not really python.
Cheers,
fijal
> however, most of the stuff that’s unsupported by shedskin (or RPython for that matter) is genuinely useful, like debugging support and people use it even if they don’t think they need it.
> Other cool features include say nice tracebacks – this is also a feature most people *do* use.
Because they are legal Python, Shedskin-compatible programs can be developed and debugged using any other Python implementation.
Shedskin can also generate extension modules for you, so you are not restricted to ‘complete’ programs.
mark (main author of shedskin).
Also, another thing: would you mind providing a version using array from the array module for PyPy? It would be faster; I can provide one if needed.
It already uses array:
“””
1) LDC, PyPy, CPython2/3, JS, LuaJIT and IronPython use “v2” and the rest use “matmul_v1.*” in the source code directory.
“””
https://github.com/attractivechaos/plb/blob/master/matmul/matmul_v2.py#L10
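For readers wondering what the `array`-based variant buys: keeping the matrix in a flat, typed buffer avoids nested lists of boxed floats, which JITs like PyPy can exploit. A minimal sketch of that layout (my own illustration, not the code from the linked file):

```python
from array import array

def matmul_flat(n, a, b):
    """Multiply two n-by-n matrices stored as flat, row-major
    array('d') buffers; returns a new flat array('d')."""
    c = array('d', [0.0]) * (n * n)
    for i in range(n):
        ai = i * n
        for k in range(n):
            aik = a[ai + k]
            bk = k * n
            for j in range(n):
                c[ai + j] += aik * b[bk + j]
    return c
```

The loop order (i, k, j) keeps both the `a` read and the `c`/`b` accesses sequential, which is the usual cache-friendly arrangement for row-major storage.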
Oh. You were probably talking about Sudoku. Sorry for the confusion 🙂
Oh and btw, PyPy 1.5 is faster than PyPy 1.4.1 and PyPy nightly is even faster: http://buildbot.pypy.org/nightly/trunk/
Here are some of the problems solved in Scala, if you want to try a comparison:
I am busy with some other things. I will surely come back and add Scala. Thanks a lot.
[…] https://attractivechaos.wordpress.com/2011/…benchmark-analyses/ – a very interesting performance comparison of C, Java, C#, D, Go, Perl, Python, Lua, R, JavaScript and Ruby; […]
Please make the 4 figures easier to read – order the data from least CPU sec to most CPU sec.
“… by the principle of informative changes a simple progression of increasing heights from left to right [widths from top to bottom] is to be preferred over a jagged series if the ordering along the X axis [Y axis] is arbitrary… order the levels along the X axis [Y axis] to produce the visually simplest pattern.”
p. 76, “Elements of Graph Design”, Stephen M. Kosslyn
It depends on what you are looking for. Currently, implementations are grouped by language. If you sort by the numbers, comparisons between implementations of the same language become less obvious. The benchmark page has a sortable table, so people can still get that information there. Also, when you sort by the numbers, the position of an implementation will differ between plots, so you cannot easily compare across figures. Of course, the best solution would be a dynamic page with JavaScript to meet the needs of different readers.
“It depends on what you are looking for”
No – it depends what you want to show – you have to choose.
“…between implementations of the same language.”
If that’s what you want to show then
– for each language order the data from least CPU sec to most CPU sec
– make a gap between each language, or use alternating background colour, or use a grid line to separate different languages
“… the position of a implementation will be different in different plots. You cannot easily compare across figures.”
You cannot easily compare across figures now!
The x-scales are different. There are no grid lines linking the plots.
I would have done those in the first version if not for this stupid Excel.
If you really want to make general claims about compiled language implementations and jit’d language implementations and interpreted language implementations (and languages versus libraries) then please
– normalize the CPU sec measurements from one task relative to one of the measurements for that task
– group the normalized measurements into – compiled, jit’d, interpreted (whatever)
– present a box plot showing the median and interquartile range of normalized measurements for compiled, jit’d, interpreted (whatever) for “languages”
– present a box plot showing the median and interquartile range of normalized measurements for compiled, jit’d, interpreted (whatever) for “libraries”
That way maybe we’ll be able to see if your data supports your conclusions.
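For what it’s worth, the proposed normalization and summary statistics are only a few lines of Python (a sketch under the assumption that per-task timings sit in a plain dict keyed by implementation, which is my own framing, not the benchmark’s data format):

```python
from statistics import median

def normalize(times):
    """Divide each CPU-time measurement for one task by the fastest
    measurement for that task, so tasks become comparable."""
    best = min(times.values())
    return {impl: t / best for impl, t in times.items()}

def summarize(values):
    """Median and interquartile range of normalized times."""
    vals = sorted(values)
    n = len(vals)
    q1 = median(vals[: n // 2])          # lower half
    q3 = median(vals[(n + 1) // 2 :])    # upper half
    return median(vals), q3 - q1
```

Grouping the normalized numbers by category (compiled, JIT, interpreted) and feeding each group to `summarize` gives exactly the box-plot inputs the comment asks for.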
You are right that it is not clear from the plots which implementations require compilation and which use a JIT. I wish I could change the label colors to reflect this, but that seems quite complicated with the stupid Excel. I have now changed the color of the numbers instead.
There are only four plots. It is not hard for readers to reach my conclusions just by eyeballing them. Doing complex statistics is a plus for your benchmark, but overkill in my case.
It is very very hard to see if your conclusions are supported by your data or not supported by your data – using those figures.
“There are only four plots” but the x-scale is different on each plot !
There are formulas in Excel to compute the median and interquartile range – these are not complex statistics, they are basic descriptive statistics.
Excel quartile function
http://office.microsoft.com/en-us/excel-help/quartile-inc-function-HP010335696.aspx?CTT=5&origin=HP010335658
http://office.microsoft.com/en-us/excel-help/quartile-exc-function-HA010345346.aspx?CTT=5&origin=HP010335658
Only you are complaining. If more readers join you, probably I will make some changes.
My guess is that 0 readers asked you to make those charts like they are now.
I’m a bit disappointed that you didn’t test dmd, the official compiler for D.
Which version of LuaJIT exactly did you test?
I have explained elsewhere about dmd: I am using a Linux server over which I have no control. The server is several years old, while dmd requires a more recent glibc.
For LuaJIT, it is beta7, as shown on the data page. Beta6 gives nearly identical results.
You’re wrong in your classification of languages. For example, CPython and Java both compile to bytecode, but only Java has a JIT. Most languages with a JIT also compile to bytecode, although there are exceptions (most notably V8).
So you could have three categories:
– statically compiled: C, D, go
– jit-ed: java, C#, pypy, luajit
– interpreted: CPython, Perl, Ruby
Very informative site — keep up the good work.
Any plans to include any of the following languages: Common Lisp, Scheme, Clojure, Ocaml, Haskell?
Fx9 introduces Mozilla’s type inference project to the JavaScript engine.
I now get in-browser sudoku solver results only 0.5 seconds slower than Chrome.
These are amazing results considering Crankshaft supports much more advanced optimizations than JaegerMonkey + type inference. It just shows how big a win Mozilla’s type inference project brings. I can’t wait for IonMonkey to be finished, which will bring a JIT just as advanced as Crankshaft while also incorporating type inference.
BTW, Fx9 with TI seems to be faster than Chrome on array benchmarks, so I would expect the matrix multiplication benchmark to be really fast now.
Type inference is -n in the shell.
So -m -n
Can you do the benchmarks for Cython (with static types) too?
[…] and other evidence (one and two) of Go’s clear performance superiority did not […]
Awesome. Too bad it lacks PHP benchmark, but still awesome 🙂