A quick post. I implemented matrix multiplication in Dart. It takes Dart 12 seconds to multiply two 500×500 matrices. In contrast, LuaJIT does the same job in less than 0.2 seconds. Perl takes 26 seconds. This suggests that Dart fails to JIT-compile the critical loop even though I am trying to use explicit typing. Dart does not yet match the performance of V8 and LuaJIT. The source code is appended. It looks almost the same as C++.
mat_transpose(a)
{
  int m = a.length, n = a[0].length; // m rows and n cols
  var b = new List(n);
  for (int j = 0; j < n; ++j) b[j] = new List<double>(m);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      b[j][i] = a[i][j];
  return b;
}

mat_mul(a, b)
{
  int m = a.length, n = a[0].length, s = b.length, t = b[0].length;
  if (n != s) return null;
  var x = new List(m), c = mat_transpose(b);
  for (int i = 0; i < m; ++i) {
    x[i] = new List<double>(t);
    for (int j = 0; j < t; ++j) {
      double sum = 0;
      var ai = a[i], cj = c[j];
      for (int k = 0; k < n; ++k) sum += ai[k] * cj[k];
      x[i][j] = sum;
    }
  }
  return x;
}

mat_gen(int n)
{
  var a = new List(n);
  double t = 1.0 / n / n;
  for (int i = 0; i < n; ++i) {
    a[i] = new List<double>(n);
    for (int j = 0; j < n; ++j)
      a[i][j] = t * (i - j) * (i + j);
  }
  return a;
}

main()
{
  int n = 500;
  var a = mat_gen(n);
  var b = mat_gen(n);
  var c = mat_mul(a, b);
  print(c[n~/2][n~/2]);
}
I guess this comes as a total surprise. It would be interesting to submit the case to the Dart VM developers…
Dart is just immature. I heard that Dart is developed by the same group that wrote V8. If this is true, I am sure Dart will have similar or even better performance than V8 in the long run.
Yes, they are the same developers that work on V8.
In this presentation (http://www.youtube.com/watch?v=bsGgfUreyZw) they are very open about the fact that V8 beats Dart in most of the benchmarks. In 2011 they just wanted to ship the language and the environment, and this year they are going to focus on speed and robustness.
Take a look at http://www.infoq.com/presentations/Performance-V8-Dart and fast forward to 00:33:18. You will be pleasantly surprised 🙂 Slides are available at https://github.com/strangeloop/strangeloop2012/blob/master/slides/sessions/Bak-PushingTheLimitsOfWebBrowser.pdf
Thanks a lot. That is actually a little surprising. Dart can do complex jobs at a speed comparable to V8, but it cannot JIT a simple problem like matrix multiplication. Dart only needs to JIT one inner loop, but it fails to do that. I have tried on both Mac and Linux and tried different ways to write the inner loop. No difference.
@attractivechaos: I left two other comments below that explain what is happening. Can you see them? Is there any reason that prevents you from approving them?
Hi again,
[it’s a bit ironic that I am again looking at the performance of the same benchmark but now on a different VM :-)]
Here are some observations about your benchmark.
1) Currently the Dart VM focuses primarily on the performance of warmed-up code and does not implement OSR [On Stack Replacement] (because it’s mostly artificial benchmarks that need it, and it’s a complicated piece of code). So to see the performance of the optimized code you need to warm up the VM a bit (e.g. by multiplying two medium-sized matrices);
2) The Dart VM does not use declared variable types to drive optimizations at the moment. In production mode variable types are ignored entirely. Furthermore, your program is incorrectly typed 🙂 If you run it through the VM in checked mode (--enable-type-checks), through dart_analyzer, or open it in the Editor, you will see that the variable sum is initialized with an integer value, not a double (see https://gist.github.com/3922691#file_output.txt for output from the VM and the analyzer).
3) Related to #2: sum is a variable of mixed type (integer and double), so the VM does not unbox it, in order to retain type information correctly. For better performance it should be initialized with 0.0, which is a double zero.
4) List is a funny guy. In production mode it can contain anything from null to instances of random classes, so you end up always boxing/unboxing doubles. The Dart VM does not use NaN tagging like LuaJIT2, nor does it try to unbox the backing store automatically like V8 does, because Dart is all about stating the programmer’s intent explicitly and getting predictable performance. In this case it’s better to use Float64List from the scalarlist package. With the newest VM, operations over Float64List are unboxed [but the SDK build does not include such a VM yet, so you’ll have to build one manually or wait until the SDK picks up this VM version].
With the changes described above (warm up, correct sum initialization, Float64List instead of List) on the ToT Dart VM (ia32 build) I have a factor of 24 improvement (from 7465ms to 300ms).
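A minimal sketch of the three changes listed above (warm-up, a double-initialized sum, Float64List rows), for illustration only. It is not the code from the linked gist. It assumes a current Dart SDK, where Float64List is provided by dart:typed_data rather than the scalarlist package; the names matTranspose, matMul and matGen, the warm-up size and the repeat count are all illustrative choices.

```dart
import 'dart:typed_data';

/// Returns the transpose of [a] (m rows by n columns) as n rows of length m.
List<Float64List> matTranspose(List<Float64List> a) {
  final m = a.length, n = a[0].length;
  final b = List<Float64List>.generate(n, (_) => Float64List(m));
  for (var i = 0; i < m; ++i) {
    for (var j = 0; j < n; ++j) {
      b[j][i] = a[i][j];
    }
  }
  return b;
}

/// Multiplies [a] by [b]; both are stored as lists of Float64List rows.
List<Float64List> matMul(List<Float64List> a, List<Float64List> b) {
  final m = a.length, n = a[0].length, t = b[0].length;
  if (n != b.length) throw ArgumentError('dimension mismatch');
  final c = matTranspose(b); // lets the inner loop walk two rows
  final x = List<Float64List>.generate(m, (_) => Float64List(t));
  for (var i = 0; i < m; ++i) {
    final ai = a[i], xi = x[i];
    for (var j = 0; j < t; ++j) {
      final cj = c[j];
      var sum = 0.0; // double zero, so sum never holds an integer
      for (var k = 0; k < n; ++k) {
        sum += ai[k] * cj[k];
      }
      xi[j] = sum;
    }
  }
  return x;
}

/// Generates the same n-by-n test matrix as the original post.
List<Float64List> matGen(int n) {
  final t = 1.0 / n / n;
  return List<Float64List>.generate(n, (i) {
    final row = Float64List(n);
    for (var j = 0; j < n; ++j) {
      row[j] = t * (i - j) * (i + j);
    }
    return row;
  });
}

void main() {
  // Warm-up: multiply a few medium-sized matrices before timing.
  // The size (50) and repeat count (10) are arbitrary assumptions.
  for (var r = 0; r < 10; ++r) {
    final w = matGen(50);
    matMul(w, w);
  }

  const n = 500;
  final a = matGen(n), b = matGen(n);
  final watch = Stopwatch()..start();
  final c = matMul(a, b);
  watch.stop();
  print(c[n ~/ 2][n ~/ 2]);
  print('${watch.elapsedMilliseconds} ms');
}
```

Timing only the final multiplication keeps the warm-up cost out of the reported number. How much each of the three changes contributes on a given VM version is something to measure rather than assume.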
And there are still things that we can and will improve in the Dart VM so stay tuned for better perf.
Oh and here is a link to the modified version of the benchmark: https://gist.github.com/3922691
Sorry. My fault. I did not see your comments in time. I would of course have approved them if I had seen them in the first place. Thanks!
@mraleph is there a handy link for downloading the weekly or nightly build you’ve tested the script with? Thanks!
@Paolo: the official VM works fine with Vyacheslav’s program. It indeed runs much faster.
@Paolo: continuous builds of the SDK are available at http://gsdview.appspot.com/dart-editor-archive-continuous/${revision}
@attractivechaos: The VM from the SDK on dartlang.org currently does not actually inline [] and []= for Float64Array, because that was done only recently. If you try for example http://gsdview.appspot.com/dart-editor-archive-continuous/13855/ you should see another 3-4 times speedup, which should bring it quite close to the V8/LJ2 results.
[Thanks for unscreening comments! I suspected that you just missed my comments because I commented too much :-)]
Hi,
Thanks for benchmarking Dart. This post spawned an article highlighting the best practices when benchmarking Dart:
http://www.dartlang.org/articles/profiling/
John