Mon, 17 Oct 2016 09:36:12 +0000
Thank you for your work! It helped me a lot when I needed to choose a space-efficient data structure for a project of mine. I'd be interested in using your kbtree implementation, although I would have to port it to C++. Under what license do you allow it to be used?
Fri, 14 Oct 2016 13:45:43 +0000
I recommend also including an implementation with instruction reordering (which eliminates cache misses) and OpenMP directives for parallelization.
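A minimal sketch of what this suggestion could look like, reading "instruction reordering" as loop reordering (an assumption on my part). The function name `matmul_ikj_omp` and the row-major square-matrix layout are hypothetical and not taken from the post's code. The i-k-j loop order makes the inner loop walk `B` and `C` contiguously, and the OpenMP directive splits the rows of `C` across threads; it only takes effect when the compiler is run with OpenMP enabled (for example `-fopenmp` with gcc).

```c
#include <stddef.h>

/* C = A * B for n x n row-major matrices.
 * Hypothetical helper, not part of the original benchmark. */
void matmul_ikj_omp(int n, const double *A, const double *B, double *C)
{
    /* Each iteration of the outer loop writes a distinct row of C,
     * so the rows can be computed in parallel without locking. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i) {
        double *c_row = C + (size_t)i * n;
        for (int j = 0; j < n; ++j)
            c_row[j] = 0.0;
        for (int k = 0; k < n; ++k) {
            double a = A[(size_t)i * n + k];
            const double *b_row = B + (size_t)k * n;
            /* Inner loop reads b_row and updates c_row sequentially. */
            for (int j = 0; j < n; ++j)
                c_row[j] += a * b_row[j];
        }
    }
}
```

Without OpenMP the same code still compiles and runs single-threaded, so the loop-order effect and the threading effect can be measured separately.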
Mon, 29 Aug 2016 14:19:18 +0000
For GotoBLAS, from which OpenBLAS was forked, you might want to read the paper https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
Mon, 29 Aug 2016 08:56:59 +0000
This is a very interesting example where you could use multithreading to achieve faster multiplication.
Sun, 28 Aug 2016 22:44:40 +0000
What about the new gather-scatter vector intrinsics, especially the 512-bit ones? Are these included in SSE? I guess not, as they need AVX or Knights Landing/Phi.
Sun, 28 Aug 2016 21:49:09 +0000
I like Eigen a lot. It can use Intel MKL as a backend, which also performs well on operations other than matrix multiplication.
Sun, 28 Aug 2016 20:17:28 +0000
Have a look at the "matrix multiplication algorithm" wiki page and you will get some hints. I guess they are faster mostly because they are better at minimizing cache misses by splitting and reordering the computation block by block. As to other possible explanations: I have intentionally disabled multithreading, and although the Linux server supports AVX, gcc doesn't, and I explicitly told OpenBLAS not to use AVX.
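To illustrate the "block by block" idea, here is a minimal sketch of a tiled multiplication. This is not how OpenBLAS or Eigen are actually implemented (their kernels are far more elaborate); the function name `matmul_blocked` and the tile size `BLOCK` are assumptions chosen for illustration, and a good tile size depends on the cache of the target machine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tile edge length; tune it for the target cache. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C = A * B for n x n row-major matrices, computed tile by tile so that
 * the working set of the three inner loops stays small. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    memset(C, 0, n * n * sizeof(double));
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                /* Multiply one tile of A by one tile of B and
                 * accumulate into the matching tile of C. */
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
        }
    }
}
```

The result is the same as the naive triple loop; only the order in which the partial products are accumulated changes, so that each tile is reused while it is still in cache.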
Sun, 28 Aug 2016 19:56:45 +0000
A great post as usual! Thanks. Do you have any theories on why OpenBLAS and Eigen are so much better? Are they using threading or maybe AVX?
Fri, 26 Aug 2016 21:28:48 +0000
Great article! I just wanted to mention sparsepp, the updated version of Google's sparse_hash_map/set, which is significantly faster. See https://github.com/greg7mdp/sparsepp.
Tue, 05 Apr 2016 11:17:36 +0000
Well, if developers did a good job, they would be able to create very fast containers, but they don't…
Using asm, multithreading, and some other techniques…
