High-Performance MATLAB with GPU Acceleration

Originally published at: https://developer.nvidia.com/blog/high-performance-matlab-gpu-acceleration/

In this post, I will discuss techniques you can use to maximize the performance of your GPU-accelerated MATLAB® code. First I explain how to write MATLAB code which is inherently parallelizable. This technique, known as vectorization, benefits all your code whether or not it uses the GPU. Then I present a family of function wrappers—bsxfun, pagefun, and arrayfun—that…

Vectorisation is important in MATLAB whether you are working on the GPU or the CPU, if you're interested in code execution speed.
I did some quick tests on a CPU-only MATLAB installation without the Parallel Computing Toolbox add-on:

Taking your code and removing all the GPU references (gpuArray, gather), the vectorisation still gives about a 400x speedup on my CPU.

It seems MATLAB's arrayfun differs between the standard built-in version and the version included in the Parallel Computing Toolbox. The standard version does not support automatic expansion of variables (i.e. all inputs need to be the same size); only the Parallel Computing Toolbox version adds that feature. If I manually expand the data and then call the standard version, it unfortunately results in about a 100x slow-down on my CPU. I'm not sure whether this is due to the manual expansion or to using arrayfun on the CPU.
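
For anyone wanting to try this, here is a minimal sketch of what manual expansion for the built-in arrayfun might look like, next to a bsxfun call that does the expansion itself. The function f and the inputs x and y are made up for illustration and are not taken from the article's code.

```matlab
% Hypothetical inputs: a column vector and a row vector of different sizes.
x = (1:4)';             % 4-by-1 column
y = 1:3;                % 1-by-3 row
f = @(a, b) a.^2 + b;   % illustrative element-wise function (an assumption)

% The built-in (CPU) arrayfun needs same-size inputs, so expand by hand.
[X, Y] = ndgrid(x, y);              % X and Y are both 4-by-3
viaArrayfun = arrayfun(f, X, Y);    % calls f once per element

% bsxfun expands the singleton dimensions of x and y itself.
viaBsxfun = bsxfun(f, x, y);

% Both approaches should produce the same 4-by-3 result.
sameResult = isequal(viaArrayfun, viaBsxfun);
```

Wrapping each variant in timeit would be one way to separate the cost of the ndgrid expansion from the cost of the per-element arrayfun calls.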

pagefun is only available in the Parallel Computing Toolbox, so I couldn't do any tests on that.

Thanks for an interesting article.

That's true about arrayfun and pagefun. arrayfun on the CPU is just a convenience function and provides no benefit over a loop.
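
As a rough sketch of that point, the snippet below computes the same thing with CPU arrayfun, with an explicit loop, and with a vectorised expression. The function g and the input x are hypothetical and chosen only for illustration.

```matlab
% Hypothetical scalar function and input data (assumptions for illustration).
x = linspace(0, 1, 5);
g = @(v) v^2 + 1;

% CPU arrayfun: calls g once per element of x.
viaArrayfun = arrayfun(g, x);

% Explicit loop: does the same per-element work.
viaLoop = zeros(size(x));
for k = 1:numel(x)
    viaLoop(k) = g(x(k));
end

% Vectorised form: one call operating on the whole array.
viaVector = x.^2 + 1;
```

All three give the same values; on the CPU only the vectorised form avoids the per-element function call.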