N Ways to SAXPY: Demonstrating the Breadth of GPU Programming Options

Originally published at: https://developer.nvidia.com/blog/n-ways-to-saxpy-demonstrating-the-breadth-of-gpu-programming-options/

Back in 2012, NVIDIAN Mark Harris wrote Six Ways to SAXPY, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages and libraries. Since then, programming paradigms have evolved and so has the NVIDIA HPC SDK. In this post, I demonstrate five ways to implement a simple SAXPY computation…

Six Ways to SAXPY becomes even more expansive when we consider all the ways one can now program NVIDIA GPUs. Since 2012, we’ve seen great work from the NVIDIA HPC SDK team and from open source projects to make NVIDIA GPUs easier to program. With the latest NVC++ compiler, we even have stdpar acceleration, which is demonstrated in more detail in posts such as Accelerating Standard C++ with GPUs Using stdpar, Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK, and Accelerating Python on GPUs with nvc++ and Cython.
Many parts of the SDK, such as Thrust, cuBLAS, and OpenACC, have also been improved to deliver better performance, make better use of Unified Memory, and more.
Furthermore, more open source projects now let you program NVIDIA GPUs, such as CuPy, Numba, TensorFlow, and PyTorch, and more keep showing up thanks to the NVIDIA Compiler SDK.
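For anyone new to the operation itself: SAXPY computes y = a*x + y on single-precision vectors. Here is a minimal sketch of how it might look in standard C++, in the spirit of the stdpar approach mentioned above. The function name `saxpy` and the use of `std::vector` are my own choices for illustration, not code from the blog post. I have left out the execution policy so the sketch builds with any C++11 compiler; the comment shows where a policy would go when building with nvc++ and its -stdpar option.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// SAXPY: y[i] = a * x[i] + y[i] for every element (single precision).
// Hypothetical helper for illustration; x and y must have the same length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // Assumption: for stdpar offload, an execution policy such as
    // std::execution::par_unseq (from <execution>) would be passed as the
    // first argument to std::transform. It is omitted here for portability.
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

With a = 2, x = {1, 2, 3} and y = {10, 20, 30}, calling `saxpy(2.0f, x, y)` leaves y holding {12, 24, 36}.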
Thanks for reading my blog. Leave a comment if you have any questions, or let me know how you program NVIDIA GPUs.