Accelerating Standard C++ with GPUs Using stdpar

Originally published at: https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/

Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: CUDA C++ requires the use of host and device attributes on functions and the triple-chevron syntax for GPU kernel launches. OpenACC uses #pragmas to control GPU acceleration. Thrust lets you express parallelism portably but uses language…

This was a great article! It appears that all the discussion and examples are based on accelerating standard C++ code without any need for CUDA programming, but only on a single GPU.

From my work so far on multi-GPU programming, using two GPUs and partitioning the data between them has always required some CUDA-related code – for instance, binding an MPI rank or a thread to one of the GPUs, or using CUDA streams to use multiple GPUs simultaneously. These, and probably other approaches to multi-GPU acceleration, all need to select the device one way or another, which requires CUDA.
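To make the concern concrete, here is a sketch of the thread-per-GPU pattern being described (illustrative only; it assumes the CUDA runtime API and an stdpar-capable compiler such as nvc++, and that calling `cudaSetDevice` on a host thread directs that thread's subsequent parallel algorithms to the chosen GPU):

```cpp
#include <algorithm>
#include <execution>
#include <thread>
#include <vector>
#include <cuda_runtime_api.h>  // CUDA runtime -- exactly the dependency in question

// Each host thread binds itself to one GPU with cudaSetDevice, then runs a
// Standard C++ parallel algorithm on its half of the data. The std::transform
// itself is pure stdpar; the device selection is the CUDA-specific step that
// has no Standard C++ equivalent.
void scale_on_two_gpus(std::vector<float>& data) {
    const std::size_t half = data.size() / 2;
    auto worker = [&](int device, std::size_t begin, std::size_t end) {
        cudaSetDevice(device);  // CUDA-specific: pick the GPU for this thread
        std::transform(std::execution::par_unseq,
                       data.begin() + begin, data.begin() + end,
                       data.begin() + begin,
                       [](float v) { return 2.0f * v; });
    };
    std::thread t0(worker, 0, std::size_t{0}, half);
    std::thread t1(worker, 1, half, data.size());
    t0.join();
    t1.join();
}
```

The single `#include <cuda_runtime_api.h>` and the `cudaSetDevice` calls are what break the "pure Standard C++" property.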

All of these run in the opposite direction of “Accelerating Standard C++ with a GPU Using stdpar”, where the goal is to leave the CPU-based code unchanged (no CUDA runtime API calls, etc.) and simply compile it with NVC++. So I’m very curious whether there is any way around this currently, and if not, whether it is something to look forward to in the future? I’d appreciate any insights here.

Yes, multi-GPU stdpar support is on the roadmap.