Standard parallel C++

Hi, I heard multiple GTC talks about the standard parallel C++ effort by NVIDIA. I'm wondering if someone can point me to some GitHub links (or documents/tutorials) showing code written in standard parallel C++.

If we don’t write CUDA, can standard parallel C++ handle passing a device pointer between two distant functions? How does it handle it (allocating a device buffer)?
Can standard parallel C++ emulate CUDA streams for asynchronous computation/communication overlapping?

Thanks.

Hi llodds,

I don’t know of any repos offhand, but you can start with our Accelerating Standard C++ with GPUs Using stdpar blog post for the basics.
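To give a quick taste of what the code looks like, here's a minimal sketch (not taken from the blog post) of a SAXPY written with the standard parallel algorithms. The assumption is that it's compiled with nvc++ -stdpar, in which case the std::execution::par algorithm is offloaded to the GPU:

```cpp
// Minimal stdpar sketch: y = a*x + y using a standard parallel algorithm.
// Assumes compilation with: nvc++ -stdpar saxpy.cpp
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main() {
    const std::size_t n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);

    // Runs in parallel; with -stdpar this maps onto the GPU.
    std::transform(std::execution::par, x.begin(), x.end(), y.begin(), y.begin(),
                   [=](float xi, float yi) { return a * xi + yi; });

    std::printf("y[0] = %f\n", y[0]);  // expect 5.0
    return 0;
}
```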

If we don’t write CUDA, can standard parallel C++ handle passing a device pointer between two distant functions?

Function calls are supported provided that the function definition is visible at compile time (such as in a template, a header file, or the same source file), in which case the compiler will implicitly create a device-callable version.
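As a rough illustration (again assuming nvc++ -stdpar), scale() below is defined in the same translation unit, so the compiler can implicitly generate a device-callable version when it's called from inside the parallel algorithm:

```cpp
// The definition of scale() is visible here (it could equally live in a header),
// so it can be called from the lambda passed to the parallel algorithm.
#include <algorithm>
#include <execution>
#include <vector>

inline float scale(float v) { return 2.0f * v; }   // definition visible at compile time

void double_all(std::vector<float>& data) {
    std::for_each(std::execution::par, data.begin(), data.end(),
                  [](float& v) { v = scale(v); });  // OK: the body of scale() is known
}
```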

Functions defined in a separate source file can’t be used, since there’s currently no way to decorate these routines so the compiler knows to create device versions of them.

Also, virtual functions and function pointers are not currently supported in device code.

I’m not clear on what you mean by “passing a device pointer”. The pointer would be passed like any other argument.

How does it handle it (allocating a device buffer)?

It uses CUDA Unified Memory, so allocated memory is visible on both the device and the host.
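A small sketch of what that looks like in practice (assuming nvc++ -stdpar, where heap allocations such as a std::vector's storage come from unified memory): the same pointer can be handed from one function to another like any ordinary argument, used inside an offloaded algorithm, and read back on the host without any explicit copies.

```cpp
// data.data() points into CUDA Unified Memory under -stdpar, so the same
// pointer is usable inside the offloaded algorithm and on the host afterwards.
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>
#include <cstddef>

void fill_squares(float* p, std::size_t n) {        // pointer passed like any argument
    std::for_each_n(std::execution::par, p, n,
                    [](float& v) { v = v * v; });    // runs on the GPU with -stdpar
}

float sum_on_host(const float* p, std::size_t n) {   // same pointer, read on the host
    return std::accumulate(p, p + n, 0.0f);
}

int main() {
    std::vector<float> data(1000, 3.0f);             // storage comes from unified memory
    fill_squares(data.data(), data.size());
    return sum_on_host(data.data(), data.size()) > 0.0f ? 0 : 1;
}
```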

Can standard parallel C++ emulate CUDA streams for asynchronous computation/communication overlapping?

No. The APIs for the existing C++ parallel algorithms are inherently synchronous. My understanding is that the C++ committee is working on better ways of writing asynchronous code in standard C++ via Senders, Receivers, and Executors. Making asynchronous versions of the parallel algorithms would come after that.
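To illustrate the synchronous behavior: each parallel algorithm call returns only after its work is complete, so in a sketch like the one below the two transforms run one after the other; there is no stdpar equivalent of launching them on separate CUDA streams to overlap them.

```cpp
// Each call blocks until the work finishes, so these run back to back.
#include <algorithm>
#include <execution>
#include <vector>

void process(std::vector<float>& a, std::vector<float>& b) {
    std::transform(std::execution::par, a.begin(), a.end(), a.begin(),
                   [](float v) { return v + 1.0f; });   // completes first...
    std::transform(std::execution::par, b.begin(), b.end(), b.begin(),
                   [](float v) { return v * 2.0f; });   // ...then this one starts
}
```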

Note that C++ stdpar is more about portability. While it can also give good performance, it’s not intended to replace CUDA, nor does it have the same level of functionality and customization.

-Mat