OpenACC C++ support


I have a question about OpenACC support for C++. My code is trivially parallel, in the sense that the main loop can be spread across a large number of independent threads. The trouble is that the body of the main loop consists of OO C++ code with a large number of (nested) method calls.

Is there any hope that OpenACC can turn this code into CUDA kernels automatically? In principle we could get a good speedup if this were possible, but I don't think it is currently supported. Is that right?

Thanks in advance for your advice.

Hi waseemk,

PGI has put a lot of effort into supporting true OO C++ code in OpenACC. If you happen to be attending NVIDIA's GTC conference this week, I'll be presenting a talk on this (3/19/15 at 9am, room 210D).

We also just published a PGInsider article about OpenACC and C++, as well as one about CUDA Unified Memory.

The PGI compiler will also automatically generate device code for C++ methods, provided the method's definition is visible during compilation (for example, when it is defined in a header file). If it is not visible (for example, when it is defined in a separately compiled source file), you need to add the OpenACC “routine” directive to both the method's declaration and its definition.
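As a rough sketch of both cases, assuming a hypothetical `Particle` class and a hypothetical `stepAll` function standing in for the main loop: `advance` is defined in the class body, so its definition is visible, while `kineticEnergy` is only declared in the class and defined out of line, so it carries `#pragma acc routine seq` on both the declaration and the definition. The placement follows the advice above; compilers have differed in where they accept the routine directive for class members, so check your compiler's documentation. Built without OpenACC, the pragmas are ignored and the loop simply runs on the host.

```cpp
// Hypothetical particle type used only for illustration.
struct Particle {
    double x;  // position
    double v;  // velocity

    // Defined in the class body, so the definition is visible wherever it is
    // called; the compiler can generate device code for it without a
    // routine directive.
    void advance(double dt) { x += v * dt; }

    // Only declared here. When the definition is compiled separately, the
    // routine directive goes on both the declaration and the definition.
    #pragma acc routine seq
    double kineticEnergy() const;
};

// In a real code this definition would live in a separate .cpp file.
#pragma acc routine seq
double Particle::kineticEnergy() const { return 0.5 * v * v; }

// Hypothetical main loop: every iteration is independent, so it can be
// offloaded. Particle holds no pointers, so copying the array is enough.
double stepAll(Particle* p, int n, double dt) {
    double total = 0.0;
    #pragma acc parallel loop copy(p[0:n]) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        p[i].advance(dt);
        total += p[i].kineticEnergy();
    }
    return total;
}
```

Because `Particle` is a plain aggregate with no dynamic members, a single `copy` clause moves everything the kernel needs; the harder case is a class that owns heap memory.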

While PGI and OpenACC have improved C++ support, several areas still need work before we have full support. Nested classes with dynamic data members are problematic because there is no automatic support for creating the device copy and keeping it coherent with the host copy. Although it is long, this thread has several examples of how to do this manually: copy(movement) of user defined objects to the gpu in OpenACC
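One way to do this by hand is sketched below, assuming a hypothetical `DeviceArray` class that owns a heap buffer. The constructor creates the device copy of the object first and then of its dynamic member, the destructor deletes them in reverse order, and explicit `update` methods keep the host and device copies coherent. The sketch assumes the compiler attaches the device copy of `data_` to the device copy of the object when the object is already present, as later OpenACC specifications require; whether a given compiler version does this is worth checking.

```cpp
// Hypothetical class: a fixed-size array whose storage is a dynamic member.
class DeviceArray {
public:
    explicit DeviceArray(int n) : n_(n), data_(new double[n]) {
        for (int i = 0; i < n_; ++i) data_[i] = 0.0;
        // Create the device copy of the object first, then of its dynamic
        // member, so the member pointer on the device can be attached.
        #pragma acc enter data copyin(this)
        #pragma acc enter data copyin(data_[0:n_])
    }
    ~DeviceArray() {
        // Tear down in reverse order: member data first, then the object.
        #pragma acc exit data delete(data_[0:n_])
        #pragma acc exit data delete(this)
        delete[] data_;
    }
    // Copying would alias data_ on host and device; forbid it in this sketch.
    DeviceArray(const DeviceArray&) = delete;
    DeviceArray& operator=(const DeviceArray&) = delete;

    int size() const { return n_; }
    double get(int i) const { return data_[i]; }  // reads the host copy

    // Writes the device copy; the host copy is stale until updateHost().
    void fill(double value) {
        #pragma acc parallel loop present(data_[0:n_])
        for (int i = 0; i < n_; ++i) data_[i] = value;
    }
    void updateHost() {
        #pragma acc update self(data_[0:n_])
    }
    void updateDevice() {
        #pragma acc update device(data_[0:n_])
    }

private:
    int n_;
    double* data_;
};
```

Keeping the data directives inside the constructor and destructor ties the device copy's lifetime to the object's, so callers only need to remember when to call `updateHost()` or `updateDevice()`.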

We also just added beta support for CUDA Unified Memory, which goes a long way toward solving this issue. However, it is vendor-specific and not a good general solution.

Other remaining gaps are the lack of support for passing function pointers, virtual functions, STL container classes, and exception handling.
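Until those gaps close, one common workaround (a general technique, not something PGI prescribes here) is to replace virtual dispatch with a type tag and a `switch`, and to pass plain arrays with a count instead of STL containers. The `Shape`, `ShapeKind`, and `totalArea` names below are hypothetical.

```cpp
// Hypothetical type tag replacing a virtual class hierarchy.
enum ShapeKind { Circle, Square };

struct Shape {
    ShapeKind kind;
    double size;  // radius for a circle, side length for a square

    // Non-virtual and defined in the class body, so it stays visible to the
    // compiler; the switch stands in for a virtual area() override.
    double area() const {
        switch (kind) {
            case Circle: return 3.141592653589793 * size * size;
            case Square: return size * size;
        }
        return 0.0;
    }
};

// A raw pointer plus a count instead of std::vector<Shape>.
double totalArea(const Shape* shapes, int n) {
    double total = 0.0;
    #pragma acc parallel loop copyin(shapes[0:n]) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += shapes[i].area();
    return total;
}
```

The cost is that adding a new shape means editing the `switch` rather than adding a subclass, which is the usual trade-off when removing virtual calls from device code.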

Hope this helps,