NVIDIA HPC SDK, Device Offloading and Exclusive/Inclusive Scan

Does the NVIDIA HPC SDK's OpenMP device offloading support the exclusive/inclusive scan from OpenMP 5.0? It is described here: scan Directive.

I’d love to be able to use this in conjunction with #pragma omp loop. I have code that I want to switch easily between CPU and GPU targets. When targeting GPUs, nvc++ recognizes that the innermost loop of my program (which contains the exclusive scan) needs to be parallelized across threads; when targeting CPUs, it distributes the parallelism across threads on the outer loop instead. In the CPU case, this lets the existing plain C++ serial exclusive scan in the innermost loop run correctly without race conditions, but in the GPU case the scan's iterations run concurrently, so I don’t get the right answer. If I can’t use an OpenMP directive such as “scan” to keep an identical codebase for CPU and GPU execution, does anyone have suggestions for something else to explore?
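
For reference, here is a minimal sketch of the kind of loop I mean, written with the OpenMP 5.0 scan syntax as I understand it from the spec. The function name and the int element type are placeholders for illustration, not my real code, and it assumes a compiler that implements the directive. If OpenMP is disabled, the pragmas are ignored and the loop is just the ordinary serial exclusive scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
void exclusive_scan_omp(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    int running = 0;
    // The inscan modifier marks `running` as the variable being scanned.
    #pragma omp parallel for reduction(inscan, + : running)
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = running;   // uses the sum of all earlier elements
        #pragma omp scan exclusive(running)
        running += in[i];   // contributes this element to the scan
    }
}
```

For an exclusive scan, the statement that reads the running value goes before the scan directive and the update goes after it; for an inclusive scan the order is reversed and the directive becomes scan inclusive(running).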

Thanks,
Matt

Hi Matt,

Sorry for the late response. Our offices were closed for a U.S. holiday and I needed to double-check with engineering.

Scan isn’t something we support yet, and given that engineering is focused on bug fixes and performance improvements, it may be a while before we add new features such as this.

-Mat

No problem. Bummer! Thanks for the info.

Thanks,
Matt
