Does NVIDIA's OpenMP device offloading support the exclusive/inclusive scan introduced in OpenMP 5.0? It is described in the "scan Directive" section of the OpenMP 5.0 specification.
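For reference, this is roughly what an OpenMP 5.0 exclusive scan looks like on an ordinary host parallel for. It is a minimal sketch: the function name exclusive_scan_omp is hypothetical, and whether nvc++ accepts this construct inside a target region is exactly the open question, so nothing here should be read as confirmed GPU support. The reduction variable needs the inscan modifier, and the scan directive splits the loop body into two phases. With exclusive, the statements before the directive see the running sum from earlier iterations only.

```cpp
#include <vector>

// Hypothetical helper: exclusive prefix sum via the OpenMP 5.0 scan directive.
// out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector<int> exclusive_scan_omp(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int sum = 0;
    const int n = static_cast<int>(in.size());
    // 'inscan' marks sum as a scan reduction variable.
    #pragma omp parallel for reduction(inscan, +: sum)
    for (int i = 0; i < n; ++i) {
        out[i] = sum;                    // scan phase: reads the exclusive prefix
        #pragma omp scan exclusive(sum)
        sum += in[i];                    // input phase: contributes in[i]
    }
    return out;
}
```

If the pragmas are ignored (compiled without OpenMP), the loop degenerates to the plain serial exclusive scan, so the result is the same either way, e.g. {1, 2, 3, 4} gives {0, 1, 3, 6}.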
I'd love to be able to use scan in conjunction with #pragma omp loop, because I want to switch easily between CPU and GPU execution from a single codebase. When targeting GPUs, nvc++ parallelizes the innermost loop of my program (the one containing the exclusive scan) across threads; when targeting CPUs, it distributes the threads over the outer loop instead. On the CPU, this lets the existing exclusive scan in the innermost loop, written as a plain serial C++ loop, run correctly without race conditions. On the GPU, however, that serial scan gets split across threads, so I get the wrong answer. If I can't use an OpenMP directive such as scan to keep identical code for CPU and GPU execution, does anyone have suggestions for something else to explore?
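One alternative worth exploring, assuming each outer-loop iteration (here, each row) is independent, is to pin the parallelism to the outer loop explicitly with target teams distribute parallel for instead of loop, and keep the scan as an ordinary serial inner loop. This is a sketch under those assumptions, not a confirmed nvc++ recommendation; the function row_exclusive_scan and the row-major nrows × ncols layout are hypothetical. Because the inner loop carries a dependence and has no directive on it, each thread runs it sequentially for its own row on both host and device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: exclusive prefix sum of each row of a row-major
// nrows x ncols matrix. Only the outer (row) loop is parallelized; the
// inner scan stays serial within each thread, so it is race-free whether
// the region is offloaded or falls back to the host.
std::vector<double> row_exclusive_scan(const std::vector<double>& in,
                                       int nrows, int ncols) {
    std::vector<double> out(in.size());
    const double* src = in.data();
    double* dst = out.data();
    const std::size_t total = in.size();
    #pragma omp target teams distribute parallel for \
        map(to: src[0:total]) map(from: dst[0:total])
    for (int r = 0; r < nrows; ++r) {
        double running = 0.0;
        for (int c = 0; c < ncols; ++c) {
            const int idx = r * ncols + c;
            dst[idx] = running;      // value before adding src[idx]: exclusive
            running += src[idx];
        }
    }
    return out;
}
```

The trade-off is that the GPU only sees nrows-way parallelism, so this pays off mainly when there are many rows; if rows are few and long, a genuinely parallel scan (via the scan directive, if supported, or a library) would be needed.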
Thanks,
Matt