I was reading the NVIDIA Developer Blog post "Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK" and wanted to use the approach in a code I am working on. My code relies on reductions, which were not allowed with DO CONCURRENT at the time of the post. Has NVFORTRAN since released the preview implementation of reductions that the post mentions?
No, not yet. It is still on our roadmap, but there is no ETA.
From some testing I have done, it seems to be implemented in HPC SDK 21.3.
I compiled with the flag -stdpar=multicore and got the following compiler output:
1421, Generating Multicore code
1421, Loop parallelized across CPU threads
1421, Generating implicit reduction(+:fs2_fs1,fn2_fn1)
Loop not vectorized: non-stride-1 array reference
FMA (fused multiply-add) instruction(s) generated
The compiler can auto-detect reductions in some cases. What's missing is a reduction clause that lets you declare reductions explicitly, which is what the authors of the blog post were writing about.
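To illustrate the kind of loop where auto-detection can apply, here is a minimal sketch. It is not taken from the original poster's code: the program name, the array x, and the accumulator total are hypothetical. It assumes the compiler recognizes the scalar accumulation as a sum reduction, as in the "Generating implicit reduction" message above; whether that happens depends on the loop and the compiler version.

```fortran
! Minimal sketch of a DO CONCURRENT loop with an implicit sum reduction.
! All names here (implicit_reduction, x, total) are hypothetical.
program implicit_reduction
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(kind=8) :: x(n), total

  x = 1.0d0
  total = 0.0d0

  ! No reduction clause is written. The loop relies on the compiler
  ! detecting that "total" is only accumulated with "+", and treating
  ! it as reduction(+:total). If it does not detect this, the result
  ! of a parallel run is not guaranteed to be correct.
  do concurrent (i = 1:n)
    total = total + x(i)
  end do

  print *, total
end program implicit_reduction
```

Building this with -stdpar=multicore as above, plus -Minfo to get the compiler feedback, should show whether an implicit reduction was generated for the loop. If no such message appears, I would not trust the parallel result.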
Thank you for that clarification. I was misunderstanding what they were saying.