Nvfortran (v22.11) + OpenACC giving inconsistent results with -O2 and -O3: How to best triangulate?

I’ve noticed that OpenACC + nvfortran produces unexpected artifacts in my simulations at -O3 but not at -O2. The code is too long to track down the root of this difference by hand; we have dozens of OpenACC kernels. Is there a clean way to bisect where the issue is coming from?

Right now, we are trying -O2 plus individual options such as -Munroll, but I’m not sure every difference between -O2 and -O3 can be toggled via a flag (or is even documented, though some are).

You might try PCAST (Parallel Compiler Assisted Software Testing), either comparing the GPU results to the CPU, or saving results at -O2 and comparing them to -O3. See the HPC Compilers User's Guide Version 23.7 for ARM, OpenPower, x86.
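To make the save-and-compare idea concrete, here is a minimal Python sketch. It is not PCAST's API, only an illustration of the general approach: dump the same arrays at the same points in both builds, then report the first point where they disagree beyond a tolerance. The function name `first_divergence`, the checkpoint names, and the sample data are all hypothetical.

```python
import math

def first_divergence(reference, candidate, rel_tol=1e-6, abs_tol=0.0):
    """Return (checkpoint, index, ref_value, cand_value) for the first mismatch, or None.

    `reference` and `candidate` map checkpoint names to lists of floats,
    in the order the checkpoints were written during the run.
    """
    for name, ref_values in reference.items():
        cand_values = candidate[name]
        for i, (r, c) in enumerate(zip(ref_values, cand_values)):
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                return name, i, r, c
    return None

# Hypothetical data standing in for arrays dumped after each kernel.
run_o2 = {"after_flux": [1.0, 2.0, 3.0], "after_update": [0.5, 0.25, 0.125]}
run_o3 = {"after_flux": [1.0, 2.0, 3.0], "after_update": [0.5, 0.26, 0.125]}

print(first_divergence(run_o2, run_o3))  # ('after_update', 1, 0.25, 0.26)
```

The first checkpoint that diverges points at the kernel (or the region just before it) worth inspecting. The tolerance matters: a bit-exact comparison would flag harmless rounding differences as well as real problems.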

It is likely the order of operations changing, either due to compiler optimizations or unrolling, which might also affect the order of summations. But there are other possibilities, including bugs. You can also experiment with the -gpu options to narrow it down.
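As a small illustration of why summation order matters, the Python sketch below adds the same four numbers two ways: strictly left to right, and with two partial sums that are combined at the end, roughly what unrolling or a parallel reduction may do. The function names and the input values are made up for the example; this does not reproduce what nvfortran actually generates.

```python
def sum_sequential(values):
    """Accumulate left to right, as a straightforward loop would."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_unrolled(values, lanes=2):
    """Accumulate into `lanes` partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total

# Hypothetical input chosen so that rounding is visible in double precision.
values = [1.0e16, 1.0, -1.0e16, 1.0]

print(sum_sequential(values))         # 1.0 (the first 1.0 is lost when added to 1.0e16)
print(sum_unrolled(values, lanes=2))  # 2.0 (the large terms cancel in their own lane)
```

Both results are legitimate floating-point answers to the same sum. A difference of this kind between -O2 and -O3 does not by itself indicate a compiler bug, though it can expose a numerically sensitive part of the algorithm.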

Thanks, @bleback - will report back on what we find most useful and if we suspect bugs.