PGI 18.5 generates incorrect results for -O1 up to -O3

Dear OpenACC community,

We have a large code that was developed with the PGI compiler 17.10 and OpenACC on a P100 cluster. With this setup and the -O3 option it generated correct results.

On a different cluster we are now using the PGI compiler 18.5 and the V100.
However, the program does not generate correct results anymore.

So far we have tracked it down to the optimization flag: if we compile and link with -O0, we also get correct results with the PGI compiler 18.5. But from -O1 up to -O3 the results are incorrect.
At some point the algorithm also starts to print out “NaN”.

We compile and link with these flags:

PGIOPTS=-Mcuda=9.0,ptxinfo
PGIOPTS+=-Mpreprocess
PGIOPTS+=-Mlarge_arrays -mcmodel=medium
PGIOPTS+=-ta=tesla:cc70 
PGIOPTS+=-O0  
PGIOPTS+=-mp 
PGIOPTS+=-acc -Minfo=accel -Minfo

We also tried the -fast option for compiling and linking but it still generates wrong results.

Is there a way to debug it and find out what happens?

Thank you for your help

Does the different cluster have a different processor? You might try setting the processor type to something “older”, like Nehalem for x86, for instance.

But it could well be an optimization bug. The best way to go about it is to build a working version, compiled with -O0, and a failing version, compiled with -O3. Then combine the two sets of objects: for instance, take [a-l].o from one set and [m-z].o from the other, then link and run. Do a binary search this way to find the failing file or function. Once you have that, we can zero in on where the actual problem lies.
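The object-mixing search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `find_bad_object` and the `build_fails` callback are hypothetical names, the callback is expected to link the chosen -O3 objects with the -O0 objects for all other files, run the program, and report whether the results are wrong, and the sketch assumes a single object file is responsible.

```python
from typing import Callable, Sequence, Set


def find_bad_object(objects: Sequence[str],
                    build_fails: Callable[[Set[str]], bool]) -> str:
    """Bisect over object files to find the one that breaks when optimized.

    `build_fails(optimized)` is a hypothetical callback supplied by the user.
    It should link the -O3 objects named in `optimized` together with the
    -O0 objects for every other file, run the program, and return True if
    the results are wrong. Assumes exactly one object file is the culprit.
    """
    if not objects:
        raise ValueError("need at least one object file")
    candidates = list(objects)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Optimize only the first half; everything else stays at -O0.
        if build_fails(set(first_half)):
            candidates = first_half       # culprit is in the first half
        else:
            candidates = candidates[mid:]  # culprit must be in the second half
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for a real build-and-run step: pretend "solver.o" is the culprit.
    objs = ["advect.o", "grid.o", "io.o", "solver.o", "util.o"]
    print(find_bad_object(objs, lambda optimized: "solver.o" in optimized))
```

With N object files this needs roughly log2(N) build-and-run cycles. If the failure only appears when several files are optimized together, the single-culprit assumption does not hold and the search would need to be adapted.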

Not in 18.5, but in later compilers we have a new feature called PCAST which would help in situations like this. It provides compiler-assisted debugging by comparing results between a gold version and a test version.
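With a compiler that lacks such a feature, a gold-versus-test comparison can be approximated by hand. The sketch below is not PCAST and does not use any PGI API. It assumes the -O0 and -O3 builds can each dump the same intermediate array as a list of floats, and the helper name `first_mismatch` and the tolerance are hypothetical choices.

```python
import math
from typing import Optional, Sequence


def first_mismatch(gold: Sequence[float],
                   test: Sequence[float],
                   rel_tol: float = 1e-9) -> Optional[int]:
    """Return the index of the first value where `test` departs from `gold`.

    `gold` would come from the -O0 build and `test` from the optimized build.
    Returns None if all values agree within the relative tolerance.
    """
    if len(gold) != len(test):
        raise ValueError("gold and test dumps differ in length")
    for i, (g, t) in enumerate(zip(gold, test)):
        # math.isclose is False whenever either value is NaN,
        # so the first NaN is reported as a mismatch too.
        if not math.isclose(g, t, rel_tol=rel_tol):
            return i
    return None


if __name__ == "__main__":
    gold_dump = [1.0, 2.0, 3.0, 4.0]
    test_dump = [1.0, 2.0, float("nan"), 4.0]
    print(first_mismatch(gold_dump, test_dump))
```

Dumping arrays after each major step and locating the earliest mismatch narrows down where the NaN values first appear.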

Thank you for your quick reply!
Thank you for the detailed description on how to debug such an error.

We have now tested the code on a DGX2 cluster with the PGI compiler 19.1 and it seems to run correctly.

The P100 cluster has a Broadwell CPU and both other V100 clusters have a Skylake CPU.
We will also install the newest PGI 19.X compiler on the other V100 cluster and then check again whether the code runs correctly. Hopefully it works and we don’t need to debug it.

Thank you for your help