Kernel returns NaN

Dear all,
I’m experiencing something really strange with the PGI Accelerator Fortran compiler. I wrote a program to be executed with CUDA Fortran.
This program worked fine on a Linux-based operating system with an NVIDIA Tesla M2070 GPU, CUDA 4.2.9 and PGI 12.8.
Now I’m trying to execute the same program on a machine running Ubuntu 14.04.2 LTS, with an NVIDIA GeForce 750M GPU, CUDA 5.5 and PGI 15.4.
I compile and link with the flag -Mcuda=cuda5.5.
The CUDA installation seems to work properly: I ran simple tests written in both CUDA C and CUDA Fortran, and the results are correct.
My problem is with the program I described at the beginning. The kernels return arrays containing correct values mixed with NaN values. This is surprising because these kernels have always worked correctly, and I have double-checked everything. The kernels perform no unusual operations, and the thread indices do not go out of bounds. I suspect the cause is related to libraries or drivers. Can anybody help me? Thanks

Roberto

Hi Roberto,

This is a tough one for me to diagnose. Your code could be using uninitialized memory or have some other memory error that behaves differently on the two systems. It could also be caused by a change in the compiler version, the CUDA version, the host system, or even the device.

If you send a reproducing example to PGI Customer Service (trs@pgroup.com) and ask them to pass it on to me, I can take a look and see what I can determine.

Otherwise, you’ll need to narrow down the component that’s causing the difference, and then diagnose the code to determine where the NaNs are coming from.
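
As a starting point, a minimal sketch of that kind of check might look like the following. It assumes a stand-in kernel and array (scale_kernel, a_d and n are hypothetical names, not taken from your program). It checks the status returned after the launch and after synchronization, then counts NaNs in the copied-back result with ieee_is_nan. A kernel that fails to launch leaves its output array untouched, which can look like a mix of valid and garbage values.

```fortran
! Hypothetical example: scale_kernel, a_d and n are illustrative names only.
module kernels
  use cudafor
  implicit none
contains
  attributes(global) subroutine scale_kernel(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 2.0 * a(i)   ! guard against out-of-bounds threads
  end subroutine scale_kernel
end module kernels

program nan_check
  use cudafor
  use ieee_arithmetic
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real :: a(n)
  real, device :: a_d(n)
  integer :: istat, i

  a = 1.0
  a_d = a                            ! host-to-device copy

  call scale_kernel<<<(n + 255) / 256, 256>>>(a_d, n)

  ! Errors detected at launch time (bad configuration, unsupported device code, ...)
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel was running
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'execution error: ', trim(cudaGetErrorString(istat))

  a = a_d                            ! device-to-host copy

  ! Report how many NaNs came back and where the first one is
  print *, 'NaN count: ', count(ieee_is_nan(a))
  do i = 1, n
    if (ieee_is_nan(a(i))) then
      print *, 'first NaN at index ', i
      exit
    end if
  end do
end program nan_check
```

Adding the same two status checks after each kernel launch in the real program, and running the NaN scan on each intermediate array, should show which kernel first produces bad values or fails to run at all.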

- Mat