CUDA Fortran Kernel Double Precision Array

Hi,

I’m using an NVIDIA GeForce RTX 2080 Ti card.

I have a CUDA Fortran kernel into which a number of DOUBLE PRECISION arrays are passed.
The kernel calculates a value based on these for the given thread index (I), then attempts to store it in an array, R, declared:

DOUBLE PRECISION, INTENT(OUT) :: R(:)

For illustration purposes, the calculation is

(a(I)*b(I)) + c + d + e + f + g + h

When I do

WRITE(*,*) (a(I)*b(I)) + c + d + e + f + g + h

in the kernel, I can see that the value of the term is correctly 4.4408920985006262E-016

When I set:

R(I) = (a(I)*b(I)) + c + d + e + f + g + h

then

WRITE(*,*) R(I)

The value of R(I) is zero.

I know that the values are at the limits of machine precision, so this must be significant. If I explicitly set R(I) to some small constant, for example

R(I) = 5

then everything works as expected, so I don’t believe there is anything wrong with the process of calling the kernel and returning values from it.

Is there a precision limitation that applies or compiler flags that I could be missing?

Any help here would be greatly appreciated.

Thanks

Hi Sam,

Is there a precision limitation that applies or compiler flags that I could be missing?

FMA is enabled by default in device code, but I highly doubt that it would cause this issue. Still, you can try compiling with “-Mnofma” to disable it.

It doesn’t quite make sense why printing “R(I)” would differ from printing the computation directly. The compiler needs to generate a temporary variable to hold the result before printing it, which shouldn’t be any different from storing it to R.

Can you post a reproducing example, or send one to PGI Customer Service (trs@pgroup.com)? That would help determine what’s going on.

Thanks,
Mat