I’m using an NVIDIA GeForce RTX 2080 Ti card.
I have a CUDA Fortran kernel that takes a number of DOUBLE PRECISION arrays as arguments.
The kernel calculates a value based on these for the given thread index (I), then attempts to store it in an array, R, declared:
DOUBLE PRECISION, INTENT(OUT) :: R(:)
For illustration purposes, the calculation is
(a(I)*b(I)) + c + d + e + f + g + h
When I do
WRITE(*,*) (a(I)*b(I)) + c + d + e + f + g + h
in the kernel, I can see that the value of the expression is correctly 4.4408920985006262E-016.
When I set:
R(I) = (a(I)*b(I)) + c + d + e + f + g + h
the value of R(I) comes back as zero on the host.
I know that the values are at the boundary of machine precision (the result is roughly twice EPSILON for double precision), so that must be significant. If I explicitly set R(I) to a constant, for example
R(I) = 5
then everything works as expected, so I don’t believe anything is wrong with how values are passed to and returned from the kernel.
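For reference, here is a cut-down sketch of how the kernel is structured. The module and kernel names, the scalar arguments c through h, and the launch indexing are placeholders standing in for my actual code, which computes more than this:

```fortran
module kernels
  use cudafor
contains
  ! Illustrative kernel only: names and argument list are placeholders.
  attributes(global) subroutine calc(a, b, c, d, e, f, g, h, R)
    double precision, intent(in) :: a(:), b(:)
    double precision, value :: c, d, e, f, g, h
    double precision, intent(out) :: R(:)
    integer :: I

    ! One thread per element, 1-based global index.
    I = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (I <= size(R)) then
      ! WRITE(*,*) here shows 4.4408920985006262E-016, but after the
      ! assignment below, R(I) reads back as zero on the host.
      R(I) = (a(I)*b(I)) + c + d + e + f + g + h
    end if
  end subroutine calc
end module kernels
```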
Is there a precision limitation that applies here, or a compiler flag that I might be missing?
Any help here would be greatly appreciated.