I have a code that works correctly with PGI 19.4 and CUDA 10.0 under Windows 10 on an NVIDIA Quadro P4000.
I now have a new computer with a more powerful graphics card (Quadro RTX 5000), but the code does not run correctly on this machine: it produces a lot of NaNs. The new machine runs Ubuntu 18.04, PGI 19.4 and CUDA 10.1.
I compiled the code with the following options:
On the Windows machine (working correctly):
FFLAGS = -fast -Mlarge_arrays -Mcuda=cc60,ptxinfo -ta=tesla:cc60
On the Ubuntu machine (not working correctly):
FFLAGS = -fast -Mlarge_arrays -Mcuda=cc75,ptxinfo -ta=tesla:cc75
I see some differences in the ptxinfo output at compile time.
pgfortran -c -fast -Mlarge_arrays -Mcuda=cc75,ptxinfo -ta=tesla:cc75 check_cuda-single.f90
ptxas info : 0 bytes gmem
ptxas info : Compiling entry function 'check_mod_check_kernel4_' for 'sm_75'
ptxas info : Function properties for check_mod_check_kernel4_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 66 registers, 400 bytes cmem[0]
ptxas info : Function properties for check_mod_checkov0_
56 bytes stack frame, 56 bytes spill stores, 56 bytes spill loads
and
pgfortran -c -fast -Mlarge_arrays -Mcuda=cc60,ptxinfo -ta=tesla:cc60 check_cuda-single.f90
ptxas info : 0 bytes gmem
ptxas info : Compiling entry function 'check_mod_check_kernel4_' for 'sm_60'
ptxas info : Function properties for check_mod_check_kernel4_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 61 registers, 368 bytes cmem[0], 36 bytes cmem[2]
ptxas info : Function properties for check_mod_checkov0_
40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads
Note that on the new machine it reports:
ptxas info : Used 66 registers, 400 bytes cmem[0]
but on the old, working machine it reports:
ptxas info : Used 61 registers, 368 bytes cmem[0], 36 bytes cmem[2]
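To compare the two builds more systematically, a small shell helper along these lines could pull the entry-function name, target architecture and register count out of the ptxinfo output. This is only a sketch: `ptx_summary` is a hypothetical name, and it assumes the log format shown above.

```shell
# Hypothetical helper: summarise ptxinfo output read from stdin.
# Prints one line per kernel: <entry function> <architecture> <registers>.
# Assumes the "Compiling entry function ... for ..." and
# "Used N registers" line formats shown in this post.
ptx_summary() {
  awk '
    /Compiling entry function/ {
      # The kernel name follows the word "function"; the arch is the last field.
      for (i = 1; i < NF; i++) if ($i == "function") name = $(i + 1)
      arch = $NF
      # Drop the surrounding quote characters, whatever style they are.
      gsub(/[^A-Za-z0-9_]/, "", name)
      gsub(/[^A-Za-z0-9_]/, "", arch)
    }
    /Used [0-9]+ registers/ {
      # The register count is the field just before "registers,".
      for (i = 2; i <= NF; i++) if ($i ~ /^registers/) regs = $(i - 1)
      print name, arch, regs
    }
  '
}
```

It could be used as `pgfortran -c $FFLAGS check_cuda-single.f90 2>&1 | ptx_summary` on each machine (the `2>&1` is there in case the messages go to stderr), and the two summaries then compared side by side.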
Is this difference in the ptxinfo output the cause of the NaNs? If so, what should I do? Or could there be other reasons?
Thank you very much in advance!