Device does not change variable value


We added a new card (GeForce GTX 1080) to the computer and installed the PGI compiler Community Edition. The computer runs Fedora 19 with NVIDIA driver 375.26. Compilation finishes without errors, but only programs like DeviceQuery or MemoryTest work (CUDA Fortran). When we run a test program like SAXPY, where the value of a variable is changed on the device, it does not compute anything. We also tried the CUDA toolkit compiler, and it works well.

Does anyone know what is wrong?


What flags are you using to compile?

A GTX 1080 uses a Pascal (CC60) based architecture, which we don’t currently generate device code for by default. Instead, you need to specify this as a compiler option: “-Mcuda=cc60” for CUDA Fortran, or “-ta=tesla:cc60” for OpenACC.

Most likely you’re not using the “cc60” option, so your CUDA Fortran kernels are failing. However, most of the sample codes don’t include error handling, so they fail silently, which is why the output isn’t getting updated.
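
If you are unsure which compute capability and block-size limit the card reports, a small query program along these lines can help. This is only a minimal sketch: it assumes the GTX 1080 is CUDA device 0, and the program name is made up for illustration.

```fortran
program queryDevice
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat

  ! Query device 0 (assumption: the GTX 1080 is the first CUDA device)
  istat = cudaGetDeviceProperties(prop, 0)
  if (istat /= cudaSuccess) then
    write(*,*) 'cudaGetDeviceProperties failed: ', trim(cudaGetErrorString(istat))
    stop
  end if

  ! Report the fields relevant to the compiler option and launch limits
  write(*,*) 'Device name:           ', trim(prop%name)
  write(*,*) 'Compute capability:    ', prop%major, '.', prop%minor
  write(*,*) 'Max threads per block: ', prop%maxThreadsPerBlock
end program queryDevice
```

The reported compute capability indicates which “ccXY” option applies, and the maximum threads per block is the upper bound for the block size in a kernel launch.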

Hope this helps,

Thanks for your reply, Mat. We were not using any flags, but even with “-Mcuda=cc60” it is still not working.
Here is source code that works properly:

program copyData
use cudafor
implicit none
integer, parameter :: n = 2
real :: a(n), b(n)
real, device :: a_d(n), b_d(n)

a = 1.0
a_d = a
b_d = a_d
b = b_d

if (all(a == b)) &
write(*,*) 'Test Passed'
end program copyData

And here is code that does not compute and leaves the value of y unchanged:

module mathOps
contains
attributes(global) subroutine saxpy(x, y, a)
implicit none
real :: x(:), y(:)
real, value :: a
integer :: i, n
n = size(x)
i = blockDim%x * (blockIdx%x - 1) + threadIdx%x

if (i<=n) y(i) = y(i) + a*x(i)
end subroutine saxpy
end module mathOps

program testSaxpy
use mathOps
use cudafor
implicit none
integer, parameter :: N = 40
real :: x(N), y(N), a
real, device :: x_d(N), y_d(N)
type(dim3) :: grid, tBlock
integer :: blocksize

blocksize = 2048
tBlock = dim3(512,1,1)
grid = dim3(ceiling(real(N)/tBlock%x),1,1)

x = 1.0; y = 2.0; a = 2.0

x_d = x
y_d = y
call saxpy<<<ceiling(real(N)/blocksize), blocksize>>>(x_d, y_d, a)
y = y_d
write(*,*) 'Max error:', maxval(abs(y-4.0))
end program testSaxpy

To compile, we use the “pgf90” command.

Adding error checking:

 call saxpy<<<ceiling(real(N)/blocksize), blocksize>>>(x_d, y_d, a)
 istat = cudaGetLastError()
 if (istat /= cudaSuccess) then
   print *, "Error: ", cudaGetErrorString(istat)
 end if

You’ll see the problem: a bad launch configuration.

% a.out
 invalid configuration argument
 Max error:    2.000000

Your blocksize is 2048, which is larger than the maximum block size of 1024. Reducing the blocksize to 1024 or less fixes the problem.
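
For reference, here is a minimal sketch of the test program with a valid launch configuration and the error check in place. It assumes a block size of 256 (any value up to 1024 should do) and adds a cudaDeviceSynchronize call to catch errors raised while the kernel runs; both are choices made for illustration, not part of the original code.

```fortran
module mathOps
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    ! Global 1-based thread index
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module mathOps

program testSaxpy
  use mathOps
  use cudafor
  implicit none
  integer, parameter :: N = 40
  integer, parameter :: blocksize = 256   ! assumption: any value <= 1024
  real :: x(N), y(N), a
  real, device :: x_d(N), y_d(N)
  type(dim3) :: grid, tBlock
  integer :: istat

  ! One-dimensional launch: enough blocks to cover all N elements
  tBlock = dim3(blocksize, 1, 1)
  grid = dim3(ceiling(real(N)/tBlock%x), 1, 1)

  x = 1.0; y = 2.0; a = 2.0
  x_d = x
  y_d = y

  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)

  ! Launch errors (e.g. an invalid configuration) show up here
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) write(*,*) 'Launch error: ', trim(cudaGetErrorString(istat))

  ! Errors during kernel execution show up after synchronizing
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) write(*,*) 'Kernel error: ', trim(cudaGetErrorString(istat))

  y = y_d
  write(*,*) 'Max error:', maxval(abs(y - 4.0))
end program testSaxpy
```

With x = 1.0, y = 2.0 and a = 2.0, every element of y should come back as 4.0, so the reported max error should be zero once the launch succeeds.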

Hope this helps,

Thank you, everything is now working.