I am relatively new to CUDA. My code compiles and runs, but the kernel does not appear to execute on the GPU. Here is an example taken from “CUDA Fortran for Scientists and Engineers” that shows the problem:
module simpleOps_m
contains
  attributes(global) subroutine increment(a, b)
    implicit none
    integer, intent(inout) :: a(:)
    integer, value :: b
    integer :: i
    i = threadIdx%x
    a(i) = a(i) + b
  end subroutine increment
end module simpleOps_m

program incrementTestGPU
  use cudafor
  use simpleOps_m
  implicit none
  integer, parameter :: n = 256
  integer :: a(n), b
  integer, device :: a_d(n)
  a = 1
  b = 3
  a_d = a
  call increment<<<1,n>>>(a_d, b)
  a = a_d
  write(*,*) "a = ", a
  if (any(a /= 4)) then
    write(*,*) "**** Program Failed ****"
  else
    write(*,*) "Program Passed"
  endif
end program incrementTestGPU
I put the code in a file named test2.f95 and compile it with “pgf90 -Mcuda test2.f95”. However, when I run the program, all the values of a are still 1 and it prints "**** Program Failed ****". It appears the part on the GPU is not running. I am using PGI Community Edition on Ubuntu 16.04.
Any idea what could be wrong?
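One way to narrow this down is to check the CUDA error status right after the kernel launch, since a failed launch is otherwise silent and leaves a unchanged. The version below is a minimal sketch of the same program with those checks added. It assumes the cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString functions provided by the cudafor module. The variable istat and the program name incrementCheck are made up for illustration, and the messages it prints will depend on the installed driver and GPU.

```fortran
module simpleOps_m
contains
  attributes(global) subroutine increment(a, b)
    implicit none
    integer, intent(inout) :: a(:)
    integer, value :: b
    integer :: i
    i = threadIdx%x
    a(i) = a(i) + b
  end subroutine increment
end module simpleOps_m

program incrementCheck
  use cudafor
  use simpleOps_m
  implicit none
  integer, parameter :: n = 256
  integer :: a(n), b
  integer :: istat              ! hypothetical name for the CUDA status code
  integer, device :: a_d(n)

  a = 1
  b = 3
  a_d = a
  call increment<<<1,n>>>(a_d, b)

  ! Report an error from the launch itself, if there was one
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) then
    write(*,*) "Launch error: ", trim(cudaGetErrorString(istat))
  endif

  ! Wait for the kernel to finish and report any error raised while it ran
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) then
    write(*,*) "Execution error: ", trim(cudaGetErrorString(istat))
  endif

  a = a_d
  if (any(a /= 4)) then
    write(*,*) "**** Program Failed ****"
  else
    write(*,*) "Program Passed"
  endif
end program incrementCheck
```

If either check prints a message, the error string should indicate whether the problem lies with the device, the driver, or the way the code was compiled, which would be useful to include in any follow-up.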