Program finishes but the GPU kernel does not execute

I'm trying to run a simple program on a dual-GPU desktop. I can build and run it successfully, but the output is '**** Program Failed ****' when it should be 'Program Passed'.

I think the GPU code is not being executed, because when I check the final values of 'a', they are still 1.

What should I do so that CUDA Fortran actually executes the GPU code? Thanks.

LHW

PS: Here is the code:

module simpleOps_m
contains
  attributes(global) subroutine increment(a, b)
    implicit none
    integer, intent(inout) :: a(:)
    integer, value :: b
    integer :: i
    ! Each thread of the single block updates one element
    i = threadIdx%x
    a(i) = a(i) + b
  end subroutine increment
end module simpleOps_m


program incrementTest
  use cudafor
  use simpleOps_m
  implicit none
  integer, parameter :: n = 256
  integer :: a(n), b
  integer, device :: a_d(n)
  integer :: istat

  a = 1
  b = 3

  istat = cudaSetDevice(0)
  ! Copy host array to device, launch one block of n threads, copy back
  a_d = a
  call increment<<<1,n>>>(a_d, b)
  a = a_d

  write(*,*) a(:)

  if (any(a /= 4)) then
     write(*,*) '**** Program Failed ****'
  else
     write(*,*) 'Program Passed'
  endif
end program incrementTest
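As a diagnostic, a minimal sketch of the same host program with error checking added may show why the kernel does not run. It assumes the simpleOps_m module above is compiled in the same file; the program name incrementTestChecked is illustrative only. Kernel launches are asynchronous, so a failed launch or a fault during execution can go unnoticed unless it is checked: cudaGetLastError reports launch failures, and cudaDeviceSynchronize reports errors raised while the kernel runs.

```fortran
! Sketch: the original test with CUDA error checks added.
! Assumes module simpleOps_m (kernel "increment") is in the same file.
program incrementTestChecked
  use cudafor
  use simpleOps_m
  implicit none
  integer, parameter :: n = 256
  integer :: a(n), b
  integer, device :: a_d(n)
  integer :: istat, nDevices

  ! Confirm the runtime can see at least one GPU
  istat = cudaGetDeviceCount(nDevices)
  if (istat /= cudaSuccess) then
     write(*,*) 'cudaGetDeviceCount: ', trim(cudaGetErrorString(istat))
     stop
  end if
  write(*,*) 'CUDA devices found: ', nDevices

  istat = cudaSetDevice(0)
  if (istat /= cudaSuccess) then
     write(*,*) 'cudaSetDevice: ', trim(cudaGetErrorString(istat))
     stop
  end if

  a = 1
  b = 3
  a_d = a
  call increment<<<1,n>>>(a_d, b)

  ! Errors detected when the kernel is launched
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) write(*,*) 'launch: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel executes surface on synchronization
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) write(*,*) 'execution: ', trim(cudaGetErrorString(istat))

  a = a_d
  if (any(a /= 4)) then
     write(*,*) '**** Program Failed ****'
  else
     write(*,*) 'Program Passed'
  end if
end program incrementTestChecked
```

If either check prints an error string, that message is usually a more direct pointer to the system or build problem than the pass/fail result alone.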

Hi LWH,

I tried your code on a variety of systems, operating systems, and compiler versions, and it passes on all of them. What system (OS, CPU, GPU) and compiler version are you using?

My guess is that the problem is with your system. What does the “pgaccelinfo” utility report? Are you able to run a simple CUDA C program?
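Alongside pgaccelinfo, a small CUDA Fortran device query can confirm that the runtime sees the GPUs from inside a compiled program. This is a sketch, not a replacement for pgaccelinfo, and the program name deviceQuery is hypothetical. It prints each device's name and compute capability, which is worth comparing against the compute capability the failing code is built for, since a kernel built for an unsupported compute capability will not launch on that GPU.

```fortran
! Sketch: list the CUDA devices visible to a CUDA Fortran program.
program deviceQuery
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, nDevices, i

  istat = cudaGetDeviceCount(nDevices)
  if (istat /= cudaSuccess) then
     write(*,*) 'cudaGetDeviceCount failed: ', trim(cudaGetErrorString(istat))
     stop
  end if
  if (nDevices == 0) then
     write(*,*) 'No CUDA-capable devices found'
     stop
  end if

  ! Device numbering starts at 0, matching cudaSetDevice(0) in the test
  do i = 0, nDevices - 1
     istat = cudaGetDeviceProperties(prop, i)
     if (istat /= cudaSuccess) then
        write(*,*) 'cudaGetDeviceProperties: ', trim(cudaGetErrorString(istat))
        cycle
     end if
     write(*,'(a,i0,2a)') 'Device ', i, ': ', trim(prop%name)
     write(*,'(a,i0,a,i0)') '  Compute capability: ', prop%major, '.', prop%minor
  end do
end program deviceQuery
```

Building it with the same compiler and flags as the failing program (for example, a .cuf source file or the -Mcuda option) keeps the comparison meaningful.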

-Mat