A value sent by the host is not returned correctly by the device using CUDA Fortran

I took an example of data transfer between Host and Device for CUDA Fortran and found this:

Host Code:

program incTest  
    use cudafor
    use simpleOps_m
    implicit none
    integer, parameter :: n = 256
    integer :: a(n), b, i
    integer, device :: a_d(n)
    a = 1
    b = 3
    a_d = a
    call inc<<<1,n>>>(a_d, b)
    a = a_d
    if (all(a == 4)) then
        write(*,*) 'Success'
    endif
end program incTest

Device Code:

module simpleOps_m
contains
    attributes(global) subroutine inc(a, b)
        implicit none
        integer :: a(:)
        integer, value :: b
        integer :: i
        i = threadIdx%x
        a(i) = a(i)+b
    end subroutine inc
end module simpleOps_m

The expected outcome is that the console prints “Success”, but this did not happen. Nothing was printed at all: no output, no errors, no messages.

I’m using:

OS: Linux - Ubuntu 16

CUDA 8

PGI to compile

Commands to compile:

pgf90 -Mcuda -c Device.cuf
pgf90 -Mcuda -c Host.cuf
pgf90 -Mcuda -o HostDevice Device.o Host.o
./HostDevice

I tried other CUDA Fortran examples and they did not work either.

Plain Fortran (.f90) code compiled with the same commands works!

How can I fix this problem?

Your code runs correctly for me.

It may be that your CUDA installation is broken. Follow the Linux install guide to install CUDA and perform the verification steps. Check the output of nvidia-smi, and also run some of the CUDA sample codes, such as deviceQuery.
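Note that the posted program only writes on success, so any failure is silent. A minimal sketch of the same test with explicit error checks might help narrow things down. It uses cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString from the cudafor module; the program name incTestChecked and the variable istat are just illustrative, and the module is repeated so the file compiles on its own.

```fortran
module simpleOps_m
contains
    attributes(global) subroutine inc(a, b)
        implicit none
        integer :: a(:)
        integer, value :: b
        integer :: i
        i = threadIdx%x
        a(i) = a(i)+b
    end subroutine inc
end module simpleOps_m

program incTestChecked          ! illustrative name, not from the original post
    use cudafor
    use simpleOps_m
    implicit none
    integer, parameter :: n = 256
    integer :: a(n), b
    integer :: istat            ! illustrative: holds CUDA API return codes
    integer, device :: a_d(n)
    a = 1
    b = 3
    a_d = a
    call inc<<<1,n>>>(a_d, b)
    ! Errors detected at launch time (e.g. no usable device or kernel image)
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
        write(*,*) 'Kernel launch error: ', trim(cudaGetErrorString(istat))
    endif
    ! Errors that occur while the kernel is running
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
        write(*,*) 'Kernel execution error: ', trim(cudaGetErrorString(istat))
    endif
    a = a_d
    if (all(a == 4)) then
        write(*,*) 'Success'
    else
        ! Report the failure instead of exiting silently
        write(*,*) 'Failure: min/max of a = ', minval(a), maxval(a)
    endif
end program incTestChecked
```

If the kernel never ran, a would presumably come back unchanged (all 1), and the error strings should give a hint as to why.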

Also, your cross-posting:

http://stackoverflow.com/questions/43340341/a-value-sended-by-host-not-return-correctly-by-device-using-cuda-fortran

has a useful response.