A value sent by the host is not returned correctly by the device

I took an example of data transfer between Host and Device for CUDA Fortran and found this:

Host Code:

program incTest
    use cudafor
    use simpleOps_m
    implicit none
    integer, parameter :: n = 256
    integer :: a(n), b, i
    integer, device :: a_d(n)
    a = 1
    b = 3
    a_d = a
    call inc<<<1,n>>>(a_d, b)
    a = a_d
    if (all(a == 4)) then
        write(*,*) 'Success'
    endif
end program incTest

Device Code:

module simpleOps_m
    contains
        attributes(global) subroutine inc(a, b)
            implicit none
            integer :: a(:)
            integer, value :: b
            integer :: i
            i = threadIdx%x
            a(i) = a(i)+b
        end subroutine inc
end module simpleOps_m

The expected outcome is that the console prints “Success”, but this did not happen. Nothing was printed at all: no errors and no messages. This is because execution never enters the if block, since a_d holds the same values after the call to the inc subroutine as it did before.

I’m using:

OS: Linux - Ubuntu 16

CUDA: 8

Compiler: PGI

Commands to compile:

    pgf90 -Mcuda -c Device.cuf
    pgf90 -Mcuda -c Host.cuf
    pgf90 -Mcuda -o HostDevice Device.o Host.o
    ./HostDevice

I tried other examples and they did not work either.

I also tried plain Fortran (.f90) code with the same compile commands, and that works!

How can I fix this problem?

Hi RenatoFP,

What type of device are you using? (If you don’t know, post the output from the “pgaccelinfo” utility).

My best guess is that you have a Pascal-based device, in which case you need to compile with “-Mcuda=cc60”.

For example, if I add error checking to the example code, we see an “invalid device function” error when running on a Pascal device without “cc60” as part of the compilation.

% cat test.cuf
module simpleOps_m
     contains
         attributes(global) subroutine inc(a, b)
             implicit none
             integer :: a(:)
             integer, value :: b
             integer :: i
             i = threadIdx%x
             a(i) = a(i)+b
         end subroutine inc
 end module simpleOps_m

program incTest
         use cudafor
         use simpleOps_m
         implicit none
         integer, parameter :: n = 256
         integer :: a(n), b, i, istat
         integer, device :: a_d(n)
         a = 1
         b = 3
         a_d = a
         call inc<<<1,n>>>(a_d, b)
         ! Wait for the kernel to finish, then pick up any launch error
         istat=cudaDeviceSynchronize()
         istat=cudaGetLastError()
         a = a_d
         if (all(a == 4)) then
             write(*,*) 'Success'
         else
             write(*,*) 'Error code:', cudaGetErrorString(istat)
         endif
 end program incTest
% pgf90 test.cuf -Mcuda
% a.out
 Error code:
 invalid device function
% pgf90 test.cuf -Mcuda=cc60
% a.out
 Success
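
If pgaccelinfo is not at hand, the compute capability can also be read from inside a program with cudaGetDeviceProperties. The following is only a minimal sketch: it assumes the GPU of interest is device 0, and the program name ccQuery is just an illustrative choice.

    program ccQuery
        use cudafor
        implicit none
        type(cudaDeviceProp) :: prop
        integer :: istat
        ! Device 0 is an assumption; change it if several GPUs are installed
        istat = cudaGetDeviceProperties(prop, 0)
        if (istat /= cudaSuccess) then
            write(*,*) 'Error: ', cudaGetErrorString(istat)
            stop
        endif
        ! major and minor together give the compute capability, e.g. 6.x on Pascal
        write(*,'(a,a)') 'Device name: ', trim(prop%name)
        write(*,'(a,i0,a,i0)') 'Compute capability: ', prop%major, '.', prop%minor
    end program ccQuery

Since this sketch launches no kernel, it should run even when built with plain “-Mcuda”. A major version of 6 indicates a Pascal device, which is the case where “-Mcuda=cc60” is needed.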

Hope this helps,
Mat

This is the information from pgaccelinfo:

    CUDA Driver Version: 8000
    NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.39
    Device Number: 0
    Device Name: GeForce GTX 1080
    PGI Compiler Option: -ta=tesla:cc60

I compiled with -Mcuda=cc60 and it worked! Thank you, mkcolg.