I downloaded PGI Workstation Complete 10.3 for Linux x86_64, read the PDF files, and wrote my first CUDA Fortran program, but I failed to compile it.
Thanks a lot.
module cacmu
use cudafor
contains
attributes(global) subroutine cac(n,x)
implicit none
integer :: n
real :: x
integer :: i
x=0.0
do i=1,N
x=x+real(i)
enddo
end subroutine cac
end module cacmu
program main
use cudafor
use cacmu
implicit none
integer :: n_=1000000*64
real :: x_
call cac<<<n_/64,64>>>(n_,x_)
print *,x_
end program main
Then I compiled it as follows:
$ pgf95 1.cuf
PGF90-S-0188-Argument number 1 to cac: type mismatch (1.cuf: 22)
PGF90-S-0188-Argument number 2 to cac: type mismatch (1.cuf: 22)
0 inform, 0 warnings, 2 severes, 0 fatal for main
You have several problems with your code and your compile line. Two of them prevent compilation. First, the scalar arguments of your kernel need the value attribute: integer, value :: n and real, value :: x. Second, you must compile with -Mcuda. Beyond that, I don’t think you understand how GPU programming works: your kernel has no references to threads, so no work is divided among them. Perhaps you are thinking of the PGI Accelerator model instead (!$acc directives in Fortran, #pragma acc in C).
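To make those two fixes concrete, here is a minimal sketch of the original program with only the value attributes added and the compile line noted in a comment. The problem size is reduced from the original purely for illustration. Note that, as an assumption worth checking against your intent, passing x by value means the kernel works on its own copy, so the host never sees the result.

```fortran
! Compile with CUDA Fortran enabled:  pgf95 -Mcuda 1.cuf
module cacmu
  use cudafor
contains
  attributes(global) subroutine cac(n, x)
    implicit none
    integer, value :: n   ! host scalar passed by value
    real, value :: x      ! also by value: changes to x stay local to each thread
    integer :: i
    x = 0.0
    do i = 1, n
      x = x + real(i)
    enddo
  end subroutine cac
end module cacmu

program main
  use cudafor
  use cacmu
  implicit none
  integer :: n_ = 1000 * 64   ! smaller than the original, for illustration only
  real :: x_ = 0.0
  call cac<<<n_/64, 64>>>(n_, x_)
  print *, x_   ! unchanged on the host; the kernel only modified its own copy
end program main
```

This should get past the type mismatch errors, but it is not yet a useful GPU program: every thread repeats the same loop and the result is discarded.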
In CUDA, every thread executes the same kernel. In your example, every thread sequentially computes the same sum. As BeachHut suggests, you could get this code running, but I doubt it would be very fast or do what you intended.
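As a rough sketch of what "every thread executes the same kernel" means in practice, the example below gives each thread one array element, selected by its thread index, and leaves the final sum to the host. The kernel name fill, the array names a and a_d, and the problem size are hypothetical choices for illustration, not taken from the original post.

```fortran
module cacmu
  use cudafor
contains
  ! Every thread runs this same subroutine; the index i tells them apart.
  attributes(global) subroutine fill(n, a)
    implicit none
    integer, value :: n      ! scalar passed by value
    real, device :: a(n)     ! array living in GPU memory
    integer :: i
    ! blockIdx and threadIdx are 1-based in CUDA Fortran
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = real(i)   ! guard against threads past the end
  end subroutine fill
end module cacmu

program main
  use cudafor
  use cacmu
  implicit none
  integer, parameter :: n = 4096          ! hypothetical small problem size
  real, allocatable :: a(:)               ! host copy
  real, device, allocatable :: a_d(:)     ! device copy
  allocate(a(n))
  allocate(a_d(n))
  call fill<<<(n + 63) / 64, 64>>>(n, a_d)   ! enough 64-thread blocks to cover n
  a = a_d                                 ! device-to-host copy by assignment
  print *, sum(a)                         ! the reduction itself is done on the host
  deallocate(a)
  deallocate(a_d)
end program main
```

Doing the sum on the host sidesteps the parallel reduction entirely, which is the simplest choice for a first program; it is not the fastest one.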
While you can perform a sum reduction in parallel (I touch upon it in my last PGI Insider article), it’s rather difficult. Instead, you should consider starting with a simple Matmul program.
Hopefully, this will get you started. If not, please let us know and we’ll try to help further.