Hi everybody,
I am taking my first steps with CUDA Fortran, but I am having a problem with the following simple code, incTest.cuf:
module simpleOps_m
contains
  attributes(global) subroutine inc(a,b)
    implicit none
    integer :: a(:)
    integer, value :: b
    integer :: i, n
    i = threadIdx%x
    a(i) = a(i)+b
  end subroutine inc
end module simpleOps_m

program incTest
  use cudafor
  use simpleOps_m
  implicit none
  integer :: b, n=256
  integer, allocatable :: a(:)
  integer, allocatable, device :: a_d(:)
  allocate (a(n), a_d(n))
  a = 1
  b = 3
  a_d = a
  call inc<<<1,n>>>(a_d,b)
  a = a_d
  if (all(a==4)) &
    write(*,*) 'Test Passed'
  deallocate (a,a_d)
end program incTest
I compiled the code with pgf90 -Minfo -o exe incTest.cuf and got the following output:
inctest:
33, all reduction inlined
but when I ran it, I didn’t get the message ‘Test Passed’. Before that, I tested an even simpler code, copydat.cuf, and it worked:
program copyData
  use cudafor
  implicit none
  integer, parameter :: n=256
  real :: a(n), b(n)
  real, device :: a_d(n), b_d(n)
  a = 1.0
  a_d = a
  b_d = a_d
  b = b_d
  if (all(a==b)) &
    write(*,*) 'Test passed'
end program copyData
What test can I run to find out what the problem is?
Thanks for your time.
Hi oscar_ml,
Most likely your kernel is failing for some reason. CUDA won’t throw an error, so it’s best to add error checking after each kernel launch (see below).
What device are you using? I suspect that your device is newer or older than the default targets for your compiler version. For example, with PGI 18.4 we target cc35, cc50, and cc60, since we default to using CUDA 8.0. Later compiler versions also add cc70 and use CUDA 9.0.
You may need to set the target compute capability for your device, such as “-Mcuda=cc70” for a V100.
Note that in our 18.10 release, we’ll default to targeting the device that’s detected on the compiling system.
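If you are not sure which compute capability your device has, a small query program along the following lines can report it. This is only a minimal sketch: the program name queryDevice is made up for illustration, and it assumes the cudaGetDeviceCount and cudaGetDeviceProperties routines from the cudafor module. The pgaccelinfo utility that ships with the PGI compilers reports similar information.

```fortran
program queryDevice
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, idev

  ! Ask the runtime how many CUDA devices are visible.
  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess) then
    print *, 'cudaGetDeviceCount failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if

  ! Device numbering is zero-based in the CUDA runtime API.
  do idev = 0, ndev - 1
    istat = cudaGetDeviceProperties(prop, idev)
    if (istat /= cudaSuccess) cycle
    ! major/minor together form the compute capability, e.g. 3 and 0 -> cc30.
    print '(a,i0,a,a,a,i0,i0)', 'Device ', idev, ': ', trim(prop%name), &
          ', compile with -Mcuda=cc', prop%major, prop%minor
  end do
end program queryDevice
```

Compile it like any other .cuf file and pass the reported value to -Mcuda when building your real code.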
% cat test.cuf
module simpleOps_m
contains
  attributes(global) subroutine inc(a,b)
    implicit none
    integer :: a(:)
    integer, value :: b
    integer :: i, n
    i = threadIdx%x
    a(i) = a(i)+b
  end subroutine inc
end module simpleOps_m

program incTest
  use cudafor
  use simpleOps_m
  implicit none
  integer :: b, n=256, rc
  integer, allocatable :: a(:)
  integer, allocatable, device :: a_d(:)
  allocate (a(n), a_d(n))
  a = 1
  b = 3
  a_d = a
  call inc<<<1,n>>>(a_d,b)
  rc = cudaGetLastError()
  if (rc.ne.0) then
    print *, "Error: ", cudaGetErrorString(rc)
  endif
  a = a_d
  if (all(a==4)) &
    write(*,*) 'Test Passed'
  deallocate (a,a_d)
end program incTest
% pgf90 test.cuf -V18.4 -Mcuda=cc35
% a.out
Error:
invalid device function
% pgf90 test.cuf -V18.4 -Mcuda=cc70
% a.out
Test Passed
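The check above catches errors reported at launch time, such as the "invalid device function" case. Errors that occur while the kernel is running (for example an out-of-bounds access) are reported asynchronously, so a slightly fuller check might also synchronize with the device. The following is a hedged sketch, not part of the original test: the module checkErr_m and the subroutine checkKernel are hypothetical names introduced here for illustration.

```fortran
module checkErr_m
  use cudafor
  implicit none
contains
  ! Hypothetical helper: call right after a kernel launch.
  subroutine checkKernel(label)
    character(*), intent(in) :: label
    integer :: rc

    ! Errors detected at launch (bad configuration, no code for this device).
    rc = cudaGetLastError()
    if (rc /= cudaSuccess) then
      print *, trim(label), ' launch error: ', trim(cudaGetErrorString(rc))
      stop 1
    end if

    ! Wait for the kernel to finish; errors raised during execution show up here.
    rc = cudaDeviceSynchronize()
    if (rc /= cudaSuccess) then
      print *, trim(label), ' execution error: ', trim(cudaGetErrorString(rc))
      stop 1
    end if
  end subroutine checkKernel
end module checkErr_m
```

With that module in the same file, the launch would become "call inc<<<1,n>>>(a_d,b)" followed by "call checkKernel('inc')". The synchronization costs some performance, so it is probably best kept for debugging builds.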
Hope this helps,
Mat
Mat,
Many thanks for your reply. You were right. I checked my compiler version:
% pgf90 --version
pgf90 18.4-0 64-bit target on x86-64 Linux -tp sandybridge
PGI Compilers and Tools
Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
and my device, which is a Quadro K600. Then I compiled the code with the flag -Mcuda=cc30 and got the right answer:
% pgf90 test.cuf -V18.4 -Mcuda=cc30
% a.out
Test Passed
It was a silly mistake. Next time I will come with a more challenging question ;)
BTW: Do you have a link to the heat example using CUDA Fortran? It would be very useful for writing my app.
We do have a Laplace example as part of our OpenACC training, but not for CUDA Fortran.
In Chapter 8 of the example sources for Parallel Programming with OpenACC, there are examples of a 2D heat program in both OpenACC and CUDA C.
They might at least help show the algorithm and code structure, which you could translate into CUDA Fortran.
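To give a rough idea of what such a translation could look like, here is a minimal sketch of a 2D heat (Jacobi) iteration in CUDA Fortran. It is not taken from the book or from our training material. The names heat_m, jacobiStep, and heat2d, the grid size, the step count, and the fixed 100-degree boundary are all assumptions chosen for illustration.

```fortran
module heat_m
  use cudafor
  implicit none
contains
  ! One Jacobi sweep: each interior point becomes the average of its four neighbours.
  attributes(global) subroutine jacobiStep(tnew, told, nx, ny)
    integer, value :: nx, ny
    real :: tnew(nx,ny), told(nx,ny)
    integer :: i, j
    ! Global 2D index of this thread (block and thread indices are 1-based).
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y
    ! Skip boundary points and threads that fall outside the grid.
    if (i > 1 .and. i < nx .and. j > 1 .and. j < ny) then
      tnew(i,j) = 0.25 * (told(i-1,j) + told(i+1,j) + told(i,j-1) + told(i,j+1))
    end if
  end subroutine jacobiStep
end module heat_m

program heat2d
  use cudafor
  use heat_m
  implicit none
  integer, parameter :: nx = 128, ny = 128, nsteps = 1000
  real, allocatable :: t(:,:)
  real, allocatable, device :: t_d(:,:), tnew_d(:,:)
  type(dim3) :: grid, tblock
  integer :: step, rc

  allocate (t(nx,ny), t_d(nx,ny), tnew_d(nx,ny))
  t = 0.0
  t(:,1) = 100.0          ! assumed boundary condition: one hot edge
  t_d = t
  tnew_d = t              ! both buffers need the boundary values

  tblock = dim3(16, 16, 1)
  grid = dim3((nx + 15) / 16, (ny + 15) / 16, 1)

  ! Two sweeps per iteration so the result always ends up back in t_d.
  do step = 1, nsteps, 2
    call jacobiStep<<<grid, tblock>>>(tnew_d, t_d, nx, ny)
    call jacobiStep<<<grid, tblock>>>(t_d, tnew_d, nx, ny)
  end do
  rc = cudaDeviceSynchronize()
  if (rc /= cudaSuccess) print *, 'Error: ', trim(cudaGetErrorString(rc))

  t = t_d
  print *, 'T at centre: ', t(nx/2, ny/2)
  deallocate (t, t_d, tnew_d)
end program heat2d
```

A real application would typically iterate until the change between sweeps drops below a tolerance rather than for a fixed number of steps, and on your K600 it would be compiled with -Mcuda=cc30 as above.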