I’m new to CUDA_FORTRAN. Can someone please explain what is wrong with the following code…?
Why am I getting segmentation fault…?
attributes(global) subroutine increment(a)
implicit none
real(kind=8),device::a(:)
integer :: i
i = threadIdx%x
a(i)=a(i)+1
if(a(i)/=2)then
write(*,*)' Program Passed in Device'
endif
end subroutine increment
module common
implicit none
real(kind=8):: a(2)
real(kind=8),device:: a_d(2)
end module common
program incrementTestGPU
use cudafor
use common
implicit none
a=1
a_d=a
call prac
end program incrementTestGPU
subroutine prac
use common
use cudafor
call increment<<<1,2>>>(a_d)
a=a_d
if(any(a/=2))then
write(*,*)' **** Program Failed **** '
else
write(*,*)' **** Program Passed **** '
endif
end subroutine prac
One user in StackOverFlow commented increment is an external subroutine (not in the module), but has an assumed-shape dummy argument. Did you mean for it to have an assumed-size argument ( a(*) instead of a(:) ). I’m not entirely convinced that CUDA Fortran plays well with assumed-shape dummies but to give it a reasonable chance you’ll likely need to give increment an explicit interface in prac.
I don’t understand what he was saying. I tried a(*), a(:), a(2) as dummy arguments, the program fails every time. What does he mean to give increment an explicit interface in prac…? The program is working fine if I change the kernel to a normal subroutine which increments array in a loop. How is calling a kernel subroutine different from calling a normal subroutine…?
What does he mean to give increment an explicit interface in prac …? The program is working fine if I change the kernel to a normal subroutine which increments array in a loop. How is calling a kernel subroutine different from calling a normal subroutine…?
In Fortran a subroutine without an interface uses F77 style API calling conventions which basically just pushes the arguments on the stack and does no argument checking. For the array, only the raw pointer is passed, not the descriptor. So while this may “work” in your simple host example, it’s not really correct given you’re passing in an assume-shape array. The example would most likely fail if you tried to use any intrinsic that needs the array descriptor.
CUDA Fortran does require explicit interfaces to be used so the correct data is passed. Besides the descriptor, things like the ‘device’ or ‘value’ attributes are only exposed via an interface.