I’m using CUDA Fortran. In the kernel I can use “integer, value :: N” to specify the size of an array in global memory. But it gives me an error if I try to do the same for an array in shared memory.
This doesn’t work:
attributes(global) subroutine my_kernell(Xdev,N)
real, device, dimension(N,1) :: Xdev
real, shared :: Xshared(N)
integer, value :: N
But this works (assuming N=10):
attributes(global) subroutine my_kernell(Xdev,N)
real, device, dimension(N,1) :: Xdev
real, shared :: Xshared(10)
integer, value :: N
Any idea why? How can I pass the size of an array in shared memory (using a variable)?
The issue here is that ‘Xshared’ is an automatic array and automatics are implicitly allocated upon entry to the function. However, GPU threads can’t allocate memory so we had to make automatics illegal within device code. Hence, all local device arrays need to be fixed size.
In my code “N” is going to be constant throughout the whole program
(an “integer, parameter” in the main program).
Is there any way I could declare “Xshared” to have size “N”?
(In such a way that if I want to change the value of “N” I only have to
modify the main program and then recompile.)
If N is a parameter in the same module as your kernels, or in another module that the kernel module uses, then it is fine to use it, since the size is then fixed at compile time. Otherwise, you can use preprocessing macros to set the value of N across multiple files.
Something like:
module myparams
integer :: N
PARAMETER (N = 256)
! or, with NVAL set on the command line via -DNVAL=256 (needs preprocessing enabled):
! PARAMETER (N = NVAL)
end module myparams
module mykernels
use myparams
contains
attributes(global) subroutine mykernel(Arr)
implicit none
real, dimension(N) :: Arr
real, shared, dimension(N) :: sharedArr
end subroutine mykernel
end module mykernels
Hi Mat,
What you stated does not seem to agree with what is described in the CUDA Fortran manual 10.9 (pg 24/56), last paragraph:
Shared arrays that are not dummy arguments may be declared as assumed-size arrays; that is, the last
dimension of a shared array may have an asterisk as its upper bound:
What you had in your previous example was right, you can have a shared memory array with unspecified upper bounds, and it will get sized via the 3rd argument in the chevron syntax. Note, however, that all arrays that are declared that way overlay each other. This is consistent with CUDA C.
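As a rough sketch of the assumed-size approach described above: the kernel below declares its shared array with an asterisk as the upper bound, and the host supplies the size in bytes as the third chevron argument. The names dyn_shared_demo, reverse_in_shared, buf and x_d are made up for illustration, and the byte count assumes a default 4-byte real.

```fortran
module dyn_shared_demo
  use cudafor
  implicit none
contains
  ! Reverses x(1:n) in place within a single thread block.
  attributes(global) subroutine reverse_in_shared(x, n)
    integer, value :: n
    real :: x(n)
    real, shared :: buf(*)   ! assumed size; bytes come from the 3rd chevron argument
    integer :: t

    t = threadIdx%x
    if (t <= n) buf(t) = x(t)
    call syncthreads()       ! make sure every element is staged before reading
    if (t <= n) x(t) = buf(n - t + 1)
  end subroutine reverse_in_shared
end module dyn_shared_demo

program main
  use cudafor
  use dyn_shared_demo
  implicit none
  integer, parameter :: n = 64
  real :: x(n)
  real, device :: x_d(n)
  integer :: i

  x = [(real(i), i = 1, n)]
  x_d = x
  ! 1 block, n threads, 4*n bytes of dynamic shared memory (assumes 4-byte reals)
  call reverse_in_shared<<<1, n, 4*n>>>(x_d, n)
  x = x_d
  ! x(1) should now hold 64 and x(n) should hold 1
  print *, x(1), x(n)
end program main
```

Because all arrays declared this way overlay each other, a kernel that needs two dynamically sized shared buffers would typically declare one assumed-size array and split it by index offsets, rather than declaring two.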