subroutine variable_kernel (nx,ny, p, c)
integer, value :: nx, ny
integer :: i, j
real(8) :: p(nx,ny), c(nx,ny)
real(8) :: a(nx,ny)
i = (blockidx%x - 1) * blockDim%x + threadidx%x
j = (blockidx%y - 1) * blockDim%y + threadidx%y
a = 3.0
if((i <= nx) .AND. (j <= ny)) then
c(i,j) = a(i,j) + p(i,j)
end if
end subroutine variable_kernel
For some reason the Fortran compiler (PGI 12.3) does not allow me to do this. It reports
error : a cannot be automatic.
How can I work around that?
I have a large subroutine that needs work matrices whose dimensions are only known at call time and are passed in as arguments.
Please advise.
The compiler is right: ‘a’ can’t be an automatic array. The issue is that automatic arrays are allocated upon entry to a subroutine. On a CPU this is fine, but the GPU cannot allocate its own memory. Only the host can allocate memory on the GPU.
There are a couple of ways around this that I know of. First, you can make ‘a’ a device array like pDev and cDev (which you probably allocated in the host code a la allocate(pDev(nx,ny))): allocate it on the host, then pass it to the kernel as an extra argument so that it gets ‘renamed’ through the interface, just as you did with p and c.
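Here is a rough sketch of that first approach. The module name, aDev, the 16x16 block size and nx = ny = 30 are made up for illustration, and pDev/cDev are only my guess at what your host code calls them. Since ‘a’ is now a single array in device memory that every thread sees, each thread sets just its own element rather than doing a whole-array a = 3.0.

```fortran
module variable_kernel_m
  use cudafor
  implicit none
contains
  ! 'a' is now a dummy argument: the host allocates it, the kernel only uses it.
  attributes(global) subroutine variable_kernel(nx, ny, p, c, a)
    integer, value :: nx, ny
    real(8) :: p(nx,ny), c(nx,ny)
    real(8) :: a(nx,ny)          ! scratch array living in device global memory
    integer :: i, j
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y
    if ((i <= nx) .and. (j <= ny)) then
      a(i,j) = 3.0d0             ! each thread touches only its own element
      c(i,j) = a(i,j) + p(i,j)
    end if
  end subroutine variable_kernel
end module variable_kernel_m

program main
  use cudafor
  use variable_kernel_m
  implicit none
  integer, parameter :: nx = 30, ny = 30        ! illustrative sizes
  real(8) :: p(nx,ny), c(nx,ny)
  real(8), device, allocatable :: pDev(:,:), cDev(:,:), aDev(:,:)
  type(dim3) :: grid, tBlock

  p = 1.0d0
  allocate(pDev(nx,ny), cDev(nx,ny), aDev(nx,ny))  ! host allocates GPU memory
  pDev = p                                         ! host-to-device copy

  tBlock = dim3(16, 16, 1)
  grid = dim3((nx + tBlock%x - 1) / tBlock%x, &
              (ny + tBlock%y - 1) / tBlock%y, 1)
  call variable_kernel<<<grid, tBlock>>>(nx, ny, pDev, cDev, aDev)

  c = cDev                                         ! device-to-host copy
  print *, 'max deviation from 4.0: ', maxval(abs(c - 4.0d0))
  deallocate(pDev, cDev, aDev)
end program main
```

The nice part of this version is that nx and ny stay fully dynamic; the cost is one more argument on every kernel that needs scratch space.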
But, if you want the style of an automatic array, you can make it an explicit-shape array with constant bounds. That is, if the compiler knows at compile-time the exact size of a GPU array, it can set aside that space in the code it generates. So, you could declare a as ‘a(30,30)’. (Or use a preprocessor macro like a(MAXSIZE,MAXSIZE) and compile with -DMAXSIZE=30.)
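A sketch of the constant-bounds version, using a Fortran parameter in place of the preprocessor macro (the module name and the value 30 are placeholders). One thing to keep in mind: a local array declared inside a kernel is private to each thread, so every thread carries its own MAXSIZE x MAXSIZE copy. Here each thread fills only the element it needs.

```fortran
module variable_kernel_fixed_m
  use cudafor
  implicit none
  integer, parameter :: MAXSIZE = 30   ! compile-time upper bound (placeholder value)
contains
  attributes(global) subroutine variable_kernel(nx, ny, p, c)
    integer, value :: nx, ny
    real(8) :: p(nx,ny), c(nx,ny)
    real(8) :: a(MAXSIZE,MAXSIZE)      ! explicit shape, constant bounds: no allocation at entry
    integer :: i, j
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y
    ! Guard against problem sizes larger than the compiled-in bound.
    if ((nx <= MAXSIZE) .and. (ny <= MAXSIZE)) then
      if ((i <= nx) .and. (j <= ny)) then
        a(i,j) = 3.0d0                 ! this thread's private copy of 'a'
        c(i,j) = a(i,j) + p(i,j)
      end if
    end if
  end subroutine variable_kernel
end module variable_kernel_fixed_m
```

The kernel is launched exactly as before, with no extra arguments. The trade-off is that nx and ny can never exceed MAXSIZE without recompiling.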
(A third way to have automatic-like arrays is to use shared memory and the third chevron argument, but you are limited by the amount of shared memory available. A 30x30 real*8 array is 7200 bytes, which is pretty big, so you’d have to be careful.)
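For what it’s worth, here is roughly what the shared-memory variant could look like. This is a sketch under assumptions: the module name, the launch wrapper and the 16x16 block size are hypothetical, and I’ve used a one-dimensional assumed-size shared array indexed by hand. Shared memory is per thread block, so the scratch array here is one block-sized tile, not the full nx x ny matrix.

```fortran
module variable_kernel_shared_m
  use cudafor
  implicit none
contains
  attributes(global) subroutine variable_kernel(nx, ny, p, c)
    integer, value :: nx, ny
    real(8) :: p(nx,ny), c(nx,ny)
    real(8), shared :: a(*)      ! size comes from the third chevron argument
    integer :: i, j, k
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y
    ! Linear index of this thread inside its block's tile.
    k = threadIdx%x + (threadIdx%y - 1) * blockDim%x
    a(k) = 3.0d0
    if ((i <= nx) .and. (j <= ny)) then
      c(i,j) = a(k) + p(i,j)
    end if
  end subroutine variable_kernel

  ! Hypothetical host-side wrapper showing the launch configuration.
  subroutine launch_variable_kernel(nx, ny, pDev, cDev)
    integer :: nx, ny
    real(8), device :: pDev(nx,ny), cDev(nx,ny)
    type(dim3) :: grid, tBlock
    integer :: nbytes
    tBlock = dim3(16, 16, 1)
    grid = dim3((nx + tBlock%x - 1) / tBlock%x, &
                (ny + tBlock%y - 1) / tBlock%y, 1)
    nbytes = 8 * tBlock%x * tBlock%y   ! bytes of shared memory per block (real(8) = 8 bytes)
    call variable_kernel<<<grid, tBlock, nbytes>>>(nx, ny, pDev, cDev)
  end subroutine launch_variable_kernel
end module variable_kernel_shared_m
```

With a 16x16 block this asks for 2048 bytes of shared memory per block, well under the 7200 bytes a full 30x30 real*8 array would need.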
Of course, the folks at PGI will chime in soon and probably have more/better advice. These are just things I’ve picked up on.