I am wondering how to declare an array inside a device kernel when its bounds are not known until run time.
I have made a little example where A and B are vectors of nTotal elements, and nTotal is not known until run time. A is allocated and assigned in the main program. B is intended to be a device array that stays on the device only. B needs to be an array because its elements are unique:
Module Kernel
  Use cudafor
Contains
  Attributes(Global) Subroutine NotPassedAutomatic(A,nTotal)
    Implicit None
    Integer:: i
    Integer, Value:: nTotal
    Integer, Device:: A(nTotal), B(nTotal)

    B = 2
    i = (blockIdx%x-1)*blockDim%x + threadIdx%x
    If (i >= 1 .and. i <= nTotal) Then
      A(i) = A(i) + B(i)
    End If
  End Subroutine NotPassedAutomatic
End Module Kernel

Program Main
  Use cudafor
  Use Kernel
  Integer:: nTotal
  Integer, Allocatable:: A(:)
  Integer, Device, Allocatable:: A_d(:)

  nTotal = 1024
  Allocate(A(nTotal),A_d(nTotal))
  A = 1
  A_d = A
  Call NotPassedAutomatic<<<ceiling(real(nTotal)/128),128>>>(A_d,nTotal)
  A = A_d
  If (any(A /= 3)) Then
    Write(*,*) "Failed A not equal to 3"
  Else
    Write(*,*) "Passed A equal to 3"
  End If
  Deallocate(A,A_d)
End Program Main
The program will compile and run if you change B to a scalar value in the device kernel.
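To make that concrete, here is a sketch of the scalar variant I mean. The names KernelScalar and NotPassedScalar are made up for this illustration; the only real change from the example above is that B is declared as a plain Integer instead of B(nTotal).

```fortran
Module KernelScalar
  Use cudafor
Contains
  ! Same kernel as in the example, but B is a thread-local scalar
  ! rather than an array sized by nTotal.
  Attributes(Global) Subroutine NotPassedScalar(A, nTotal)
    Implicit None
    Integer :: i, B
    Integer, Value :: nTotal
    Integer, Device :: A(nTotal)

    B = 2
    ! One-based global thread index
    i = (blockIdx%x-1)*blockDim%x + threadIdx%x
    If (i >= 1 .and. i <= nTotal) Then
      A(i) = A(i) + B
    End If
  End Subroutine NotPassedScalar
End Module KernelScalar

Program MainScalar
  Use cudafor
  Use KernelScalar
  Implicit None
  Integer :: nTotal
  Integer, Allocatable :: A(:)
  Integer, Device, Allocatable :: A_d(:)

  nTotal = 1024
  Allocate(A(nTotal), A_d(nTotal))
  A = 1
  A_d = A      ! host-to-device copy
  Call NotPassedScalar<<<ceiling(real(nTotal)/128),128>>>(A_d, nTotal)
  A = A_d      ! device-to-host copy
  If (any(A /= 3)) Then
    Write(*,*) "Failed A not equal to 3"
  Else
    Write(*,*) "Passed A equal to 3"
  End If
  Deallocate(A, A_d)
End Program MainScalar
```

This version is not what I actually need, since every element sees the same value of B, but it shows that the rest of the example works once the run-time-sized array is removed from the kernel.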
Can someone help me understand how to do this properly?