Data locally defined in a kernel

If I define a variable in a kernel (a global or device subroutine), will it automatically reside in device memory, or do I have to explicitly use an appropriate attribute (shared, device)?

attributes(global) subroutine dosomething()

  integer :: var1  ! where does var1 reside?

end subroutine dosomething

attributes(device) subroutine dosomething_dev()

  integer :: var2  ! where does var2 reside?

end subroutine dosomething_dev

Tuan

Hi Tuan,

Both variables are on the device (unless you’re in emulation mode).

Mat

Thanks, Mat.

Tuan

Without any explicit attribute (shared, device, local), would those variables be allocated in global memory by default, or will the compiler try to allocate them in shared memory first?

Tuan

Hi Tuan,

They would be placed in global memory by default.

Mat
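A minimal sketch contrasting the cases discussed so far, assuming the PGI/NVIDIA CUDA Fortran compiler. The module, kernel, and variable names (placement_demo, reverse_tiles, blocks_seen) are hypothetical. A plain local such as t gets one private copy per thread, an array with the shared attribute gets one copy per thread block, and a module variable with the device attribute has a single copy in global memory that every thread sees:

```fortran
! Hypothetical names; a sketch assuming the PGI/NVIDIA CUDA Fortran compiler.
module placement_demo
  use cudafor
  implicit none
  ! Module variable with the device attribute: a single copy in device
  ! (global) memory, visible to every thread in every block.
  integer, device :: blocks_seen
contains
  attributes(global) subroutine reverse_tiles(out, n)
    integer, value :: n
    integer :: out(n)              ! kernel array dummy: lives in device memory
    integer, shared :: tile(128)   ! one copy per thread block (on-chip shared memory)
    integer :: t, istat            ! no attribute: one private copy per thread
    ! Assumes blockdim%x <= 128 so that every index into tile is valid.
    t = threadidx%x
    tile(t) = t
    call syncthreads()
    ! Each thread reads a slot that another thread of its block wrote.
    out((blockidx%x - 1) * blockdim%x + t) = tile(blockdim%x - t + 1)
    ! blocks_seen is shared by all blocks, so it is updated atomically.
    if (t == 1) istat = atomicadd(blocks_seen, 1)
  end subroutine reverse_tiles
end module placement_demo

program main
  use cudafor
  use placement_demo
  implicit none
  integer, parameter :: nthreads = 128, nblocks = 4, n = nthreads * nblocks
  integer, device :: out_d(n)
  integer :: out_h(n), nseen

  blocks_seen = 0                           ! host-to-device assignment
  call reverse_tiles<<<nblocks, nthreads>>>(out_d, n)
  out_h = out_d                             ! device-to-host copy
  nseen = blocks_seen
  print *, 'blocks seen:', nseen
  print *, 'first block, first 4 values:', out_h(1:4)
end program main
```

Only tile needed an explicit attribute here. The locals t and istat were left unadorned, and the compiler decided where each thread's copy lives.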

It seems that “local” is not an attribute for marking a thread-private variable (one that resides in processor registers). Is using the “value” attribute correct for that?

Thanks,
Tuan

No, ‘value’ means pass by value. There currently isn’t a way for the user to request that a variable be placed in a register. I’ll put in a feature request.

Mat
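To illustrate what value does, here is a sketch (hypothetical names, CUDA Fortran assumed) in which the scalars n and alpha are copied from the host into the kernel at launch, while the array dummy x refers to memory already on the device. Without value, Fortran's default pass-by-reference would mean the kernel expects those scalars to reside in device memory already. The attribute has no bearing on whether the kernel's local i is kept in a register; that is still up to the compiler:

```fortran
! Hypothetical names; a sketch assuming the PGI/NVIDIA CUDA Fortran compiler.
module value_demo
  use cudafor
  implicit none
contains
  attributes(global) subroutine scale_array(x, n, alpha)
    integer, value :: n         ! scalar copied by value from the host at launch
    real, value :: alpha        ! likewise
    real :: x(n)                ! array dummy: passed by reference, in device memory
    integer :: i                ! thread-private local (placement chosen by the compiler)
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) x(i) = alpha * x(i)
  end subroutine scale_array
end module value_demo

program main
  use cudafor
  use value_demo
  implicit none
  integer, parameter :: n = 1000
  real, device :: x_d(n)
  real :: x(n)

  x = 1.0
  x_d = x                                       ! host-to-device copy
  call scale_array<<<(n + 255) / 256, 256>>>(x_d, n, 2.0)
  x = x_d                                       ! device-to-host copy
  print *, 'max error:', maxval(abs(x - 2.0))
end program main
```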

If I understood correctly, all the variables declared in a device routine are stored in device global memory.
The following code is taken from the PGI CUDA Fortran Programming Guide:

attributes(global) subroutine mmul_kernel( A, B, C, N, M, L )
real :: A(N,M), B(M,L), C(N,L)
integer, value :: N, M, L
integer :: i, j, kb, k, tx, ty
! submatrices stored in shared memory
real, shared :: Asub(16,16), Bsub(16,16)
! the value of C(i,j) being computed
real :: Cij
! Get the thread indices
tx = threadidx%x
ty = threadidx%y
! This thread computes C(i,j) = sum(A(i,:) * B(:,j))
i = (blockidx%x-1) * 16 + tx
j = (blockidx%y-1) * 16 + ty
Cij = 0.0
[ other code]

My question is: if i, j, Cij, and so on are in global device memory, and therefore not private to each thread, why aren’t there data races or other problems from accessing the same memory location?

I’m writing some programs in CUDA Fortran, and I don’t understand when a variable is private to a single thread (or whether that is possible).

Thank you

Hi goblinsqueen,

By ‘global’, I meant that by default local variables are placed in device memory (i.e. global memory). This does not mean that these variables have global scope. Rather, each thread has its own private copy of the local variables, so the i, j, and Cij of one thread never touch another thread’s copies.

Also, some variables may be kept in a multiprocessor’s registers; those that don’t fit spill to per-thread local memory, which physically resides in device memory. However, you the programmer do not have control over this. It is ultimately up to the NVIDIA tools to determine exactly where each variable is stored.

Hope this helps,
Mat
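The private-copy behavior can be seen in a sketch modeled on the Cij accumulator in the mmul_kernel excerpt (module and kernel names are hypothetical; CUDA Fortran assumed). Every thread runs the same code, but acc, i, and k are per-thread, so no thread overwrites another thread's running sum:

```fortran
! Hypothetical names; a sketch assuming the PGI/NVIDIA CUDA Fortran compiler.
module rowsum_demo
  use cudafor
  implicit none
contains
  attributes(global) subroutine rowsum(a, s, n, m)
    integer, value :: n, m
    real :: a(n, m), s(n)       ! device arrays shared by all threads
    integer :: i, k             ! each thread gets its own i and k
    real :: acc                 ! each thread gets its own accumulator, like Cij
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) then
      acc = 0.0
      do k = 1, m
        acc = acc + a(i, k)
      end do
      s(i) = acc                ! each thread writes a distinct element
    end if
  end subroutine rowsum
end module rowsum_demo

program main
  use cudafor
  use rowsum_demo
  implicit none
  integer, parameter :: n = 512, m = 64
  real :: a(n, m), s(n)
  real, device :: a_d(n, m), s_d(n)

  a = 1.0
  a_d = a
  call rowsum<<<(n + 127) / 128, 128>>>(a_d, s_d, n, m)
  s = s_d
  print *, 'all rows correct:', all(s == real(m))
end program main
```

If acc were declared with the shared attribute instead, all threads in a block would update one variable and the sums would race.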

Thank you Mat,
now it’s clearer.

Hi Mat,
Sorry to revive this old thread. Does CUDA Fortran support requesting that data be placed in a register yet? In CUDA C, I believe the user cannot do this; instead, the compiler tries to keep variables in registers first and spills them to off-chip memory otherwise.

Thanks,
Tuan

Hi,

Sorry, no such feature at the moment.

Hongyon