Volatile in Kernels

I tried to create a device subroutine that synchronizes multiple cuda blocks. For this to work it is necessary to declare a variable volatile:


attributes(device) subroutine syncblocks(syncval)

implicit none

integer,volatile :: syncval
integer :: dummy

call threadfence()
call syncthreads()
if (threadIdx%x==1) then
dummy=atomicAdd(syncval,1)
do
if (syncval == gridDim%x) then
exit
end if
end do
end if
call syncthreads()

end subroutine syncblocks

If I write the equivalent in CUDA C it works quite fine:


device void __syncblocks(int* syncval) {
__threadfence();
__syncthreads();
if (threadIdx.x==0) {
atomicAdd(syncval,1);
while((volatile int)syncval<gridDim.x) {
}
}
__syncthreads();
}

Unfortunately the Fortran version does not work. This is because the volatile keyword is simply dropped during the translation to CUDA C. The above kernel results in:


extern “C” device void syncblocks(signed char* _psyncval)
{
int dummy;
int xthreadidx_x;
int xgriddim_x;
xthreadidx_x = (int)(threadIdx.x+1);
xgriddim_x = (int)gridDim.x;
__threadfence();
__syncthreads();
if( ((xthreadidx_x)!=(1))) goto _BB_6;
dummy = __pgi_atomicAddi((signed char*)(_psyncval), 1);
_BB_5: ;
if( ((((int)(_psyncval)))!=(xgriddim_x))) goto _BB_5;
_BB_6: ;
__syncthreads();
}

If volatile is not supported for device code in CUDA Fortran, I think the compiler should give an error or at least a warning.

Hi Denis,

I talked with our engineers and they will get this fixed in August’s 11.8 release. Volatile should get passed to the generated CUDA C code.

Best Regards,
Mat