L1 Cache coherency of CUDA blocks
Device global memory is cached in the same L1 cache for all threads in a block, but we are warned that the L1 caches for different blocks can become incoherent.
Putting this together with the fact that data is transferred in 128 Byte “chunks”, I wondered what would
happen if two blocks wrote to different parts of the same chunk of global memory. I expected that
the two L1 caches would become incoherent and the results eventually written to RAM would be indeterminate.
However, my first experiment shows no evidence of this behaviour:
////////////////////// EXPERIMENTS KERNELS START ///////////////////////////////
//assume dimBlock(WARPSIZE, 4), dimGrid(32)
__global__ void Experiment_On_4KBytes_Kernel_A(uint8_t * __restrict__ pBytes)
{
unsigned offset = 128 * threadIdx.x + 4 * blockIdx.x + threadIdx.y;
*(pBytes + offset) = offset % 43; //prime number hash
}
//assume 1 thread
__global__ void Experiment_On_4KBytes_Kernel_B(uint8_t * __restrict__ pBytes)
{
unsigned errs = 0;
for (unsigned chunk = 0; chunk < 32; ++chunk)
for (unsigned block = 0; block < 32; ++block)
for (unsigned word = 0; word < 4; ++word)
{
unsigned offset = 128 * chunk + 4 * block + word;
errs += ( (unsigned) pBytes[offset] != offset % 43);
}
printf("EXPERIMENT RESULT: %u\n",errs); //the result was 0, even though 32 different blocks wrote in each chunk of 128 bytes!
}
////////////////////// EXPERIMENTS KERNELS END ///////////////////////////////
.
.
.
////////////////////// EXPERIMENTS START ///////////////////////////////
{
dim3 dimBlock(WARPSIZE, 4), dimGrid(32);
Experiment_On_4KBytes_Kernel_A<<<dimGrid, dimBlock, 0, stream>>>(pBytestream);
Experiment_On_4KBytes_Kernel_B<<<1, 1, 0, stream>>>(pBytestream);
}
////////////////////// EXPERIMENTS END ///////////////////////////////
This suggests to me that either my experiment is not valid or there is a feature in Invidia GPUs to prevent
WAW hazards, perhaps using the following idea I found on the internet:
Any ideas on the experiment or on the GPU operation would be very helpful.