I have a grid set with dim3 GridDim(1,128,1)
Each block is set with dim3 BlockDim(16,1,1)
I allocate a 3D CUDA array of size (x=512, y=129, z=128).
Each block writes to its own 2D “sub-array” of the 3D array (the sub-array is selected with blockIdx.y).
Each sub-array spans the x and z axes. There are 129 sub-arrays.
Each thread writes in a cell of that particular sub-array.
The 16 threads of a block write to the sub-array in a coalesced way along the x axis (thread k writes to cell x=k).
The writes are looped until the 128 blocks have finished filling their own sub-arrays.
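To make the write pattern concrete, here is a minimal host-side sketch of the offsets the 16 threads of one block would touch. It assumes the array lives in plain linear memory with x as the fastest-varying index, then y, then z, and no pitch padding; an opaque `cudaArray` would not follow this layout. The names `linearIndex`, `writeOffset`, `NX` and `NY`, and the idea that loop step `step` covers x = k + 16*step, are made up for this illustration.

```cpp
#include <cstddef>

// Extents taken from the post: x=512, y=129 (z=128 is not needed here).
constexpr std::size_t NX = 512;
constexpr std::size_t NY = 129;

// ASSUMPTION: linear layout, x fastest, then y, then z, no pitch padding.
constexpr std::size_t linearIndex(std::size_t x, std::size_t y, std::size_t z) {
    return x + NX * (y + NY * z);
}

// Element offset written by thread `tid` (0..15) of block `blockY`
// on loop step `step`, in row `z` of that block's sub-array.
// ASSUMPTION: each loop step advances x by the block width (16).
constexpr std::size_t writeOffset(std::size_t tid, std::size_t blockY,
                                  std::size_t step, std::size_t z) {
    return linearIndex(tid + 16 * step, blockY, z);
}
```

Under these assumptions, threads 0 to 15 of a block land on 16 consecutive elements at every loop step, which is the access pattern coalescing relies on.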
When that is done, another kernel is launched with 64 blocks of 1024 threads.
The goal of that kernel is to fill the last (the 129th) sub-array on the y axis.
Each thread receives its own (x,z) coordinates and reads along the y axis. The reads should therefore be coalesced, since all 65536 threads of that kernel read from the same “layer” (the same y) at the same time.
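Whether neighbouring threads end up on neighbouring addresses during these reads depends on how the global thread index is turned into (x,z), which the description above does not fix. The sketch below compares two hypothetical mappings on the host, again assuming plain linear memory with x fastest, then y, then z, and no pitch padding. `readOffsetXFast`, `readOffsetZFast` and `linearIndex` are illustrative names, not part of my actual kernel.

```cpp
#include <cstddef>

// Extents taken from the post: x=512, y=129, z=128.
constexpr std::size_t NX = 512;
constexpr std::size_t NY = 129;
constexpr std::size_t NZ = 128;

// ASSUMPTION: linear layout, x fastest, then y, then z, no pitch padding.
constexpr std::size_t linearIndex(std::size_t x, std::size_t y, std::size_t z) {
    return x + NX * (y + NY * z);
}

// Hypothetical mapping A: consecutive global thread ids differ in x.
constexpr std::size_t readOffsetXFast(std::size_t gid, std::size_t y) {
    return linearIndex(gid % NX, y, gid / NX);
}

// Hypothetical mapping B: consecutive global thread ids differ in z.
constexpr std::size_t readOffsetZFast(std::size_t gid, std::size_t y) {
    return linearIndex(gid / NZ, y, gid % NZ);
}
```

With mapping A, neighbouring threads are one element apart at every y, and stepping y moves all of them together by `NX` elements. With mapping B, neighbouring threads are `NX * NY` elements apart, so they would not touch adjacent addresses under this layout.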
I would like to know whether coalescing holds along all three axes of a 3D array, given that I write along the x axis and read along the y axis?