Hi,
For a 1D grid of 1D blocks, a grid-stride loop is implemented as follows:
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
But how is it implemented for a 2D grid of 1D blocks?
Thanks
Check the last reply here:
https://stackoverflow.com/questions/22593936/cuda-grid-stride-loops-over-2d-arrays
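For the case where the data is still a 1D array of n elements and only the launch uses a 2D grid, here is a minimal sketch of one common approach (not necessarily what the linked answer does): flatten the 2D block index into a linear one, then stride by the total number of threads in the grid. The kernel name scale, the host helper run_scale_example, and the launch dimensions are all made up for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop over a 1D array, launched with a 2D grid of 1D blocks.
__global__ void scale(float *data, int n, float factor)
{
    // Flatten the 2D block index into a single linear block index.
    int block = blockIdx.y * gridDim.x + blockIdx.x;
    // Total number of threads in the whole grid; this is the stride.
    int stride = gridDim.x * gridDim.y * blockDim.x;

    for (int i = block * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Hypothetical host helper: doubles the values 0..n-1 on the GPU and
// returns the number of mismatches, or -1 if a CUDA call fails.
int run_scale_example(int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMemcpy(d, h.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    dim3 grid(4, 2);  // 2D grid: 4 x 2 = 8 blocks (arbitrary choice)
    dim3 block(64);   // 1D block: 64 threads (arbitrary choice)
    scale<<<grid, block>>>(d, n, 2.0f);
    cudaError_t launch_err = cudaGetLastError();

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t copy_err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                      cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * static_cast<float>(i)) ++mismatches;
    return mismatches;
}
```

Calling run_scale_example(1000) from main launches 512 threads over 1000 elements, so most threads handle two elements. If the data itself is 2D, another option is to keep the x expression from the 1D case for columns and use a second loop over rows that starts at blockIdx.y and strides by gridDim.y.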