Hi,
For a 1D grid of 1D blocks, a grid-stride loop is written as follows:
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
But how is it written for a 2D grid of 1D blocks?
Thanks
Check the last reply here:
CUDA grid stride loops over 2D arrays - Stack Overflow
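For reference, here is a minimal sketch of two ways this could look, not necessarily what the linked answer does. The kernel names (scale_flat, scale_2d), the helper run_demo, and the launch sizes are all made up for illustration. The first kernel assumes the data is a flat 1D array and simply flattens the 2D block index; the second assumes a row-major 2D array, where blockIdx.y strides over rows because blockDim.y is 1.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Flat 1D data, 2D grid of 1D blocks (hypothetical kernel):
// flatten the block index, then stride by the total thread count of the grid.
__global__ void scale_flat(float *data, int n, float factor)
{
    int blockId = blockIdx.y * gridDim.x + blockIdx.x;   // flattened block index
    int stride  = blockDim.x * gridDim.x * gridDim.y;    // total threads in the grid
    for (int i = blockId * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Row-major 2D data, 2D grid of 1D blocks (hypothetical kernel):
// x strides over columns as in the 1D case; y strides over rows by gridDim.y,
// since each block is only one thread tall.
__global__ void scale_2d(float *data, int width, int height, float factor)
{
    for (int row = blockIdx.y; row < height; row += gridDim.y)
        for (int col = blockIdx.x * blockDim.x + threadIdx.x; col < width;
             col += blockDim.x * gridDim.x)
            data[row * width + col] *= factor;
}

// Host-side check (hypothetical helper): runs both kernels on the same buffer
// and verifies that every element was touched exactly once by each kernel.
bool run_demo()
{
    const int width = 1000, height = 37, n = width * height;
    std::vector<float> h(n, 1.0f);

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 grid(4, 3);   // 2D grid, deliberately smaller than the data
    dim3 block(128);   // 1D blocks
    scale_flat<<<grid, block>>>(d, n, 2.0f);
    scale_2d<<<grid, block>>>(d, width, height, 3.0f);

    // This copy waits for both kernels and reports any earlier launch error.
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (float v : h)
        if (v != 6.0f) return false;   // 1 * 2 * 3, exact in float
    return true;
}
```

In both kernels the grid can be smaller than the data, as with the 1D version: each thread keeps advancing by the full extent of the grid until it runs past the end. For very large n, the index and stride would need a wider type than int.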