What does threadIdx.x+blockDim.x mean?

Given the following code:

    __global__
    void add(int n, float *x, float *y)
    {
        int index = threadIdx.x;   // this thread's position within the block
        int stride = blockDim.x;   // number of threads in the block
        for (int i = index; i < n; i += stride)
            y[i] = x[i] + y[i];
    }

If I launch this kernel with 32 threads, blockDim.x will be 32. Why do I need to add 32 to i? Wouldn't that just jump over 32 elements of the array?

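Each thread jumps ahead by blockDim.x elements because the elements in between are processed by the other threads in the block. With 32 threads, thread 0 handles i = 0, 32, 64, ..., thread 1 handles i = 1, 33, 65, ..., and so on. If you add less than blockDim.x, some elements will be processed more than once; if you add more, some elements will be skipped.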
Basically, the idea looks like this image:

[image: sum]
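
To make the element-to-thread mapping concrete, here is a minimal sketch that uses the same loop shape as the `add` kernel but records which thread touched each element. The names `record_owner` and `count_mismatches`, the use of managed memory, and the sizes (100 elements, 32 threads) are my own assumptions for illustration, not part of the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: same loop as add(), but instead of adding it
// stores the index of the thread that handled each element.
__global__ void record_owner(int n, int *owner)
{
    int index = threadIdx.x;   // starting element for this thread
    int stride = blockDim.x;   // distance to this thread's next element
    for (int i = index; i < n; i += stride)
        owner[i] = threadIdx.x;
}

// Launches one block of `threads` threads over n elements and counts the
// elements whose recorded owner differs from i % threads.
// Returns -1 if a CUDA call fails.
int count_mismatches(int n, int threads)
{
    int *owner = nullptr;
    if (cudaMallocManaged(&owner, n * sizeof(int)) != cudaSuccess)
        return -1;
    for (int i = 0; i < n; ++i)
        owner[i] = -1;         // -1 marks "never touched"

    record_owner<<<1, threads>>>(n, owner);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(owner);
        return -1;
    }

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (owner[i] != i % threads)
            ++bad;
    cudaFree(owner);
    return bad;
}

int main()
{
    int bad = count_mismatches(100, 32);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If the stride logic is right, every element ends up owned by thread `i % 32` and none is left at -1, so each element is visited exactly once. Changing `stride` inside the kernel to something smaller or larger than `blockDim.x` should make the mismatch count non-zero.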


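Your kernel design is a block-striding loop, which is a single-block variant of a grid-striding loop. It covers the array correctly only when launched with one block; with several blocks, every block would repeat the same work.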
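
For comparison, a grid-striding version of the same kernel might look like the sketch below. The kernel name `add_grid_stride`, the helper `max_error`, the test values (1.0f and 2.0f), and the launch configuration are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Grid-striding variant: the starting index and the stride now account for
// every thread in the whole grid, not just the threads of one block.
__global__ void add_grid_stride(int n, const float *x, float *y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int stride = gridDim.x * blockDim.x;                // total threads in the grid
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// Hypothetical helper: fills x with 1.0f and y with 2.0f, runs the kernel,
// and returns the largest deviation of y[i] from 3.0f.
// Returns -1.0f if a CUDA call fails.
float max_error(int n, int blocks, int threads)
{
    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    add_grid_stride<<<blocks, threads>>>(n, x, y);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    float err = 0.0f;
    for (int i = 0; ok && i < n; ++i)
        err = fmaxf(err, fabsf(y[i] - 3.0f));

    cudaFree(x);
    cudaFree(y);
    return ok ? err : -1.0f;
}

int main()
{
    float err = max_error(1 << 20, 64, 256);
    printf("max error: %f\n", err);
    return err == 0.0f ? 0 : 1;
}
```

With `blocks` set to 1, `blockIdx.x` is 0 and `gridDim.x` is 1, so the index and stride reduce to `threadIdx.x` and `blockDim.x`, which is exactly the kernel in the question.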
