2D grid and 1D Thread Block


Is it possible to launch a kernel with a 2D grid and a 1D thread block?

I have an array that is very big and I have to multiply each element with a constant.

If I use a 1D grid and a 1D thread block, I cannot fit everything in. I was thinking of using a 2D grid, but I don’t know how to get the thread ID.

For a 1D grid and 1D thread block:
int idx = blockIdx.x * blockDim.x + threadIdx.x;

And for a 2D grid and 1D thread block?

For example:

idx = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;

In fact, you only have to be careful when using 2D/3D blocks; the grid is no problem.
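To make the 2D grid / 1D block case concrete, here is a minimal sketch of the "multiply every element by a constant" kernel from the question. The names scaleKernel and scaleOnGpu, the block size of 256, and the per-dimension grid limit of 65535 are assumptions for illustration (the real limit depends on the device and can be read from cudaDeviceProp::maxGridSize). The bounds check matters because the grid is rounded up and usually contains more threads than elements.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Multiply every element by a constant, using a 2D grid of 1D blocks.
__global__ void scaleKernel(float *data, float factor, size_t n)
{
    // Linear block number inside the 2D grid, then the thread offset in the block.
    size_t blockId = (size_t)blockIdx.y * gridDim.x + blockIdx.x;
    size_t idx = blockId * blockDim.x + threadIdx.x;
    if (idx < n)                 // surplus threads past the end do nothing
        data[idx] *= factor;
}

// Hypothetical host helper: scales the vector in place, returns true on success.
bool scaleOnGpu(std::vector<float> &host, float factor)
{
    const size_t n = host.size();
    if (n == 0) return true;

    const unsigned int threads = 256;   // assumed block size
    const size_t maxGridX = 65535;      // assumed per-dimension grid limit
    size_t blocks = (n + threads - 1) / threads;
    size_t gridX = blocks < maxGridX ? blocks : maxGridX;
    size_t gridY = (blocks + gridX - 1) / gridX;

    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(dev, host.data(), n * sizeof(float),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 grid((unsigned int)gridX, (unsigned int)gridY);
        scaleKernel<<<grid, threads>>>(dev, factor, n);
        // This copy also waits for the kernel to finish.
        err = cudaMemcpy(host.data(), dev, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err == cudaSuccess;
}
```

With 256 threads per block, the second grid dimension only comes into play once the array has more than 65535 * 256 elements; below that the launch degenerates to a 1D grid.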

Then what about a 2D grid and 2D thread blocks?

Well, if you want to access a 1D array, I use the following:

int id = threadIdx.x + blockDim.x*threadIdx.y + (blockIdx.x*blockDim.x*blockDim.y) + (blockIdx.y*blockDim.x*blockDim.y);

Remember you need to define the blocks and grids as multidimensional using
uint3 tids = make_uint3(TIDSX, TIDSY, 1); or similar

Almost, you are just missing the gridDim.x factor in the blockIdx.y term:

int id = threadIdx.x + blockDim.x*threadIdx.y + (blockIdx.x*blockDim.x*blockDim.y) + (blockIdx.y*blockDim.x*blockDim.y*gridDim.x);

Now you can access blockDim.x*blockDim.y*gridDim.x*gridDim.y elements in your 1D vector.
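One way to convince yourself that the formula with the gridDim.x factor hits every element exactly once is to let each thread write its own id into slot id of an array and then look for gaps. The following is only a sketch; globalId2D, writeIds and collectIds are made-up names, and the formula is the same one as above, just split into "thread within block" and "block within grid".

```cuda
#include <cuda_runtime.h>
#include <vector>

// Linear id of the calling thread for a 2D grid of 2D blocks.
__device__ unsigned int globalId2D()
{
    unsigned int threadsPerBlock = blockDim.x * blockDim.y;
    unsigned int threadInBlock   = threadIdx.x + blockDim.x * threadIdx.y;
    unsigned int blockInGrid     = blockIdx.x + gridDim.x * blockIdx.y;
    return threadInBlock + blockInGrid * threadsPerBlock;
}

// Each thread stores its id at position id.
__global__ void writeIds(int *out, unsigned int n)
{
    unsigned int id = globalId2D();
    if (id < n)
        out[id] = (int)id;
}

// Hypothetical host helper: slots that no thread wrote to stay at -1.
std::vector<int> collectIds(dim3 grid, dim3 block)
{
    unsigned int n = grid.x * grid.y * block.x * block.y;
    std::vector<int> host(n, -1);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return host;
    cudaMemset(dev, 0xFF, n * sizeof(int));   // all bytes 0xFF, i.e. every int is -1
    writeIds<<<grid, block>>>(dev, n);
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

If the gridDim.x factor is left out, blocks in different grid rows compute the same ids, so part of the array is never written and those slots keep the value -1.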

Ahhhh… yes indeed. Thanks for the correction!

Thanks a lot… But I still don’t quite understand how you guys get the “equation” to find the ID…

If possible, can you explain a bit?

Let's say you have a 2x2 block and a 3x3 grid.

So you have 4 elements per block and 9 blocks per grid, making a total of 36 addressable elements.

0th element: tid.x = tid.y = bid.x = bid.y = 0

id = 0 + 0*2 + 0*2*2 + 0*2*2*3 = 0

1st element: tid.x = 1 all others = 0:

id = 1 + …(0) = 1

2nd element: tid.x is zero again but tid.y = 1:

id = 0 + 1*2 + 0*2*2 + 0*2*2*3 = 2

and so on…

So you see, you have 4 “directions” to count:

threadIdx.x (innermost): if it reaches the blockDim.x border, it is reset to zero and threadIdx.y is increased by one.

threadIdx.y (number of lines of blockDim.x elements): if this one reaches the blockDim.y border, it is reset to zero and the blockIdx.x is increased by one.

blockIdx.x (number of blocks or in the 1D case “lines” of blockDim.x*blockDim.y elements): if this one reaches the gridDim.x border, it is reset to zero and the blockIdx.y is increased by one.

blockIdx.y (number of blocks or in the 1D case “lines” of blockDim.x*blockDim.y*gridDim.x elements): if this one reaches the gridDim.y border, the kernel is finished.

So your order in the 2x2 block, 3x3 grid case would be:



|| 0  1|| 4  5|| 8  9||
|| 2  3|| 6  7||10 11||

||12 13||16 17||20 21||
||14 15||18 19||22 23||

||24 25||28 29||32 33||
||26 27||30 31||34 35||
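If you want to reproduce this picture yourself, a small sketch like the one below lets every thread store its linear id at its 2D position in the whole thread grid (row = blockIdx.y*blockDim.y + threadIdx.y, column = blockIdx.x*blockDim.x + threadIdx.x). The names layoutKernel and idLayout are invented for this example, and the result is returned row by row in one flat array.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread writes its linear id at its (row, col) position in the thread grid.
__global__ void layoutKernel(int *table, int width)
{
    int id = threadIdx.x + blockDim.x * threadIdx.y
           + blockIdx.x * blockDim.x * blockDim.y
           + blockIdx.y * blockDim.x * blockDim.y * gridDim.x;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    table[row * width + col] = id;
}

// Hypothetical host helper: returns the table row by row (all -1 if allocation fails).
std::vector<int> idLayout(dim3 grid, dim3 block)
{
    int width  = (int)(grid.x * block.x);
    int height = (int)(grid.y * block.y);
    std::vector<int> host((size_t)width * height, -1);
    int *dev = nullptr;
    if (cudaMalloc(&dev, host.size() * sizeof(int)) != cudaSuccess) return host;
    layoutKernel<<<grid, block>>>(dev, width);
    cudaMemcpy(host.data(), dev, host.size() * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling idLayout(dim3(3, 3), dim3(2, 2)) and printing six values per line should give the same arrangement as the table, with each 2x2 block holding four consecutive ids.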