How to calculate ThreadIDs in a 2D GRID array

If I had a kernel launch like this:
dim3 dimBlock(16,16);
dim3 dimGrid(64,64);
kernel<<<dimGrid,dimBlock>>>();

How do I calculate each unique ThreadID?
txid = threadIdx.x + blockDim.x * blockIdx.x // I get 0 to 1023 for threads in the X axis
tyid = threadIdx.y + blockDim.y * blockIdx.y // I get 0 to 1023 for threads in the Y axis

And what about the rest of the threads? There are 1,048,576 threads running!
Where’s thread ID 401,000?

I would appreciate any help!
Thanks in advance.

Hello, the formulas for txid and tyid are correct, and you are running 1024x1024 threads. You would then access an element of a matrix by using int ind = tyid + txid * blockDim.y. Thread 401000 corresponds to txid=391 and tyid=616.

My friend,

Thanks for the reply, but blockDim.y is only 16

id = tyid + txid * blockDim.y

id = 616 + 391 * 16 = 6872!

Hello,

I think you are mixing up threads per block and blocks of threads. In CUDA, threads are ‘packed’ in blocks of size (x, y, z). When you execute a kernel, you decide how many ‘blocks of threads’ you want to launch. So the total number of threads is dimBlock * dimGrid. In your case, (16*64) in the x dimension and (16*64) in the y dimension → 1024x1024 threads, but they do not all run at the same time. They are scheduled and assigned to each processor when there are free resources. Anyway, from a global point of view you have launched 1,048,576 threads.

Now that you have decided your kernel configuration, you must combine the thread index within a block with the block index. As you wrote:

xid = threadIdx.x + blockDim.x * blockIdx.x // I get 0 to 1023 for threads in the X axis

yid = threadIdx.y + blockDim.y * blockIdx.y // I get 0 to 1023 for threads in the Y axis

Keep in mind that each block also has a unique ID. So, for the first block, with ID (0,0), you get the indices:

xid = threadIdx.x + 16 * 0 // 0 to 15 in X axis

yid = threadIdx.y + 16 * 0 // 0 to 15 in Y axis

Your global unique id is:

gid = xid + 1024 * yid; // 0 to 1,048,575. Note that 1024 is the width of your array.
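
To make the formula above concrete, here is a minimal host-side C++ sketch that runs without a GPU. The struct Idx2 and the function global_id are made-up names for illustration; the four parameters stand in for CUDA's built-in threadIdx, blockIdx, blockDim and gridDim, which a real kernel would read directly.

```cpp
#include <cassert>

// Hypothetical stand-in for the 2D part of CUDA's built-in index variables.
struct Idx2 { unsigned x, y; };

// Global ID with x varying fastest: gid = xid + width * yid,
// where width = blockDim.x * gridDim.x is the number of threads along X.
unsigned global_id(Idx2 thread_idx, Idx2 block_idx, Idx2 block_dim, Idx2 grid_dim) {
    // Every index must lie inside its block or grid.
    assert(thread_idx.x < block_dim.x && thread_idx.y < block_dim.y);
    assert(block_idx.x < grid_dim.x && block_idx.y < grid_dim.y);

    unsigned xid = thread_idx.x + block_dim.x * block_idx.x;  // position along X
    unsigned yid = thread_idx.y + block_dim.y * block_idx.y;  // position along Y
    unsigned width = block_dim.x * grid_dim.x;                // 16 * 64 = 1024 for this launch
    return xid + width * yid;
}
```

With 16x16 blocks in a 64x64 grid, the first thread of block (0,0) gets ID 0 and thread (15,15) of block (63,63) gets ID 1,048,575, so every thread has a distinct ID.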

To sum up: each thread has a unique index within its block, and each block has its own unique index within the grid; together they generate the unique global index. The thread index within the block can also be used to access the shared memory space.

Finally. Where’s thread ID 401,000?

xid = 8 + 16 * 38; // threadIdx.x + blockDim.x * blockIdx.x

yid = 7 + 16 * 24; // threadIdx.y + blockDim.y * blockIdx.y

gid = xid + 1024 * yid; // 401000

It’s in the block (38,24) using thread (8,7).

Hope this helps.

Homework: how to get the block and thread indices from 401,000 :)
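
One possible answer to the homework, as a host-side C++ sketch rather than kernel code: invert gid = xid + width * yid with a modulo and a division, then split each coordinate into block and thread parts. Idx2, Location and locate are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the 2D part of CUDA's index variables.
struct Idx2 { unsigned x, y; };

// Block index and thread-within-block index of one thread.
struct Location { Idx2 block; Idx2 thread; };

// Inverse of gid = xid + width * yid for a 2D launch.
Location locate(unsigned gid, Idx2 block_dim, Idx2 grid_dim) {
    unsigned width  = block_dim.x * grid_dim.x;  // threads along X
    unsigned height = block_dim.y * grid_dim.y;  // threads along Y
    assert(gid < width * height);                // gid must belong to this launch

    unsigned xid = gid % width;  // position along X
    unsigned yid = gid / width;  // position along Y

    Location loc;
    loc.block  = Idx2{xid / block_dim.x, yid / block_dim.y};
    loc.thread = Idx2{xid % block_dim.x, yid % block_dim.y};
    return loc;
}
```

For the launch in the question, locate(401000, Idx2{16, 16}, Idx2{64, 64}) gives xid = 616 and yid = 391, that is block (38,24) and thread (8,7), which matches the worked example above.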

Best regards!

Sorry about that. You are of course launching 1024x1024 threads. The formula I gave was wrong: instead of id = tyid + txid * blockDim.y it should be

id = tyid + txid * (blockDim.y * gridDim.y), with the same tyid and txid. blockDim.y is 16 in your case while gridDim.y is 64, which gives (blockDim.y * gridDim.y) = 1024 (the same as the previous poster).
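
Note that the two formulas in this thread number the threads in different orders: one lets x vary fastest, the other lets y vary fastest. Both give every thread a unique ID, but the same ID lands on different threads. The sketch below is plain host-side C++ with made-up function names, just to check the arithmetic.

```cpp
#include <cassert>

// x varies fastest: consecutive IDs walk along the X axis.
// width is the number of threads along X (blockDim.x * gridDim.x).
unsigned id_x_fastest(unsigned xid, unsigned yid, unsigned width) {
    assert(xid < width);
    return xid + width * yid;
}

// y varies fastest: consecutive IDs walk along the Y axis.
// height is the number of threads along Y (blockDim.y * gridDim.y).
unsigned id_y_fastest(unsigned xid, unsigned yid, unsigned height) {
    assert(yid < height);
    return yid + height * xid;
}
```

With width = height = 1024, id_y_fastest(391, 616, 1024) and id_x_fastest(616, 391, 1024) both return 401000, which is why the two answers name swapped coordinates for thread 401,000. Whichever order you pick, use the same one everywhere you index the array.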