Kernel execution: How is the kernel function executed?


Inside a kernel function, suppose we have the following code:

// Block Index
int bx = blockIdx.x;
int by = blockIdx.y;

// Thread Index
int tx = threadIdx.x;
int ty = threadIdx.y;

So, when the kernel is called, is it executed in parallel by all the threads, so that the above variables have different values for different threads?

Or is it executed by all the blocks in parallel?

I’m not clear on how the kernel is executed and what the assignments from blockIdx.x, threadIdx.x, etc. mean.

Can I please get help on this?


blockIdx and threadIdx are built-in variables that CUDA initializes automatically with the block index and the thread index of the thread that is running.
So if you launch your kernel with 16 blocks of 64 threads each,
you have blockIdx.x from 0 to 15.
Every block then has threads with threadIdx.x from 0 to 63.
Based on these indices you can assign and control the work of the individual threads.
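
To make the index ranges concrete, here is a minimal sketch that assumes a one-dimensional launch of 16 blocks with 64 threads each, matching the ranges 0 to 15 and 0 to 63 mentioned above. The kernel name recordIndices and the host helper collectIndices are made up for illustration and are not part of CUDA. Each thread combines its block and thread index into a unique global index and stores its own indices at that position.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: every thread runs this same code, but each one
// sees its own blockIdx.x and threadIdx.x values.
__global__ void recordIndices(int *blockOut, int *threadOut, int n)
{
    int bx = blockIdx.x;             // 0..15 for a grid of 16 blocks
    int tx = threadIdx.x;            // 0..63 for 64 threads per block
    int gid = bx * blockDim.x + tx;  // unique index across the whole grid
    if (gid < n) {
        blockOut[gid] = bx;
        threadOut[gid] = tx;
    }
}

// Hypothetical host helper: launches 16 blocks of 64 threads and copies
// the recorded indices back. Returns false if a CUDA call fails.
bool collectIndices(std::vector<int> &blocks, std::vector<int> &threads)
{
    const int numBlocks = 16, threadsPerBlock = 64;
    const int n = numBlocks * threadsPerBlock;  // 1024 threads in total
    blocks.assign(n, -1);
    threads.assign(n, -1);

    int *dBlocks = nullptr, *dThreads = nullptr;
    if (cudaMalloc(&dBlocks, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dThreads, n * sizeof(int)) != cudaSuccess) {
        cudaFree(dBlocks);
        return false;
    }

    recordIndices<<<numBlocks, threadsPerBlock>>>(dBlocks, dThreads, n);

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    bool ok =
        cudaMemcpy(blocks.data(), dBlocks, n * sizeof(int),
                   cudaMemcpyDeviceToHost) == cudaSuccess &&
        cudaMemcpy(threads.data(), dThreads, n * sizeof(int),
                   cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dBlocks);
    cudaFree(dThreads);
    return ok;
}
```

Calling collectIndices from main and printing a few entries should show that position 64 was written by thread 0 of block 1, since the first 64 positions belong to block 0.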

So the lines of code in your kernel are executed by each and every thread of every block, conceptually in parallel, and each thread sees its own values of blockIdx and threadIdx.
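
The same idea carries over to the two-dimensional indices from the question. The following is only a sketch under assumptions: the kernel fillMatrix, the helper runFillMatrix, the 8x8 matrix size and the 4x4 block size are all hypothetical choices for illustration. Every thread executes the identical kernel body, and the differing bx, by, tx, ty values steer each thread to a different matrix element.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: the same code runs in every thread; only the
// built-in index values differ from thread to thread.
__global__ void fillMatrix(int *out, int width, int height)
{
    int bx = blockIdx.x, by = blockIdx.y;    // block index within the grid
    int tx = threadIdx.x, ty = threadIdx.y;  // thread index within the block

    int col = bx * blockDim.x + tx;
    int row = by * blockDim.y + ty;
    if (row < height && col < width)
        out[row * width + col] = row * width + col;  // one element per thread
}

// Hypothetical host helper: a 2x2 grid of 4x4-thread blocks covers an
// 8x8 matrix. Returns false if a CUDA call fails.
bool runFillMatrix(std::vector<int> &result)
{
    const int width = 8, height = 8;
    result.assign(width * height, -1);

    int *dOut = nullptr;
    if (cudaMalloc(&dOut, result.size() * sizeof(int)) != cudaSuccess)
        return false;

    dim3 block(4, 4);                              // threadIdx.x, .y in 0..3
    dim3 grid(width / block.x, height / block.y);  // blockIdx.x, .y in 0..1
    fillMatrix<<<grid, block>>>(dOut, width, height);

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    bool ok = cudaMemcpy(result.data(), dOut, result.size() * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dOut);
    return ok;
}
```

No loop over rows or columns appears in the kernel: the grid of threads takes the place of the loop, with one thread per element.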

That and more is covered pretty well in the CUDA Programming Guide!


Appreciate your response :)