How do blocks use threads?

OK, I must admit I'm a bit frustrated. I can't get my head around this whole thing, and I've been going in circles for definitely too long.

My question is: how does a block divide its work among its threads? I understand this is a very confusing question, so here is an example.

Normally I run my kernel like this:

kernel<<<100,100>>>(numThreads);

__global__ void kernel(int numThreads)
{
    int bid = blockIdx.x;
    int tid = blockIdx.x;
    if(tid<numThreads)
        ...
}

OK, so here there are 100 blocks, each with 100 threads. Each block executes its threads as long as the index is within bounds. There are no loops such as for, so the threads should work simultaneously (in theory), without any dependencies, etc. This is the normal case.
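A minimal sketch of this bounds-check pattern, for reference. It assumes the conventional global index blockIdx.x * blockDim.x + threadIdx.x rather than the blockIdx.x used above, and the names markElements and runMarkElements are hypothetical, made up for illustration. Error checking on the CUDA calls is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Each thread handles at most one element. Threads whose global index
// falls outside numThreads fail the bounds check and do nothing.
__global__ void markElements(int *out, int numThreads)
{
    int bid = blockIdx.x;                      // index of this block in the grid
    int tid = bid * blockDim.x + threadIdx.x;  // global thread index (assumed convention)
    if (tid < numThreads)
        out[tid] = 1;
}

// Hypothetical host helper: launches the kernel and copies the result back.
std::vector<int> runMarkElements(int numBlocks, int threadsPerBlock, int numThreads)
{
    assert(numBlocks > 0 && threadsPerBlock > 0);
    int total = numBlocks * threadsPerBlock;
    std::vector<int> host(total, 0);

    int *dev = nullptr;
    cudaMalloc(&dev, total * sizeof(int));
    cudaMemset(dev, 0, total * sizeof(int));

    markElements<<<numBlocks, threadsPerBlock>>>(dev, numThreads);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), dev, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling runMarkElements(100, 100, 9500) launches 10000 threads; only the first 9500 pass the check and write a 1, and the rest leave their slot at 0.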

OK, so how about this?

kernel<<<100,100>>>(numThreads);

__global__ void kernel(int numThreads)
{
    int bid = blockIdx.x;
    int tid = blockIdx.x;
    for(int i=0; i<tid; i++)
        var[i] = 1; // I know, silly example, but...
}

Now, what about my threads? Will this whole loop be processed sequentially by only one thread? Do the threads do any work here at all? There are no dependencies in this loop, so will each element be processed by a separate thread?

I just can't get a straight answer to this.

Hi Naiilo - I know some of this is pseudocode, so I won't comment on that. One issue I should mention, though: tid in your example is meant to be the thread ID in the X dimension, so it should be assigned from threadIdx.x (not blockIdx.x). To answer your question: each thread in the thread block executes the code you have listed. The var array is not defined, but if it is declared inside the kernel, each thread has its own instance of it. Each thread runs the entire loop itself, with an iteration count that varies with the thread ID. For instance, thread 0 would iterate 0 times, thread 1 would iterate 1 time, and so on. In contrast, using the bid variable in the exit condition of the for loop would cause every thread in a block to iterate the same number of times. Hope that helps.
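The behavior described above can be checked with a small sketch in which every thread counts its own loop iterations and writes the count to its own slot. It assumes tid is taken from threadIdx.x as suggested, and the names countIterations and runCountIterations are hypothetical. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Every thread executes the whole loop on its own. The loop is not split
// across threads; its trip count depends on the thread's own threadIdx.x.
__global__ void countIterations(int *iterations)
{
    int bid = blockIdx.x;
    int tid = threadIdx.x;
    int count = 0;
    for (int i = 0; i < tid; i++)
        count++;
    // One output slot per thread, so no two threads write the same element.
    iterations[bid * blockDim.x + tid] = count;
}

// Hypothetical host helper: launches the kernel and copies the counts back.
std::vector<int> runCountIterations(int numBlocks, int threadsPerBlock)
{
    assert(numBlocks > 0 && threadsPerBlock > 0);
    int total = numBlocks * threadsPerBlock;
    std::vector<int> host(total, -1);

    int *dev = nullptr;
    cudaMalloc(&dev, total * sizeof(int));

    countIterations<<<numBlocks, threadsPerBlock>>>(dev);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), dev, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

With runCountIterations(2, 4), the counts for each block are 0, 1, 2, 3: thread 0 never enters the loop and thread 3 runs it three times. Replacing tid with bid in the loop condition would instead give every thread in block 0 a count of 0 and every thread in block 1 a count of 1.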