Execution of two kernels as one


Suppose I have two kernel functions, k1 and k2. The normal approach would be to launch k1 with N1 blocks, get the result back to the host, and then launch k2 with N2 blocks.

I’m looking for a method to execute the kernels as one. My idea is:

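[codebox]__device__ int finished = 0;   // number of blocks that have finished k1's work

// N1 = number of blocks doing k1's work; the grid has N1 + N2 blocks in total
__global__ void kernel(int N1)
{
    if (blockIdx.x < N1)
    {
        // ... k1's work ...

        __threadfence();   // make this thread's k1 writes visible device-wide
        __syncthreads();   // wait until every thread of the block is done with k1
        if (threadIdx.x == 0)
            atomicAdd(&finished, 1);
        return;
    }

    // wait until k1 is done (volatile read so the load is repeated on every iteration)
    while (*(volatile int *)&finished < N1) { }

    // ... k2's work ...
}[/codebox]
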
Will this work in general?

Greetings, Uwe

In general I would say that a __syncthreads() inside a conditionally executed block is a big no-no.

But in this case it appears that all threads of a block would reach the __syncthreads() statement.

So I kept looking for more trouble in the code, and I found some.

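I see an issue with your code in that a block with blockIdx.x >= N1 cannot terminate until the condition (finished == N1) is reached. A block spinning in that loop never retires, so it never frees its slot for a new block to be scheduled in hardware. On a device with 12 multiprocessors, only 12 blocks will launch, and if they are all spinning while some of the k1 blocks have not been scheduled yet, you will deadlock.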

So my take is: no, it won’t work.


Will it really deadlock? I hoped that work would go on once those 12 blocks had finished.

Is there any other way to achieve this?

Greetings, Uwe

Yes, it really will deadlock. Try it if you don’t believe us. Note that you may need more than num_multiprocessors blocks to hit the deadlock, since up to 8 blocks can run on a single multiprocessor at once (the actual number depends on shared memory and register usage).
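The per-multiprocessor limit mentioned above can be queried at run time in newer CUDA versions. A minimal sketch, assuming a runtime that provides `cudaOccupancyMaxActiveBlocksPerMultiprocessor`: it multiplies that per-multiprocessor limit for a given kernel by the multiprocessor count. `combined_kernel` and `max_resident_blocks` are hypothetical names; substitute the real combined kernel, since its register and shared-memory usage decide the answer.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the combined k1/k2 kernel. Replace it with the real
// kernel: its register and shared-memory usage decide how many blocks fit.
__global__ void combined_kernel(int n1)
{
    // ... k1 work for blockIdx.x < n1, spin-wait plus k2 work otherwise ...
}

// Returns how many blocks of combined_kernel can be resident on the current
// device at the same time for the given block size and dynamic shared memory,
// or -1 if a runtime call fails.
int max_resident_blocks(int threadsPerBlock, size_t dynamicSmemBytes)
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;

    int smCount = 0;
    if (cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device) != cudaSuccess)
        return -1;

    // Per-multiprocessor limit for this kernel (threads, registers, shared memory).
    int blocksPerSM = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, combined_kernel, threadsPerBlock, dynamicSmemBytes) != cudaSuccess)
        return -1;

    return blocksPerSM * smCount;
}
```

If the grid (N1 + N2 blocks) is larger than this number, some blocks can only start after others retire, which spinning blocks never do. Even at or below the limit, an ordinary kernel launch does not promise that all blocks run concurrently, so this is a diagnostic, not a fix.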

Sure, just launch the two kernels back to back. A kernel launch costs only about 10 microseconds. Even if your approach did work, all those atomic operations and the spin-waiting would probably waste more time than that.
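A minimal sketch of the back-to-back approach, with hypothetical `k1`/`k2` bodies (k1 writes each index, k2 doubles it) standing in for the real work. Both kernels go to the same (default) stream, so k2 starts only after all of k1's blocks have finished and reads k1's results directly from global memory, with no copy to the host in between. `launch_overhead_us` is a rough, hypothetical helper for checking the launch-cost figure on your own hardware: it times many queued launches of an empty kernel with CUDA events and reports the average, which will vary with GPU, driver, and OS.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for k1 and k2; only the launch pattern matters here.
__global__ void k1(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;           // k1 produces an intermediate result
}

__global__ void k2(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;          // k2 consumes what k1 wrote
}

// Runs k1 then k2 on the default stream and returns data[n-1] (or -1 on error).
// Same-stream launches execute in order, so no host round trip is needed.
int run_back_to_back(int n)
{
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return -1;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // N1 and N2 may differ in general
    k1<<<blocks, threads>>>(d_data, n);
    k2<<<blocks, threads>>>(d_data, n);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_data); return -1; }

    // cudaMemcpy on the default stream waits for both kernels to complete.
    int last = -1;
    if (cudaMemcpy(&last, d_data + (n - 1), sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        last = -1;
    cudaFree(d_data);
    return last;
}

__global__ void empty_kernel() {}

// Average time per launch, in microseconds, over `reps` queued launches of an
// empty kernel, measured with CUDA events on the default stream.
float launch_overhead_us(int reps)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    empty_kernel<<<1, 1>>>();          // warm-up launch
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        empty_kernel<<<1, 1>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms * 1000.0f / reps;
}
```

With n = 1000, `run_back_to_back` returns 2 * 999 = 1998, confirming that k2 saw k1's writes. Because the launches are queued asynchronously, the event timing reflects throughput of back-to-back launches rather than the latency of a single isolated launch.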

Well, that is much faster than I suspected.

Greetings, Uwe