Thread status before launching a new kernel function

Hi All,

I am new to CUDA and have a question. When we call a kernel function in a loop (for just a few iterations), do all the threads launched by the previous call have to complete their work before the kernel is called in the next iteration?

Waiting for reply.


Only if there are dependencies (the second function needs output from the first function).

Otherwise you can assume that threads are initialised at the start of each new kernel (and similarly whenever a multiprocessor starts to process a new block).
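
For the loop case, here is a minimal sketch, assuming all launches go to the default stream. Launches issued to the same stream execute in issue order, so each launch should see the global memory writes of the previous one without any extra synchronization in the loop. The names `step` and `run_loop` are made up for illustration and are not from the original code.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments one element in global memory.
__global__ void step(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

// Launches `step` `iters` times on the default stream and returns element 0
// of the result, or -1 if a CUDA call fails.
int run_loop(int n, int iters)
{
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess)
        return -1;
    cudaMemset(d, 0, n * sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iters; ++it) {
        // Same stream: this launch starts only after the previous one finished.
        step<<<blocks, threads>>>(d, n);
    }

    int first = -1;
    // A blocking cudaMemcpy on the default stream waits for the kernels above.
    cudaError_t err = cudaMemcpy(&first, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? first : -1;
}
```

Calling `run_loop(1000, 20)` from `main` should return 20 if every launch ran on the output of the one before it.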


Thanks for your reply :-)

In my case the data flow is like this: Global Memory (I) ----> Shared Memory (II) ----> Global Memory (III)

The threads manipulate the data in shared memory and store it back to global memory. The data in global memory is then used by the threads launched in the next kernel call. I have used __syncthreads() as the last statement in the kernel function (which should ensure that all the threads have completed their work), but I'm getting wrong output after 18 iterations… Is it due to improper synchronization of threads or something else?

Waiting for your reply.
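

It is hard to say what goes wrong at iteration 18 without seeing the code, but two facts may help narrow it down. __syncthreads() only synchronizes the threads within one block, and it matters between the shared memory load and the shared memory read, not as the last statement of the kernel. Below is a minimal sketch of the Global (I) -> Shared (II) -> Global (III) flow, assuming a made-up kernel `reverse_tiles`, a made-up host function `run_pipeline`, and an array length that is a multiple of the block size. It writes to a separate output buffer and swaps the two buffers each iteration. This particular kernel does not strictly need that, since each block touches only its own tile, but it becomes necessary as soon as a block reads elements that another block writes in the same launch.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

#define TILE 256  // threads per block; assumed to divide n

// Hypothetical kernel: global -> shared -> global, reversing each tile.
__global__ void reverse_tiles(const int *in, int *out)
{
    __shared__ int tile[TILE];
    int base = blockIdx.x * TILE;
    int t = threadIdx.x;

    tile[t] = in[base + t];               // (I) global -> (II) shared
    __syncthreads();                      // whole tile is loaded before anyone reads it
    out[base + t] = tile[TILE - 1 - t];   // (II) shared -> (III) global
    // No trailing __syncthreads(): launches in one stream already run in order.
}

// Runs `iters` launches over n elements (n a multiple of TILE), starting from
// 0, 1, 2, ..., and returns the final array (empty if a CUDA call fails).
std::vector<int> run_pipeline(int n, int iters)
{
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i)
        h[i] = i;

    int *a = nullptr, *b = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&a, bytes) != cudaSuccess)
        return {};
    if (cudaMalloc(&b, bytes) != cudaSuccess) {
        cudaFree(a);
        return {};
    }
    cudaMemcpy(a, h.data(), bytes, cudaMemcpyHostToDevice);

    for (int it = 0; it < iters; ++it) {
        reverse_tiles<<<n / TILE, TILE>>>(a, b);
        std::swap(a, b);  // output of this launch is the input of the next
    }

    cudaError_t err = cudaGetLastError();      // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors raised while kernels ran
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    if (err != cudaSuccess)
        h.clear();
    return h;
}
```

Since reversing a tile twice restores it, an even number of iterations (18, for example) should return the original 0, 1, 2, ... sequence, which gives a simple check that each launch consumed the output of the previous one. Checking the error codes after the loop is also worth doing in the real code, because a failed launch otherwise goes unnoticed and just leaves stale data in global memory.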