Problem launching a kernel function several times

The problem is:
When I launch the kernel function once, the program works well,
but if I launch the same kernel several times in a loop, it does not work.
Inside the kernel function there is a loop of up to 10,000 iterations.
My questions are:
Why doesn't it work when I launch the kernel many times?
Is there anything else I need to take care of when launching many kernels in a loop?

Thanks in advance…

This depends on the kind of operations you're doing. For instance, according to the NVIDIA Programming Guide, a kernel launch returns immediately, so if you launch a kernel that uses a portion of data and immediately launch another kernel that uses the same portion of data, you could be in serious trouble. I think a kernel launch doesn't wait until the device becomes idle. There's a function, cudaThreadSynchronize(), that waits until every task running on the device has finished.
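A minimal sketch of that idea, assuming a CUDA-capable device and a recent toolkit: the host launches a kernel in a loop and blocks after each launch until the device is done. The kernel `work` and its arithmetic are hypothetical stand-ins for the 10,000-iteration loop in the question. The sketch calls cudaDeviceSynchronize(), which newer toolkits provide as the replacement for the deprecated cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread runs a 10,000-iteration loop,
// standing in for the loop described in the question.
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 10000; ++k)
            x = 0.5f * x + 1.0f;
        data[i] = x;
    }
}

int main()
{
    const int n = 1024;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    for (int launch = 0; launch < 5; ++launch) {
        // The launch only queues the kernel; control returns to the host at once.
        work<<<(n + 255) / 256, 256>>>(d_data, n);

        // Block the host until all queued device work has finished.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            printf("error after launch %d: %s\n", launch, cudaGetErrorString(err));
            cudaFree(d_data);
            return 1;
        }
        printf("launch %d finished\n", launch);
    }

    cudaFree(d_data);
    return 0;
}
```

Synchronizing after every launch is mainly useful while debugging, since it shows which launch first reports a problem.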

I don't think that's a problem. You can queue as many kernels as you want, and they should all run serially, but they can be queued much faster than they execute.
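The serial ordering can be checked with a small experiment. This is only a sketch, assuming every launch goes to the default stream; the kernel `increment` is a hypothetical example. It queues 100 launches with no explicit synchronization and then copies the counter back. The blocking cudaMemcpy is ordered after the queued kernels, so the counter should come back as 100 if the launches ran one after another.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread adds 1 to a counter in global memory.
__global__ void increment(int *counter)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *counter += 1;
}

int main()
{
    int *d_counter = nullptr;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(d_counter, 0, sizeof(int));

    const int launches = 100;
    for (int i = 0; i < launches; ++i) {
        // No synchronization here: each launch is queued behind the previous one.
        increment<<<1, 1>>>(d_counter);
    }

    // A blocking copy on the default stream waits for the queued kernels first.
    int h_counter = -1;
    cudaError_t err = cudaMemcpy(&h_counter, d_counter, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("copy failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_counter);
        return 1;
    }

    printf("counter = %d (expected %d)\n", h_counter, launches);
    cudaFree(d_counter);
    return 0;
}
```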

As for your problem, I am also experiencing a problem of a similar nature: my kernel gets executed synchronously even though I don't use the thread sync command. I guess kernel launching has some issues in CUDA, as I have tried everything and I can't find a solution to my problem.

But maybe your algorithm is hitting something like a segfault: your kernel code might access some array at an index that depends on some data value. If you run your kernel multiple times, could the failure be due to global memory changing at each kernel call? (Just a wild guess.)
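A hedged sketch of that guess, with hypothetical names throughout (`scatter`, `next`, `out`, `check`): the kernel picks its write location from values that the previous launch left in global memory, so a later launch can go out of range even though the first one was fine. The guard in the kernel and the error checks after each launch are one way to find out whether this is happening; they are not a diagnosis of the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the index j comes from global memory that the
// previous launch modified, so it can drift out of range over time.
__global__ void scatter(int *next, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int j = next[i];
    if (j < 0 || j >= n) return;  // guard: without it, later launches write past out[]
    out[j] += 1.0f;
    next[i] = j * 2;              // state carried over to the next launch
}

// Hypothetical helper: report which launch and which stage failed.
static bool check(cudaError_t err, const char *stage, int launch)
{
    if (err == cudaSuccess) return true;
    printf("launch %d, %s: %s\n", launch, stage, cudaGetErrorString(err));
    return false;
}

int main()
{
    const int n = 256;
    int h_next[n];
    for (int i = 0; i < n; ++i) h_next[i] = i;  // first launch: all indices valid

    int *d_next = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_next, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d_next, h_next, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));

    bool ok = true;
    for (int launch = 0; launch < 4 && ok; ++launch) {
        scatter<<<(n + 127) / 128, 128>>>(d_next, d_out, n);
        // Launch-time errors (bad configuration) show up here.
        ok = check(cudaGetLastError(), "launch", launch);
        // Execution errors (such as illegal memory accesses) show up here.
        ok = ok && check(cudaDeviceSynchronize(), "execution", launch);
    }

    cudaFree(d_next);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

If the guard is removed in a test build, the per-launch checks would be expected to point at the first launch whose indices left the valid range, though out-of-bounds writes on the device do not always produce an error.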

Thank you for your help.