My program basically boils down to two sets of kernels, each called thousands of times. Between iterations of each set I must pause to let the CPU do some work. Is there any way to avoid launching a new thread each time?
Each kernel iteration may take on the order of 2e-4 s (0.2 ms). I don’t want thread-management overhead to swamp my speedup…
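One rough way to check whether thread management would swamp a 2e-4 s iteration is to time thread creation directly. The C++ sketch below uses a hypothetical function name, `spawn_join_overhead_seconds`. It repeatedly spawns and joins an empty `std::thread` and reports the average cost per iteration. It measures only CPU thread overhead, not kernel launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Average wall-clock cost, in seconds, of creating and joining one
// std::thread that does no work. Spawning a fresh thread for every
// iteration pays roughly this price each time.
// `iterations` must be positive; 0 or less returns 0.0.
double spawn_join_overhead_seconds(int iterations) {
    if (iterations <= 0) return 0.0;
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread t([] { /* empty body: measure only launch + join */ });
        t.join();
    }
    std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

If the returned figure is a noticeable fraction of 2e-4 s on your machine, reusing a persistent thread is likely worth the extra code.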
You can do exactly that, though there are many ways to go about it. On this forum it’s often referred to as a thread pool: one thread manages the data flow to and from the GPU threads. If you want a starting point, take a look at mutexes, which are used for thread synchronization. A colleague at work did exactly that, because we found that thread-launch overhead was hurting our real-time system. But I don’t know the best way to handle a thread pool with respect to GPUs, so you might need to look around.
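Here is a minimal sketch of the mutex-based approach. It assumes C++11 `std::thread`, `std::mutex` and `std::condition_variable`. A single worker thread is created once and woken for each iteration instead of being launched anew. The class name `PersistentWorker` is hypothetical, and so is the `gpu_step` callback, which stands in for the per-iteration kernel launches. A condition variable is used alongside the mutex so the idle worker sleeps rather than spinning.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One long-lived worker thread that is woken once per iteration instead of
// being created and destroyed each time. `gpu_step` is a hypothetical
// placeholder for the per-iteration GPU work (e.g. the kernel launches).
class PersistentWorker {
public:
    explicit PersistentWorker(std::function<void(int)> gpu_step)
        : step_(std::move(gpu_step)), thread_([this] { loop(); }) {}

    ~PersistentWorker() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            quit_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    // Hand iteration `i` to the worker and block until it has finished.
    void run_iteration(int i) {
        std::unique_lock<std::mutex> lock(mtx_);
        iter_ = i;
        pending_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return !pending_; });
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mtx_);
        for (;;) {
            cv_.wait(lock, [this] { return pending_ || quit_; });
            if (quit_) return;
            int i = iter_;
            lock.unlock();
            step_(i);          // do the GPU-side work without holding the lock
            lock.lock();
            pending_ = false;
            cv_.notify_all();  // wake the caller waiting in run_iteration
        }
    }

    std::function<void(int)> step_;
    std::mutex mtx_;
    std::condition_variable cv_;
    int iter_ = 0;
    bool pending_ = false;
    bool quit_ = false;
    std::thread thread_;  // declared last so it starts after the members above
};
```

Because `run_iteration` blocks until the step completes, the CPU work between iterations runs on the calling thread. This matches the pause-between-iterations pattern in the question.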
The latter case is essentially the design of MisterAnderson42’s GPUWorker class. It creates the threads once and performs the appropriate synchronization to start and stop them (without destroying them) as needed:
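The following hypothetical task-queue worker illustrates starting and stopping a thread without destroying it, loosely in the spirit of that design. It is not GPUWorker’s actual interface. The class name `TaskWorker` and its `call`/`sync` methods are assumptions made for this sketch. Calls are queued and run in order on one persistent thread. The caller can therefore keep doing CPU work until it needs the results and calls `sync()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch: one persistent thread executes queued calls in order.
// call() returns immediately so the CPU can keep working; sync() blocks until
// every queued call has run. Not GPUWorker's real API.
class TaskWorker {
public:
    TaskWorker() : thread_([this] { loop(); }) {}

    ~TaskWorker() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            quit_ = true;
        }
        cv_.notify_all();
        thread_.join();  // remaining queued tasks are drained before exit
    }

    void call(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push_back(std::move(task));
        }
        cv_.notify_all();
    }

    void sync() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return queue_.empty() && !busy_; });
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mtx_);
        for (;;) {
            cv_.wait(lock, [this] { return !queue_.empty() || quit_; });
            if (queue_.empty()) return;  // quit_ set and nothing left to run
            std::function<void()> task = std::move(queue_.front());
            queue_.pop_front();
            busy_ = true;
            lock.unlock();
            task();            // run outside the lock so call() never blocks on it
            lock.lock();
            busy_ = false;
            cv_.notify_all();  // may satisfy a thread waiting in sync()
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool busy_ = false;
    bool quit_ = false;
    std::thread thread_;  // declared last: starts after the members above exist
};
```

Keeping all GPU calls on one persistent thread also matters for the CUDA runtime of that period, in which a device context was tied to the host thread that created it.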
Doesn’t the original poster mean that he doesn’t want to launch the kernels so many times, e.g. by letting the kernel sit idle on the device until it is needed again? If so, then no, I don’t think that’s possible. If not, and you’re talking about CPU threads, then I’m wrong and you should read the posts above mine.