*The* solution to wait for the GPU? What's the best variant?


I’ve read a lot here about CPU burning and the “hacks” to prevent it, so I wanted to ask: is there now a really good technique to avoid burning CPU while waiting for a kernel to finish?

I tried cudaEventRecord together with cudaEventSynchronize, but the CPU still seems to be fully loaded while waiting.

Any chance that this will be fixed in a future release of CUDA?

At the moment, my application is nearly unusable because I’m processing audio streams and need low latency. I could introduce extra latency, but of course I don’t want to…


One idea: if your kernel runs for a long time (more than a few milliseconds) and that time is (almost) constant, you can measure it once at the beginning of program execution, then insert a sleep() for a little less than the measured time after the kernel invocation but before cudaThreadSynchronize(). This can save a lot of CPU cycles if your kernel runs for a second or more.
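
The sleep-then-synchronize idea can be sketched in host-side C++ roughly like this. It is only a minimal sketch: `sleep_then_sync` and its `fraction` parameter are hypothetical names, and the blocking call is passed in as a callable so the snippet compiles without the CUDA toolkit. In a real program the callable would wrap cudaThreadSynchronize(), and `measured` would come from timing the kernel once at startup.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of CUDA): sleep for a fraction of the
// previously measured kernel time, then run the blocking sync call.
// The CPU is idle during the sleep and only spins for the remainder.
template <typename SyncFn>
void sleep_then_sync(std::chrono::microseconds measured,
                     double fraction,   // assumed safety margin, e.g. 0.8
                     SyncFn sync) {
    auto nap = std::chrono::duration_cast<std::chrono::microseconds>(
        measured * fraction);
    if (nap.count() > 0) {
        std::this_thread::sleep_for(nap);
    }
    sync();  // e.g. [] { cudaThreadSynchronize(); } in a CUDA program
}
```

Usage would be something like `sleep_then_sync(measured, 0.8, [] { cudaThreadSynchronize(); });` right after the kernel launch. The margin matters: the OS may sleep longer than requested, so a fraction too close to 1.0 can add latency instead of just saving cycles.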

Of course the CPU is still burned: cudaEventSynchronize has the same spin-wait as cudaThreadSynchronize and the implicit syncs.

The only method I am aware of to solve this is to call cudaEventQuery in a while loop with a short sleep inside (e.g. nanosleep).
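
A minimal sketch of that polling loop, in host-side C++. The helper name `poll_with_sleep` is hypothetical, and the completion check is passed in as a callable so the snippet compiles without the CUDA toolkit. With CUDA, the check would wrap cudaEventQuery, which returns cudaErrorNotReady while the recorded work is still pending.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of CUDA): poll a completion check and
// sleep between polls instead of spinning. Returns the number of sleeps.
template <typename DoneFn>
int poll_with_sleep(DoneFn is_done, std::chrono::microseconds interval) {
    int sleeps = 0;
    while (!is_done()) {
        // Yield the CPU for a short time; this bounds the extra latency
        // to roughly one interval plus the OS timer granularity.
        std::this_thread::sleep_for(interval);
        ++sleeps;
    }
    return sleeps;
}

// In a CUDA program, assuming an event `stop` recorded after the kernel:
//   poll_with_sleep(
//       [&] { return cudaEventQuery(stop) != cudaErrorNotReady; },
//       std::chrono::microseconds(100));
```

The interval is the trade-off knob: a shorter sleep means lower added latency but more wake-ups, and the actual sleep may be longer than requested depending on the OS scheduler. For low-latency audio that overshoot is probably the thing to measure first.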