Execute kernels without 100% CPU busy-wait?

Is it possible to execute kernels through OpenCL without incurring a busy wait on one CPU-core? If it is not possible through OpenCL, is it possible through CUDA directly?

I would like to keep feeding the GPU more work as the previous work is done, but preferably without running the CPU at 100%.

Of course that’s possible. In fact, that’s the default case, and you would have to do some work to make a (GPU) kernel execution block on the (CPU) host. If you look at the documentation of e.g. clEnqueueNDRangeKernel(), you’ll see that it immediately returns after “the kernel execution was [successfully] queued” (not “executed”). So executing a kernel is a non-blocking operation by default, as long as you don’t wait for the event associated with the kernel execution instance.

Sure, clEnqueueNDRangeKernel() returns. But there is a background thread in nvcuda.dll eating 100% CPU time.

Well, I don’t see this behavior on Vista x64, driver 270.61, GTX 285.

I get this behavior with GeForce GTX 580 with 270.61 drivers, on 64-bit Windows 7. Not only with my own program, but OpenCL programs written by others as well.

It’s very interesting to hear that it is not the same for everyone. There must be some kind of software or hardware issue causing it.

A busy-wait loop is actually the default behavior under NVIDIA. Under CUDA you have an option to change the behavior to blocking synchronization, or to wait on an interrupt. The purpose of busy waiting is to get minimal latency in the response. I don’t think that you can change the behavior with OpenCL, though.
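If the wait policy itself cannot be changed, one workaround that is sometimes suggested is to avoid the blocking wait call and instead poll for completion, sleeping between polls. Below is a minimal sketch of that idea in plain C++. Note that `wait_with_sleep` and `demo_simulated_job` are hypothetical helpers, not part of OpenCL or CUDA, and the "GPU job" is only simulated by a worker thread. In a real OpenCL program the predicate would presumably call clFlush() once and then check clGetEventInfo() with CL_EVENT_COMMAND_EXECUTION_STATUS; under CUDA, cudaEventQuery() could play the same role. This only helps if the spinning happens inside the wait call on your own thread; if the driver spins in a background thread of its own, as reported above, it may make no difference.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not an OpenCL or CUDA call): waits until is_done()
// returns true, sleeping between polls so the calling thread gives up the
// CPU instead of spinning. Returns how many times is_done() was called.
inline int wait_with_sleep(const std::function<bool()>& is_done,
                           std::chrono::microseconds interval) {
    assert(interval.count() > 0);
    int polls = 1;  // counts the first is_done() call in the loop condition
    while (!is_done()) {
        std::this_thread::sleep_for(interval);
        ++polls;    // counts the next is_done() call
    }
    return polls;
}

// Stand-in for a GPU job: a worker thread sets a flag after `work` has
// elapsed. In a real program is_done() would query the API's own event
// or stream status instead of this flag (assumption, see text above).
inline int demo_simulated_job(std::chrono::milliseconds work,
                              std::chrono::microseconds interval) {
    std::atomic<bool> done{false};
    std::thread worker([&] {
        std::this_thread::sleep_for(work);
        done.store(true);
    });
    int polls = wait_with_sleep([&] { return done.load(); }, interval);
    worker.join();
    return polls;
}
```

The trade-off is latency: completion is noticed up to one sleep interval late, which is exactly what the busy wait is designed to avoid. With kernels that run for many milliseconds, an interval of around a millisecond is probably a reasonable starting point.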

It seems to be possible in CUDA but not in OpenCL to avoid the “busy wait”:


Still I’m wondering why I don’t see this, or at least not to that extent. None of my CPU cores reaches 100%, at most 60%, but I’m also constantly doing CPU work, so that’s no surprise.

Thanks for the info, guys.

Did you paste the wrong link? It’s about sharing data between OpenCL and OpenGL.

If NVIDIA is listening, this would be high on my wish list: Having a way to choose between busy-wait and hardware interrupt. Perhaps an OpenCL extension could be used to expose this feature from CUDA into OpenCL?

Indeed, sorry. Here’s the correct one: http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2794

any news on this one?

is nvidia going to implement a non-busy wait?


Looks like this behavior is driver-version dependent.

With 267.24 we see very low CPU consumption in our OpenCL app, but with 275.33 the CPU time is almost the same as the elapsed time; the CPU is in use constantly…

That’s very undesirable, because we use the CPU for computations too.

What kind of tool do you use to view the CPU threads’ information? I tried gdb, but it won’t let me interrupt while the kernel is running.