Is it possible to execute kernels through OpenCL without incurring a busy wait on one CPU core? If it is not possible through OpenCL, is it possible through CUDA directly?
I would like to keep feeding the GPU more work as the previous work is done, but preferably without running the CPU at 100%.
Of course that’s possible. In fact, that’s the default case, and you would have to do some work to make a (GPU) kernel execution block on the (CPU) host. If you look at the documentation of e.g. clEnqueueNDRangeKernel(), you’ll see that it immediately returns after “the kernel execution was [successfully] queued” (not “executed”). So executing a kernel is a non-blocking operation by default, as long as you don’t wait for the event associated with the kernel execution instance.
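For illustration, here is a minimal sketch of that non-blocking pattern. It assumes a cl_command_queue and cl_kernel have been created and the kernel arguments set elsewhere; error handling is abbreviated:

    #include <CL/cl.h>

    /* Sketch only: queue and kernel are assumed to be set up elsewhere. */
    cl_int launch_async(cl_command_queue queue, cl_kernel kernel, cl_event *ev)
    {
        size_t global_size = 1024;  /* example work size */

        /* Returns as soon as the command is queued, not when it finishes. */
        cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                            &global_size, NULL, 0, NULL, ev);
        if (err != CL_SUCCESS)
            return err;

        /* Hand the work to the device; still does not wait for completion. */
        return clFlush(queue);
    }

The host only blocks if it later calls clWaitForEvents() on that event (or clFinish() on the queue); until then it is free to enqueue more work.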
I get this behavior with a GeForce GTX 580 on 270.61 drivers, under 64-bit Windows 7. Not only with my own program, but with OpenCL programs written by others as well.
It’s very interesting to hear that it is not the same for everyone. There must be some kind of software or hardware issue causing it.
A busy-wait loop is actually the default behavior under NVIDIA. Under CUDA you have the option to change this into blocking synchronization or waiting on an interrupt; the purpose of busy-waiting is to minimize response latency. I don’t think you can change this behavior with OpenCL, though.
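For reference, a minimal sketch of that CUDA-side switch, assuming the CUDA runtime API (the flag must be set before the context is created, i.e. before the first runtime call that touches the device):

    #include <cuda_runtime.h>

    int main(void)
    {
        /* Sleep on an OS primitive while waiting for the GPU instead of
         * spinning; trades some response latency for an idle CPU core. */
        cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

        /* Other options include:
         *   cudaDeviceScheduleSpin  - busy-wait for minimal latency
         *   cudaDeviceScheduleYield - spin, but yield to other host threads
         *   cudaDeviceScheduleAuto  - let the runtime pick a heuristic */

        /* ... launch kernels as usual; cudaDeviceSynchronize() will now
         * block without pegging a core. */
        return 0;
    }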
Still, I’m wondering why I don’t see this, or at least not to that extent. None of my CPU cores reaches 100% (at most 60%), but I’m also constantly doing CPU work, so that’s no surprise.
Did you paste the wrong link? It’s about sharing data between OpenCL and OpenGL.
If NVIDIA is listening, this would be high on my wish list: a way to choose between busy-waiting and a hardware interrupt. Perhaps an OpenCL extension could be used to expose this CUDA feature in OpenCL?
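In the meantime, one possible userspace workaround is to avoid clWaitForEvents() and instead poll the event status yourself, sleeping between checks. A rough sketch, assuming POSIX usleep() and an arbitrarily chosen polling interval:

    #include <CL/cl.h>
    #include <unistd.h>  /* usleep() */

    /* Sketch: trade wake-up latency for CPU time by sleeping between
     * status checks instead of letting the driver spin for us. */
    void wait_lazily(cl_event ev)
    {
        cl_int status;
        do {
            clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
                           sizeof(status), &status, NULL);
            if (status > CL_COMPLETE)  /* still queued/submitted/running */
                usleep(100);           /* sleep ~0.1 ms; tune to taste */
        } while (status > CL_COMPLETE);
        /* status == CL_COMPLETE on success; negative values are errors. */
    }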