cudaDeviceScheduleSpin with OpenCL: how to make OpenCL actively spin while waiting for a kernel to return

Hi board,

I’ve implemented various algorithms in both CUDA and OpenCL (the generated PTX looks pretty much the same), and I’ve noticed that the overhead of a single OpenCL call is larger than for CUDA. I also noticed that CUDA uses 100% CPU during the whole execution of host and kernel code, whereas OpenCL lets the CPU idle.

I stumbled upon cudaDeviceScheduleSpin and understood that CUDA uses one of my CPU cores to actively spin while waiting for the result. The big question now is:

How do I make OpenCL spin too?

Documentation for CUDA here: http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/online/group__CUDART__DEVICE_g18074e885b4d89f5a0fe1beab589e0c8.html#g18074e885b4d89f5a0fe1beab589e0c8
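
To make the question more concrete: as far as I know, the OpenCL API has no standard flag that corresponds to cudaDeviceScheduleSpin, so the closest thing I can think of is polling the event status from the host instead of blocking in clFinish or clWaitForEvents. Below is a minimal sketch of that idea, not a confirmed solution. The names status_fn, spin_wait and fake_query are hypothetical and exist only for illustration; the status query is passed in as a callback so the loop can be tried without a device. The sketch assumes OpenCL's convention that CL_COMPLETE is 0, pending states (CL_RUNNING, CL_SUBMITTED, CL_QUEUED) are positive, and errors are negative.

```c
#include <stddef.h>

/* Mirrors OpenCL's convention: CL_COMPLETE == 0, pending states are
   positive, abnormal termination is reported as a negative value. */
enum { STATUS_COMPLETE = 0, STATUS_RUNNING = 1 };

/* Hypothetical callback type: returns the current execution status.
   With real OpenCL it would wrap
   clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS, ...). */
typedef int (*status_fn)(void *ctx);

/* Busy-wait on the host until the command is complete or has failed.
   Returns the final status; stores the number of polls if polls != NULL. */
int spin_wait(status_fn query, void *ctx, unsigned long *polls)
{
    unsigned long n = 0;
    int status;

    do {
        status = query(ctx);   /* no sleep or yield: the core stays busy */
        ++n;
    } while (status > STATUS_COMPLETE);

    if (polls != NULL)
        *polls = n;
    return status;
}

/* Stand-in for a device (assumption, for testing only): reports
   "running" for a fixed number of polls, then "complete". */
int fake_query(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        --*remaining;
        return STATUS_RUNNING;
    }
    return STATUS_COMPLETE;
}
```

With a real queue, I would expect to pass an event to clEnqueueNDRangeKernel, call clFlush on the queue so the command is actually submitted, and then run the loop with a callback around clGetEventInfo. Whether this brings the per-call overhead down to what CUDA achieves is exactly what I don't know.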