When I launch a kernel, I put the host process to sleep so that it does not waste cycles in
the cudaThreadSynchronize spinlock. My problem is that no matter how long I let the CPU sleep,
cudaThreadSynchronize takes the same amount of time it would have taken had I not slept at all.
What is the right way to pause the host process during kernel execution? On Linux I use
sleep and everything is fine.
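
To make the pattern concrete, here is a minimal sketch of what I mean. It is not my actual code. The kernel busyKernel, the problem size, the iteration count, and the 500 ms sleep are all made up for illustration. It uses cudaDeviceSynchronize, the non-deprecated replacement for cudaThreadSynchronize, and std::this_thread::sleep_for in place of a platform-specific sleep call.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel whose only purpose is to keep the GPU busy for a while.
__global__ void busyKernel(float *data, int iterations)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[i];
    for (int k = 0; k < iterations; ++k)
        v = v * 1.0001f + 0.5f;
    data[i] = v;
}

int main()
{
    const int n = 1 << 20;        // assumed problem size (a multiple of 256)
    float *d_data = nullptr;

    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // The launch is asynchronous: control returns to the host right away.
    busyKernel<<<n / 256, 256>>>(d_data, 100000);

    // Let the host sleep while the kernel is (supposedly) running.
    std::this_thread::sleep_for(std::chrono::milliseconds(500));

    // Measure how long the host then spends waiting in the synchronize call.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    double waitMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("time spent in synchronize: %.1f ms\n", waitMs);

    cudaFree(d_data);
    return 0;
}
```

If the kernel really ran during the sleep, I would expect the reported wait to shrink by roughly the sleep duration. What I observe is that it does not change.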