The CUDA 1.1 release notes say:
----------8<------------------------
In order to reduce unwanted CPU utilization, the following APIs have
been modified to yield the CPU when the device is busy.
- cuCtxSynchronize
- cuEventSynchronize
- cuStreamSynchronize
- cudaThreadSynchronize
- cudaEventSynchronize
- cudaStreamSynchronize
----------8<------------------------
Does this mean that cudaThreadSynchronize() (and the other synchronize calls) could now return before the kernel has finished executing?
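One way to probe this empirically is sketched below. It is only a sketch and it makes several assumptions. It relies on mapped pinned host memory (cudaHostAlloc with cudaHostAllocMapped) and clock64(), both of which need a much newer toolkit and GPU than CUDA 1.1. The names slowKernel, spinCycles and the spin length are made up for illustration. cudaThreadSynchronize() is deprecated in current toolkits in favor of cudaDeviceSynchronize(), so expect a compiler warning. The flag is read through mapped memory rather than cudaMemcpy because a blocking copy would itself wait for the kernel and hide the answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: busy-wait for roughly spinCycles clock ticks,
// then write a sentinel that the host can read directly.
__global__ void slowKernel(volatile int *flag, long long spinCycles)
{
    long long start = clock64();
    while (clock64() - start < spinCycles) {
        // spin so the kernel stays busy for a noticeable time
    }
    *flag = 1;
    __threadfence_system();  // make the write visible to the host
}

int main()
{
    // Allow mapped (zero-copy) host allocations; must precede context creation.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *hFlag = NULL;  // host view of the flag
    int *dFlag = NULL;  // device view of the same memory
    if (cudaHostAlloc((void **)&hFlag, sizeof(int), cudaHostAllocMapped) != cudaSuccess ||
        cudaHostGetDevicePointer((void **)&dFlag, hFlag, 0) != cudaSuccess) {
        printf("mapped host memory is not available on this device\n");
        return 1;
    }
    *hFlag = 0;

    // The spin length is an arbitrary assumption; tune it for your GPU.
    slowKernel<<<1, 1>>>(dFlag, 200000000LL);

    cudaError_t err = cudaThreadSynchronize();
    if (err != cudaSuccess) {
        printf("synchronize failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the flag without issuing any further blocking CUDA call.
    int seen = *((volatile int *)hFlag);
    printf("flag after cudaThreadSynchronize(): %d\n", seen);

    cudaFreeHost(hFlag);
    return seen == 1 ? 0 : 2;
}
```

If the flag is still 0 right after the call returns, the synchronize came back early. A flag of 1 on every run is consistent with the call waiting for the kernel, but it is an observation on one setup, not a guarantee.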