cudaThreadSynchronize() stalls?

The CUDA 1.1 release notes say:

----------8<------------------------
In order to reduce unwanted CPU utilization, the following APIs have
been modified to yield the CPU when the device is busy.

  • cuCtxSynchronize
  • cuEventSynchronize
  • cuStreamSynchronize
  • cudaThreadSynchronize
  • cudaEventSynchronize
  • cudaStreamSynchronize
----------8<------------------------

Does this mean that cudaThreadSynchronize() (and the other synchronize calls) could return before the kernel has finished executing?

No.

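The modification to the synchronize calls was to add a thread yield to the busy-wait loop. The calls still block until the device has finished; they just give up the CPU to other threads while polling instead of spinning at full utilization. This is mentioned in a post by NVIDIA somewhere here, but I forget where.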
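
To illustrate the idea, here is a minimal C++ sketch of a busy-wait loop with a yield in it. This is not NVIDIA's driver code. `FakeDevice` and `wait_until_idle` are hypothetical names made up for the example, and the "device" is just a counter that reports busy for a fixed number of polls. The point is only that the exit condition of the loop is unchanged: the yield affects what the CPU does while waiting, not when the wait ends.

```cpp
#include <thread>

// Hypothetical stand-in for a GPU: reports "busy" for a fixed number
// of polls, then reports idle forever. Not part of any CUDA API.
struct FakeDevice {
    int remaining_busy_polls;

    bool busy() {
        if (remaining_busy_polls > 0) {
            --remaining_busy_polls;
            return true;
        }
        return false;
    }
};

// Busy-wait until the device is idle, yielding the CPU on every
// iteration. The loop can only exit once busy() returns false, so the
// yield never causes an early return. Returns how many times we yielded.
int wait_until_idle(FakeDevice& dev) {
    int yields = 0;
    while (dev.busy()) {
        std::this_thread::yield();  // let other runnable threads have the CPU
        ++yields;
    }
    return yields;
}
```

With a device that is busy for three polls, `wait_until_idle` yields three times and only then returns; with a device that is already idle it returns immediately without yielding.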