cudaThreadSynchronize question

Suppose I create a separate CPU thread for each CUDA device, and each CPU thread selects its device with cudaSetDevice(). Is the call to cudaThreadSynchronize() in the CU_SAFE_CALL macro really needed in that case?
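To make the setup in the question concrete, here is a minimal sketch of one host thread per device. It assumes CUDA 4.0 or later, where cudaSetDevice() selects the device for the calling host thread only. The kernel fill_kernel and the functions device_worker and run_one_thread_per_device are hypothetical names for illustration. Note that no explicit synchronize call appears: the blocking cudaMemcpy() on the default stream already waits for the preceding kernel.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical kernel: writes `value` into every element of `out`.
__global__ void fill_kernel(int *out, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Body of one host thread (hypothetical helper). Binds the thread to
// `device`, runs a kernel, reads the result back.
// Sets *status to 0 on success, 1 on failure.
static void device_worker(int device, int n, int *status) {
    *status = 1;
    if (cudaSetDevice(device) != cudaSuccess) return;

    int *d_buf = nullptr;
    if (cudaMalloc(&d_buf, n * sizeof(int)) != cudaSuccess) return;
    fill_kernel<<<(n + 255) / 256, 256>>>(d_buf, device, n);
    cudaError_t err = cudaGetLastError();  // catches launch errors

    // No explicit synchronize: a blocking cudaMemcpy on the default stream
    // starts only after the kernel has finished.
    std::vector<int> h_buf(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_buf.data(), d_buf, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    if (err != cudaSuccess) return;

    for (int v : h_buf)
        if (v != device) return;
    *status = 0;
}

// Starts one host thread per visible device; returns how many succeeded.
int run_one_thread_per_device(int n) {
    assert(n > 0);
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

    std::vector<int> status(count, 1);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (int d = 0; d < count; ++d)
        threads.emplace_back(device_worker, d, n, &status[d]);
    for (auto &t : threads) t.join();

    int ok = 0;
    for (int s : status)
        if (s == 0) ++ok;
    return ok;
}
```

Each thread only ever touches its own device and its own status slot, so the threads need no locking between them.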


A call to cudaThreadSynchronize() is never needed unless you want to take wall-clock timing measurements.
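The reason timing is the exception: a kernel launch returns to the host before the kernel finishes, so a host-side timer must wait for the device before it is stopped. A minimal sketch, assuming a current toolkit in which cudaDeviceSynchronize() is the non-deprecated replacement for cudaThreadSynchronize(); busy_kernel and time_kernel_wallclock are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <chrono>

// Hypothetical kernel that does enough arithmetic to take measurable time.
__global__ void busy_kernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < iters; ++k) x = x * 0.999f + 1.0f;
        data[i] = x;
    }
}

// Returns wall-clock milliseconds for one kernel run, or -1.0 on error.
double time_kernel_wallclock(int n, int iters) {
    assert(n > 0 && iters >= 0);
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0;
    cudaMemset(d_data, 0, n * sizeof(float));
    cudaDeviceSynchronize();  // keep setup work out of the measurement

    auto start = std::chrono::steady_clock::now();
    busy_kernel<<<(n + 255) / 256, 256>>>(d_data, n, iters);
    cudaError_t err = cudaGetLastError();  // launch errors
    // Without this wait the timer would measure only launch overhead.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    auto stop = std::chrono::steady_clock::now();

    cudaFree(d_data);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```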

… or for asynchronous device-to-host memory copies. In either case, however, events can be used instead.
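A sketch of the event-based alternative, under the assumptions stated here: a kernel and an asynchronous device-to-host copy are queued on one stream, an event is recorded after them, and the host waits on that event with cudaEventSynchronize() instead of synchronizing the whole device. The same pair of events also yields a GPU-side elapsed time via cudaEventElapsedTime(). The host buffer comes from cudaMallocHost() because a device-to-host cudaMemcpyAsync() into pageable memory returns only after the copy completes. iota_kernel and copy_back_with_events are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical kernel: out[i] = i.
__global__ void iota_kernel(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Returns 0 on success; *elapsed_ms gets the GPU time between the events.
int copy_back_with_events(int n, float *elapsed_ms) {
    assert(n > 0 && elapsed_ms != nullptr);
    int *d_buf = nullptr, *h_buf = nullptr;
    int rc = 1;

    // Page-locked host memory so the copy is truly asynchronous.
    if (cudaMallocHost(&h_buf, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_buf, n * sizeof(int)) != cudaSuccess) {
        cudaFreeHost(h_buf);
        return 1;
    }
    cudaStream_t stream;
    cudaEvent_t start, done;
    cudaStreamCreate(&stream);
    cudaEventCreate(&start);
    cudaEventCreate(&done);

    cudaEventRecord(start, stream);
    iota_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    cudaMemcpyAsync(h_buf, d_buf, n * sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaEventRecord(done, stream);

    // The host could do unrelated work here while the GPU runs.

    // Wait only for `done`, i.e. for the kernel and the copy on this stream.
    if (cudaEventSynchronize(done) == cudaSuccess &&
        cudaEventElapsedTime(elapsed_ms, start, done) == cudaSuccess) {
        rc = 0;
        for (int i = 0; i < n; ++i)
            if (h_buf[i] != i) { rc = 1; break; }
    }

    cudaEventDestroy(start);
    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return rc;
}
```

Waiting on an event scopes the wait to the work recorded before it, whereas a device-wide synchronize waits for everything queued on the device.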