It depends, I guess. If you have a multi-threaded CPU program, cudaThreadSynchronize does nothing to synchronize the host threads with each other: it only blocks the calling thread until the GPU work issued before it has finished, so you would have to add your own barrier as well. Other than that I guess it would work, although the phrase "killing a fly with a bazooka" comes to mind, given the additional overhead of having to synchronize the cudaEvents’ timings.
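To make the "own barrier" part concrete, here is a rough host-side sketch in plain C++11. HostBarrier and run_workers are names I made up for illustration, not anything from the CUDA API, and the GPU work is reduced to a comment so the snippet builds without the toolkit. Assume each worker would launch its kernels and wait for the device at the marked spot.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier for host threads (not part of CUDA).
class HostBarrier {
public:
    explicit HostBarrier(int count)
        : threshold_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    // Blocks until threshold_ threads have called wait().
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: reset and release everyone.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Sleep until the last thread bumps the generation counter.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int threshold_;
    int waiting_;
    unsigned long generation_;
};

// Runs num_threads workers and returns how many of them saw every other
// worker's "done" flag set after passing the barrier.
int run_workers(int num_threads) {
    HostBarrier barrier(num_threads);
    std::vector<int> done(num_threads, 0);
    std::vector<int> saw_all(num_threads, 0);
    std::vector<std::thread> threads;

    for (int id = 0; id < num_threads; ++id) {
        threads.emplace_back([&, id] {
            // Assumed spot for the real work: launch kernels, then call
            // cudaDeviceSynchronize() (the replacement for the deprecated
            // cudaThreadSynchronize()). That waits for the GPU only.
            done[id] = 1;

            // The host threads still have to wait for each other.
            barrier.wait();

            int all = 1;
            for (int flag : done) all &= flag;
            saw_all[id] = all;
        });
    }
    for (auto& t : threads) t.join();

    int count = 0;
    for (int v : saw_all) count += v;
    return count;
}
```

Calling run_workers(4) starts four host threads; because of the barrier, each one should find all four flags set. Without the barrier.wait() call a fast thread could read the flags before the slower ones have finished, which is the gap the device-side synchronize call leaves open.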