cudaEventElapsedTime with multiple devices

cudaEventElapsedTime doesn’t work when the events are from different devices. This is probably not a fundamental limitation, because Nsight can create a timeline of the whole application, covering all of the devices it used. So is there a good way of measuring the time difference between two events on different devices programmatically (not in Nsight)?

(I tried some hackish solutions, like having one device wait for an event on another and then recording its own event, so that I’d have two events occurring in close proximity, but the results seemed poor.)

Indeed, events are specific to devices:

"cudaEventElapsedTime() will fail if the two input events are associated to different devices."

You could try inserting a callback to a routine containing a host-based time-stamping function (e.g. gettimeofday) at each point where you want timing measurements. Later, you can subtract the two timestamps.

Note that the callback is executed once all previous CUDA activity in that stream has completed, so it behaves similarly to event completion. Separate devices (should) have separate streams.

Note that callbacks should not make use of any CUDA API functionality.
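Here is a rough sketch of that idea, assuming the callback mechanism in question is cudaStreamAddCallback (newer toolkits also offer cudaLaunchHostFunc). The kernel `busy`, the helpers `stamp`, `elapsed_ms` and `enqueue_and_stamp`, and the problem size are made up for illustration. If only one GPU is present, it falls back to two streams on the same device.

```cuda
#include <cstdio>
#include <cstdlib>
#include <sys/time.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// Host callback: stores the current wall-clock time in the timeval
// passed as userData. It makes no CUDA API calls.
static void CUDART_CB stamp(cudaStream_t, cudaError_t, void *userData) {
    gettimeofday(static_cast<timeval *>(userData), nullptr);
}

// Difference b - a in milliseconds.
static double elapsed_ms(const timeval &a, const timeval &b) {
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_usec - a.tv_usec) / 1e3;
}

// Placeholder workload (hypothetical), just to give the stream something to do.
__global__ void busy(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Queue some work on device `dev`, then a callback that timestamps its completion.
static void enqueue_and_stamp(int dev, int n, cudaStream_t *s, float **x,
                              timeval *t) {
    CUDA_CHECK(cudaSetDevice(dev));
    CUDA_CHECK(cudaStreamCreate(s));
    CUDA_CHECK(cudaMalloc(x, n * sizeof(float)));
    CUDA_CHECK(cudaMemsetAsync(*x, 0, n * sizeof(float), *s));
    busy<<<(n + 255) / 256, 256, 0, *s>>>(*x, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaStreamAddCallback(*s, stamp, t, 0));  // flags must be 0
}

int main() {
    int count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&count));
    const int devA = 0;
    const int devB = (count > 1) ? 1 : 0;  // fall back to a single device
    const int n = 1 << 20;

    cudaStream_t sA, sB;
    float *xA, *xB;
    timeval tA, tB;
    enqueue_and_stamp(devA, n, &sA, &xA, &tA);
    enqueue_and_stamp(devB, n, &sB, &xB, &tB);

    // Synchronizing a stream also waits for its pending callbacks.
    CUDA_CHECK(cudaSetDevice(devA));
    CUDA_CHECK(cudaStreamSynchronize(sA));
    CUDA_CHECK(cudaSetDevice(devB));
    CUDA_CHECK(cudaStreamSynchronize(sB));

    printf("device %d point -> device %d point: %.3f ms\n", devA, devB,
           elapsed_ms(tA, tB));

    CUDA_CHECK(cudaFree(xB));
    CUDA_CHECK(cudaStreamDestroy(sB));
    CUDA_CHECK(cudaSetDevice(devA));
    CUDA_CHECK(cudaFree(xA));
    CUDA_CHECK(cudaStreamDestroy(sA));
    return 0;
}
```

Keep in mind that the timestamps are taken on the host, so they include whatever latency there is between the stream reaching that point and the callback actually running. For short intervals that overhead may dominate.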

Thanks, Txbob!

(BTW, when I wrote that one approach I tried seemed to produce poor results, I didn’t realize that there was a bug in my code stemming from a misunderstanding of the semantics of cudaStreamWaitEvent.)