CUPTI Activity API giving asynchronous events with bogus(?) timestamps

I am trying to use TAU on a C++ application (built on Magma) that uses multiple threads to submit kernels to multiple streams. TAU uses the CUPTI Activity API to trace events on the GPU. After 2 weeks of debugging, I have determined that the CUPTI Activity API is returning events with bogus(?) timestamps for this code.

I created a simple tracing tool using the code in the activity_trace_async example. I basically took the cpp file from that example and built a shared object library out of it, and added a static global constructor so that I could LD_PRELOAD the library and run the application I want to trace. The library can be found here:

At any rate, after running for a while, the application spits out the following kinds of events. You can see that the correlation ID is monotonically increasing, but the timestamp has somehow gone backwards in time:

MEMCPY HtoD [ 31865754446 - 31865755630 ] device 0, context 1, stream 33, correlation 287972/r0
KERNEL "" [ 31865770254 - 31865774158 ] device 0, context 1, stream 33, correlation 287981
grid [1,1,1], block [16,16,1], shared memory (static 0, dynamic 0)
MEMCPY DtoH [ 28817954356 - 28817959060 ] device 0, context 1, stream 33, correlation 287982/r0

From that point on, the timestamps continue monotonically increasing… until at some later point they jump backwards again. This seems to happen on all(?) streams.

Is this a bug in CUPTI? Is something else going on? I see 'backwards' jumps of various sizes, from a few seconds up to 20+ seconds.

nvprof has no problem generating a valid .nvvp file for this code, so there must be a CUPTI trick/setting that we are missing (one that is also missing from your activity_trace_async example)…?

Hi khuck, could you please try it with an increased activity buffer size (BUF_SIZE = 16 or 32 MB) and/or an increased device buffer (32 MB) and profiling semaphore pool (512K), controlled using the attributes below:
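A sketch of applying the device-side settings, assuming the CUPTI activity attribute API (`cuptiActivitySetAttribute`). The two attribute names are my reading of "device buffer" and "profiling semaphore pool"; please check them against the CUPTI headers for your toolkit version. (BUF_SIZE itself is the host-side buffer size macro in the activity_trace_async sample and is changed there directly.)

```cpp
#include <cupti.h>
#include <cstddef>

// Raise CUPTI's device buffer and profiling semaphore pool sizes.
// Call this before enabling activity kinds.
void configure_cupti_buffers() {
    // 32 MB device buffer for activity records.
    size_t deviceBufSize = 32 * 1024 * 1024;
    size_t attrSize = sizeof(deviceBufSize);
    cuptiActivitySetAttribute(CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE,
                              &attrSize, &deviceBufSize);

    // 512K entries in the profiling semaphore pool.
    size_t semPoolSize = 512 * 1024;
    attrSize = sizeof(semPoolSize);
    cuptiActivitySetAttribute(CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE,
                              &attrSize, &semPoolSize);
}
```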

It’d be a good data point if any of these settings avoids or delays the issue you observe.

How many CUDA contexts, CUDA streams, CPU threads, CUDA kernels, memcopies, and memsets are launched by the application? Which GPU and CUDA toolkit are you using? Would it be possible for you to provide a simple reproducer?