reliability of profiler output data profiler output data unplausible?

I think the profiler is a great tool, that saves me a lot of manual performance measuring. But I now got some results that in my oppinion can’t be true.

timestamp=[ 2786.445 ] method=[ memcopy ] gputime=[ 207.776 ]

timestamp=[ 3354.577 ] method=[ memcopy ] gputime=[ 207.712 ]

timestamp=[ 3396.815 ] method=[ memSort ] gputime=[ 18.816 ] cputime=[ 35.050 ] occupancy=[ 0.667 ]

timestamp=[ 4202.429 ] method=[ memcopy ] gputime=[ 187.040 ]

In general this shows a program where the first memcopy makes sure the data at a certain pointer is set to all zeros. The next memcopy copies (random) data from host to device. The memSort algorithm is a selfbuilt procedure which at the moment does nothing else than copying the (random) data to another memory position. The last memcopy copies the data from the new location back to the host for comparison.

Now the problem: If you look closely at the timings given by the profiler you might realize that memSort starts about 40us after the second memcopy and finishes 19us later while the memcopy (taking 208us) is still running. But memSort works on the data copied by the memcopy. The comparison back in the Host shows that the correct data was copied.

=> memSort must have been called after the data was successfully copied but the profiler shows an overlap of these calls

can anyone explain this behaviour to me? do I misinterpret any of the timings?

I’ve found with my own profiling on Windows (QueryPerformanceCounter) that since kernel calls are asynchronous, sometimes the kernel function returns quickly, but the subsequent memcpy takes longer. Presumably the memcpy waits for the previous kernel invocation to finish before starting.

Vrahok, thanks for posting this. :thumbup: I had noticed this too and had planned to post a similar message.
My guess is that the time-stamps only tell us when the call was made and not necessarily started.
I dont mind being wrong, in fact would definitely like the correct answer.


I sent a report to nvidia and this is actually a bug in the timestamp calculation on winXP 32-bit systems. I got the response that nvidia is working on the fix.