I think the profiler is a great tool that saves me a lot of manual performance measurement. But I have now gotten some results that, in my opinion, can't be true.
timestamp=[ 2786.445 ] method=[ memcopy ] gputime=[ 207.776 ]
timestamp=[ 3354.577 ] method=[ memcopy ] gputime=[ 207.712 ]
timestamp=[ 3396.815 ] method=[ memSort ] gputime=[ 18.816 ] cputime=[ 35.050 ] occupancy=[ 0.667 ]
timestamp=[ 4202.429 ] method=[ memcopy ] gputime=[ 187.040 ]
This log comes from a program in which the first memcopy sets the data at a certain device pointer to all zeros. The second memcopy copies (random) data from the host to the device. memSort is a self-written kernel that, for now, does nothing but copy that (random) data to another memory location. The last memcopy copies the data from the new location back to the host for comparison.
Now the problem: if you look closely at the timings reported by the profiler, memSort starts about 42 µs after the second memcopy (3396.815 − 3354.577) and finishes about 19 µs later, while that memcopy (taking 208 µs) should still be running. But memSort works on exactly the data copied by that memcopy, and the comparison on the host shows that the correct data was copied.
=> memSort must have run after the data was completely copied, yet the profiler shows these two calls overlapping.
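To double-check the arithmetic, here is a small Python sketch that parses the log above and computes each call's end time. It assumes (this is my reading, not something the profiler documentation confirms here) that timestamp is the start time and gputime the duration, both in microseconds; the helper names parse and overlaps are just made up for this check.

```python
import re

# Profiler log from above, one entry per line.
LOG = """\
timestamp=[ 2786.445 ] method=[ memcopy ] gputime=[ 207.776 ]
timestamp=[ 3354.577 ] method=[ memcopy ] gputime=[ 207.712 ]
timestamp=[ 3396.815 ] method=[ memSort ] gputime=[ 18.816 ] cputime=[ 35.050 ] occupancy=[ 0.667 ]
timestamp=[ 4202.429 ] method=[ memcopy ] gputime=[ 187.040 ]
"""

# Matches "key=[ value ]" pairs and strips the padding spaces.
FIELD = re.compile(r"(\w+)=\[\s*([^\]]*?)\s*\]")


def parse(log):
    """Return (method, start, end) tuples.

    ASSUMPTION: timestamp = start time, gputime = duration, both in µs.
    """
    entries = []
    for line in log.splitlines():
        fields = dict(FIELD.findall(line))
        if not fields:
            continue
        start = float(fields["timestamp"])
        duration = float(fields["gputime"])
        entries.append((fields["method"], start, start + duration))
    return entries


def overlaps(entries):
    """Return index pairs (i, j), i < j, whose [start, end) intervals intersect."""
    pairs = []
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            _, s1, e1 = entries[i]
            _, s2, e2 = entries[j]
            if s1 < e2 and s2 < e1:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    entries = parse(LOG)
    for name, start, end in entries:
        print(f"{name:8s} start={start:9.3f} end={end:9.3f}")
    for i, j in overlaps(entries):
        print(f"overlap: #{i} {entries[i][0]} and #{j} {entries[j][0]}")
```

Under that assumption the second memcopy runs from 3354.577 to 3562.289 µs, while memSort runs from 3396.815 to 3415.631 µs, so memSort lies entirely inside the copy; that pair is the only overlap.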
Can anyone explain this behaviour to me? Am I misinterpreting any of the timings?