I ran the visual profiler (both 1.0 and 1.1) on my application, but it doesn't show all of the memory copies. It only shows the copies from the device to the host. Is this intended behavior?
Additionally, in contrast to the consensus here, the CPU time is less than the GPU time. It's pretty much consistent (8-12 us). Can someone explain why this is the case?