On a Linux node with a GeForce 8600, CUDA 1.0, and Visual Profiler 1.0 (Alpha January 2008), the Visual Profiler reports included both “memcopy” and kernel timings for all our benchmarks, as expected.
We then moved the GeForce 8600 into a new Linux node running CUDA 2.0 and Visual Profiler 1.0.11. We also upgraded the old Linux node to CUDA 2.0, kept Visual Profiler 1.0 (Alpha January 2008) on it, and installed a Tesla C870 in it.
Now, the Visual Profiler 1.0 (Alpha January 2008) reports for the Tesla C870 runs include both “memcopy” and kernel timings. However, the Visual Profiler 1.0.11 reports for the GeForce 8600 runs include only the kernel timings; the “memcopy” timings are missing.
Our benchmarks passed on the GeForce 8600 under CUDA 1.0 and still pass on it under CUDA 2.0; they also pass on the Tesla C870.
Did we install the proper Visual Profiler (version 1.0.11) for CUDA 2.0?