CUDA 2.0 and Visual Profiler 1.0.11 cudaprof reports are missing memcopy's

waswas · August 25, 2008, 1:44pm

On a Linux node installed with a GeForce 8600, CUDA 1.0 and Visual Profiler 1.0 (Alpha January 2008), the Visual Profiler reports included both “memcopy” and kernel timings for all our benchmarks, as expected.

Then we moved the GeForce 8600 into a new Linux node installed with CUDA 2.0 and Visual Profiler 1.0.11. We also upgraded the old Linux node to CUDA 2.0, left Visual Profiler 1.0 (Alpha January 2008) in place, and installed a Tesla C870 in that node.

Now, the Visual Profiler 1.0 (Alpha January 2008) reports for the Tesla runs include both the memcopy and kernel timings. The Visual Profiler 1.0.11 reports for the GeForce 8600 runs, however, include only the kernel timings; the “memcopy” timings are missing.

Our benchmarks pass on the GeForce 8600 under both CUDA 1.0 and CUDA 2.0, and they pass on the Tesla C870.

Did we install the proper Visual Profiler (version 1.0.11) for CUDA 2.0?

theMarix · August 25, 2008, 3:28pm

Just out of curiosity: is it missing all memcpys, or is the reported count of memcpys just lower than the actual number done?

waswas · August 25, 2008, 3:55pm

It’s missing all memcopy results, whether the host code uses cudaMemcpy() or cudaMemcpyAsync().
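When the profiler drops memcopy rows, one way to cross-check transfer times independently is to bracket the copies with CUDA events. The following is only a minimal sketch under assumed conditions: the 16 MB buffer size and all variable names are hypothetical, and it is not the benchmark code discussed in this thread.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 16 << 20;           // assumed transfer size: 16 MB
    float *hostBuf = 0, *devBuf = 0;

    // Page-locked host memory is required for cudaMemcpyAsync to be asynchronous.
    cudaMallocHost((void **)&hostBuf, bytes);
    cudaMalloc((void **)&devBuf, bytes);
    memset(hostBuf, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    // Time a synchronous copy issued in the default stream.
    cudaEventRecord(start, 0);
    cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("cudaMemcpy      H2D: %.3f ms\n", ms);

    // Time an asynchronous copy issued in a user-created stream.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("cudaMemcpyAsync H2D: %.3f ms\n", ms);

    // Report any error raised by the calls above.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(stream);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(hostBuf);
    cudaFree(devBuf);
    return err == cudaSuccess ? 0 : 1;
}
```

Event timings measured this way do not depend on the profiler log, so they can at least confirm that the copies are being executed even when no memcopy rows appear in the report.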

aakova · August 25, 2008, 10:37pm

I’ve seen this behavior as well; it captures the first three memcopies, but misses the 2000 that follow the first kernel invocation.

Reimar · August 26, 2008, 6:38am

Interesting, it works fine for me. Have you tried enabling all profiler counters in session settings->configuration? (It then needs 3 passes.) I know that, for example, not enabling “time stamp” has had several weird side effects for me (e.g. CPU time is still displayed, but with nonsensical values).

waswas · August 26, 2008, 12:00pm

This morning, I tried enabling the profiler counters as well as the timestamp, and cudaprof (Version 1.0.11) still reports no memcopy results.

I neglected to mention that these benchmarks are streaming benchmarks that employ cudaStream_t objects for concurrent PC/GPU global memory I/O and kernel function executions. I varied the number of cudaStream_t objects from 1 to 8, and cudaprof failed to report any memcopy results.
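For readers unfamiliar with the pattern, a streaming benchmark of this general kind might be sketched as follows. This is an illustration only, not the benchmark code from this thread: the `scale` kernel, the chunk size, and the stream count of 4 are all assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: scales each element in place.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int kStreams = 4;                  // assumed; the post varied this from 1 to 8
    const int kChunk = 1 << 20;              // assumed elements per stream
    const size_t chunkBytes = kChunk * sizeof(float);

    float *hostBuf = 0, *devBuf = 0;
    // Asynchronous copies need page-locked host memory.
    cudaMallocHost((void **)&hostBuf, kStreams * chunkBytes);
    cudaMalloc((void **)&devBuf, kStreams * chunkBytes);
    for (int i = 0; i < kStreams * kChunk; ++i) hostBuf[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream gets its own upload, kernel launch, and download.
    for (int s = 0; s < kStreams; ++s) {
        float *h = hostBuf + s * kChunk;
        float *d = devBuf + s * kChunk;
        cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(d, kChunk, 2.0f);
        cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for every stream to finish before reading results on the host.
    for (int s = 0; s < kStreams; ++s) cudaStreamSynchronize(streams[s]);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    else
        printf("first element after round trip: %f\n", hostBuf[0]);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(hostBuf);
    cudaFree(devBuf);
    return err == cudaSuccess ? 0 : 1;
}
```

Every transfer in this sketch goes through cudaMemcpyAsync() on a non-default stream, which is the case where the memcopy rows were reported missing.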

Yesterday, I executed other non-streaming benchmarks on the GeForce 8600. Here, cudaprof reports all the memcopy results as expected.

The working cudaprof (Alpha Version 1.0 January 2008) profiles benchmarks on the Tesla C870, which does not support overlapping asynchronous PC/GPU I/O with kernel executions. On that card, all the streaming benchmarks degrade to serial, synchronous PC/GPU global memory I/O and kernel executions, one stream at a time.
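Whether a given card can overlap copies with kernel execution can be queried at run time. Below is a minimal sketch that prints the `deviceOverlap` field of `cudaDeviceProp` for each device. Note the assumption: this field belongs to the older runtime API; newer toolkits deprecate it in favor of `asyncEngineCount` and may no longer provide it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // deviceOverlap is nonzero when the device can run a kernel
        // concurrently with a host<->device copy from page-locked memory.
        printf("device %d: %s (compute %d.%d), deviceOverlap=%d\n",
               dev, prop.name, prop.major, prop.minor, prop.deviceOverlap);
    }
    return 0;
}
```

A value of 0 would be consistent with the serialized behavior described for the Tesla C870.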

After our IT administrator installs a copy of the older cudaprof on the GeForce node, I will attempt profiling the streaming benchmarks on the GeForce 8600, and see what I get…
