Hi all,
Why does the Visual Profiler serialize my streams when I run “simpleStreams”? When I run simpleStreams from the command line, even with CUDA_PROFILE=1 set, I get the expected speedup.
System: CentOS 5.3 (kernel 2.6.29), 2x Opteron 2.4 GHz, 16 GB RAM, Tesla C1060
Command line:
[codebox][foo@bar C]$ CUDA_PROFILE=1 bin/linux/release/simpleStreams
[ simpleStreams ]
Device name : Tesla C1060
CUDA Capable SM 1.3 hardware with 30 multi-processors
scale_factor = 1.0000
array_size = 16777216
memcopy: 23.89
kernel: 25.12
non-streamed: 47.48 (49.01 expected)
4 streams: 26.43 (31.09 expected with compute capability 1.1 or later)
[/codebox]
Visual Profiler:
[codebox]=== Start profiling for session 'Session1' ===
Start program '/cuda/C/bin/linux/release/simpleStreams' run #1 ...
[ simpleStreams ]
Device name : Tesla C1060
CUDA Capable SM 1.3 hardware with 30 multi-processors
scale_factor = 1.0000
array_size = 16777216
memcopy: 23.92
kernel: 25.25
non-streamed: 47.55 (49.16 expected)
4 streams: 49.34 (31.23 expected with compute capability 1.1 or later)[/codebox]
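As a sanity check on the numbers above: the "expected" figures in both logs are consistent with a simple model in which the non-streamed time is memcopy + kernel and the 4-stream time is kernel + memcopy / 4. That model is my assumption, inferred from the printed values rather than taken from the sample's source, and the function names below are hypothetical. A minimal Python sketch:

```python
# Assumed model, inferred from the logged values (not from the sample's source):
#   non-streamed expected = memcopy + kernel
#   streamed expected     = kernel + memcopy / nstreams


def expected_non_streamed(memcopy, kernel):
    """Copy and kernel run back to back, so the times simply add."""
    return memcopy + kernel


def expected_streamed(memcopy, kernel, nstreams):
    """With overlap, only one stream's share of the copy is not hidden."""
    return kernel + memcopy / nstreams


# (memcopy, kernel, measured non-streamed, measured 4 streams) from the two logs
runs = {
    "command line": (23.89, 25.12, 47.48, 26.43),
    "Visual Profiler": (23.92, 25.25, 47.55, 49.34),
}

for label, (memcopy, kernel, serial, streamed) in runs.items():
    est = expected_streamed(memcopy, kernel, 4)
    print(f"{label}: expected {est:.2f}, measured {streamed:.2f}, "
          f"non-streamed {serial:.2f}")
```

For the profiler run, adding the printed memcopy and kernel values gives 49.17 rather than the logged 49.16, presumably because the program works with the unrounded timings.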
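Under the Visual Profiler, the 4-stream run (49.34) takes about as long as the non-streamed run (47.55), whereas from the command line it drops to 26.43. Any thoughts? Is this a bug?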
Cheers,
~Joe