Stream serialization with CUDA Visual Profiler v2.3.11

Hi all,

Why does the Visual Profiler serialize my streams when I run “simpleStreams”? When I run simpleStreams from the command line, even with CUDA_PROFILE=1 set, I get the expected speedup.

CentOS 5.3 (kernel 2.6.29), 2x Opteron 2.4 GHz, 16 GB RAM, Tesla C1060

Command line:

[codebox][foo@bar C]$ CUDA_PROFILE=1 bin/linux/release/simpleStreams
[ simpleStreams ]
Device name : Tesla C1060
CUDA Capable SM 1.3 hardware with 30 multi-processors
scale_factor = 1.0000
array_size = 16777216
memcopy: 23.89
kernel: 25.12
non-streamed: 47.48 (49.01 expected)
4 streams: 26.43 (31.09 expected with compute capability 1.1 or later)
[/codebox]

Visual Profiler:

[codebox]=== Start profiling for session ‘Session1’ ===
Start program ‘/cuda/C/bin/linux/release/simpleStreams’ run #1
[ simpleStreams ]
Device name : Tesla C1060
CUDA Capable SM 1.3 hardware with 30 multi-processors
scale_factor = 1.0000
array_size = 16777216
memcopy: 23.92
kernel: 25.25
non-streamed: 47.55 (49.16 expected)
4 streams: 49.34 (31.23 expected with compute capability 1.1 or later)[/codebox]
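The “expected” values printed by simpleStreams are consistent with the following arithmetic (inferred from the printed numbers, not copied from the sample's source), where $T_m$ is the memcopy time, $T_k$ the kernel time and $n$ the number of streams. Without streams, the copy and the kernel run back to back; with $n$ streams, each slice's copy can overlap a later slice's kernel, so only the last slice's copy ($T_m/n$) is left exposed:

$$T_{\text{non-streamed}} = T_m + T_k = 23.89 + 25.12 = 49.01$$

$$T_{n\ \text{streams}} = T_k + \frac{T_m}{n} = 25.12 + \frac{23.89}{4} \approx 31.09$$

For the Visual Profiler run, the same formula gives $25.25 + 23.92/4 = 31.23$, yet the measured 4-stream time of 49.34 is close to $T_m + T_k = 23.92 + 25.25 = 49.17$, which is roughly what fully serialized streams would take.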

Any thoughts? Is this a bug?

Cheers,

~Joe

Does no one want to comment? Any NVIDIA employees? :)

~Joe

Because the CUDA profiler serializes streams in order to get accurate timings.
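To make the serialization point concrete, here is a minimal sketch of a streamed pattern similar to the one simpleStreams uses: all kernels are queued first, then all device-to-host copies, one slice per stream, so the copy in stream i can run while the kernel in stream i+1 executes. The names `busy_kernel`, `run_streamed` and `expected_streamed_ms` are hypothetical and are not the SDK's code. The sketch assumes `n` is divisible by `nstreams` and that the host buffer comes from `cudaMallocHost`, because `cudaMemcpyAsync` from pageable memory does not overlap with kernel execution. If a tool serializes the streams, the time returned by `run_streamed` should approach kernel time plus copy time instead of `expected_streamed_ms`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical busy-work kernel: each thread repeatedly adds `factor`
// to its own element. Buffer contents do not matter for timing.
__global__ void busy_kernel(int *data, int factor, int iterations, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        for (int i = 0; i < iterations; ++i)
            data[idx] += factor;
    }
}

// Expected time if every copy overlaps a later stream's kernel:
// only the last stream's slice of the copy (memcpy_ms / nstreams) is exposed.
float expected_streamed_ms(float kernel_ms, float memcpy_ms, int nstreams)
{
    return kernel_ms + memcpy_ms / nstreams;
}

// Queues all kernels, then all device-to-host copies, one slice per
// stream, and returns the elapsed time measured with CUDA events.
// Assumes h_data is pinned (allocated with cudaMallocHost).
float run_streamed(int *h_data, int *d_data, int n, int nstreams, int iterations)
{
    assert(n % nstreams == 0);

    cudaStream_t *streams = new cudaStream_t[nstreams];
    for (int i = 0; i < nstreams; ++i)
        cudaStreamCreate(&streams[i]);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int chunk = n / nstreams;
    const int threads = 256;
    const int blocks = (chunk + threads - 1) / threads;

    // Events in the default stream (0) wait for work in all other streams,
    // so stop is reached only after every kernel and copy has finished.
    cudaEventRecord(start, 0);
    for (int i = 0; i < nstreams; ++i)
        busy_kernel<<<blocks, threads, 0, streams[i]>>>(d_data + i * chunk, 5,
                                                        iterations, chunk);
    for (int i = 0; i < nstreams; ++i)
        cudaMemcpyAsync(h_data + i * chunk, d_data + i * chunk,
                        chunk * sizeof(int), cudaMemcpyDeviceToHost, streams[i]);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    for (int i = 0; i < nstreams; ++i)
        cudaStreamDestroy(streams[i]);
    delete[] streams;
    return ms;
}
```

For example, allocate `h_data` with `cudaMallocHost` and `d_data` with `cudaMalloc`, call `run_streamed` once with `nstreams = 1` and once with `nstreams = 4`, and compare a plain run against a profiled run of the same binary.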

So why does setting “CUDA_PROFILE=1” give me a cuda_profile.log without serialization? Or do you mean that the CUDA Visual Profiler serializes streams on purpose? How can I turn this off?

On a related note, how do I control what is measured in cuda_profile.log? Is the command-line profiler the backend of the Visual Profiler, or does the Visual Profiler use special driver hooks that aren’t available from the command line?

Sorry for so many questions, but I couldn’t find this information elsewhere.

Cheers!

Do any CUDA developers lurk on these forums? If so, I would love an answer! :)