We’ve just had a whale of a time figuring out that setting the CUDA_PROFILE environment variable apparently disables streams. We found no hint of this in the documentation or on the forums.
To reproduce, run the simpleStreams SDK example, once with CUDA_PROFILE set to 1, and once with it set to 0 or unset. There is no need to install the Visual Profiler; just set the variable in the shell you launch from. Here’s an example run:
[codebox]
SDK2/bin/linux/release>./simpleStreams
running on: GeForce GTX 280
memcopy: 33.06
kernel: 17.20
non-streamed: 50.03 (50.26 expected)
4 streams: 33.79 (25.46 expected with compute capability 1.1 or later)
Test PASSED
Press ENTER to exit…
SDK2/bin/linux/release>setenv CUDA_PROFILE 1
SDK2/bin/linux/release>./simpleStreams
running on: GeForce GTX 280
memcopy: 33.07
kernel: 17.21
non-streamed: 50.04 (50.28 expected)
4 streams: 50.31 (25.48 expected with compute capability 1.1 or later)
Test PASSED
Press ENTER to exit…
[/codebox]
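For anyone reproducing this from a Bourne-style shell instead of csh/tcsh (where `setenv` is not available), here is a minimal sketch of the same comparison. The helper names `run_with_profile` and `run_without_profile` are made up for illustration and are not part of the SDK or the toolkit; the only real pieces are the CUDA_PROFILE variable and the standard `env` utility.

```shell
#!/bin/sh
# Hypothetical helper: run a command with CUDA_PROFILE set to the given
# value for that one process only, leaving the calling shell untouched.
run_with_profile() {
    value="$1"
    shift
    env CUDA_PROFILE="$value" "$@"
}

# Hypothetical helper: run a command with CUDA_PROFILE removed from its
# environment, even if the calling shell has it exported.
run_without_profile() {
    ( unset CUDA_PROFILE; "$@" )
}

# Intended use, from the directory that holds the SDK binary:
#   run_without_profile ./simpleStreams
#   run_with_profile 1 ./simpleStreams
```

Scoping the variable to a single process avoids a stale `setenv CUDA_PROFILE 1` lingering in the shell and affecting later runs, which makes it easier to compare the "4 streams" timings back to back.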
We tried this on several GTX 280s and C1060s, with driver versions 180.22, 180.29 and 185.??, on Linux x86 and x86_64 (supported distros, of course), all with the 2.1 final toolkit.
Is this expected behaviour? I tend to doubt it, because the documentation explicitly talks about the stream ID (CUDA_profiler_2_1.txt in the doc/ directory). Any clue what we might be doing wrong?
Thanks,
dom