CUDA_PROFILE disables streams

e.ping · April 6, 2009, 9:30pm

We’ve just had a whale of a time figuring out that the CUDA_PROFILE environment variable apparently disables streams. We found no hint of this in the documentation or on the forums.

To reproduce, run the simpleStreams SDK example, once with CUDA_PROFILE set to 1, and once with it set to 0 or unset. There is no need to install the Visual Profiler; just set the variable in the shell you launch from. Here’s an example run:

[codebox]
SDK2/bin/linux/release>./simpleStreams
running on: GeForce GTX 280
memcopy: 33.06
kernel: 17.20
non-streamed: 50.03 (50.26 expected)
4 streams: 33.79 (25.46 expected with compute capability 1.1 or later)
Test PASSED
Press ENTER to exit...

SDK2/bin/linux/release>setenv CUDA_PROFILE 1
SDK2/bin/linux/release>./simpleStreams
running on: GeForce GTX 280
memcopy: 33.07
kernel: 17.21
non-streamed: 50.04 (50.28 expected)
4 streams: 50.31 (25.48 expected with compute capability 1.1 or later)
Test PASSED
Press ENTER to exit...
[/codebox]
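
For anyone on sh or bash rather than csh, the comparison above could be scripted roughly as follows. This is only a sketch: `compare_profile` is a made-up helper name, not something from the SDK, and the usage line assumes the simpleStreams binary is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CUDA SDK): run a command twice,
# first with CUDA_PROFILE removed from the environment, then with it set to 1.
compare_profile() {
    echo "== CUDA_PROFILE unset =="
    # The subshell keeps the unset from affecting the calling shell.
    ( unset CUDA_PROFILE; "$@" )
    echo "== CUDA_PROFILE=1 =="
    # env sets the variable for this one child process only.
    env CUDA_PROFILE=1 "$@"
}

# Usage (assumes simpleStreams is in the current directory):
#   compare_profile ./simpleStreams
```

Since the sample waits for ENTER before exiting, each of the two runs needs a keypress. The line to compare between the two runs is the "4 streams" timing.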

We tried this on a couple of GTX 280s and C1060s, with driver versions 180.22, 180.29 and 185.??, on Linux x86 and x86_64 (supported distros, of course), using the 2.1 final toolkit.

Is this expected behaviour? I tend to doubt it, because the documentation explicitly talks about the stream ID (CUDA_profiler_2_1.txt in the doc/ directory). Any clue what we might be doing wrong?

Thanks,

dom

tmurray · April 6, 2009, 10:29pm

Pretty sure this was fixed in 2.2. In fact, let’s find out!

tmurray · April 6, 2009, 10:31pm

tim@thor:~/NVIDIA_CUDA_SDK/bin/linux/release$ ./simpleStreams
running on: Tesla C1060
memcopy:	19.82
kernel:		25.15
non-streamed:	44.86 (44.98 expected)
4 streams:	27.37 (30.11 expected with compute capability 1.1 or later)
-------------------------------
Test PASSED
Press ENTER to exit...

tim@thor:~/NVIDIA_CUDA_SDK/bin/linux/release$ CUDA_PROFILE=1 ./simpleStreams
running on: Tesla C1060
memcopy:	19.84
kernel:		25.13
non-streamed:	44.88 (44.97 expected)
4 streams:	28.01 (30.09 expected with compute capability 1.1 or later)
-------------------------------
Test PASSED
Press ENTER to exit...

So yes, this is fixed in 2.2.

e.ping · April 6, 2009, 10:37pm

Now that’s what I call a fast turnaround time! Great, thanks :)

erlebach · April 7, 2009, 2:37am

Would it be possible to get a list of any known outstanding issues in CUDA 2.2? Knowing about them up front would minimize lost time during development. Thanks.

Gordon

tmurray · April 7, 2009, 4:04am

The lack of async support was just a limitation in the profiler, and it has been fixed in 2.2. I’m not aware of any really obvious limitations anymore, although I bet CUDA_PROFILE still sets CPU affinity to a single core (so that the timestamps are meaningful).
