simpleStreams and async memory copies are slow

crroush · January 29, 2009, 9:44pm

I am seeing a major performance hit when doing asynchronous memcpy. This is easily seen by running the simpleStreams program: the streams appear to run serially rather than in parallel.

OS: Fedora Core 8

MB: ASUS P5N-T 780i motherboard

GPUs: GeForce 9800 GT, Tesla C1060

CUDA: 2.0

Driver: 180.22

Output from simpleStreams:

running on: Tesla C1060

memcopy: 48.44

kernel: 53.73

non-streamed: 45.35 (102.16 expected)

4 streams: 266.74 (65.84 expected with compute capability 1.1 or later)

Test PASSED

Any ideas as to what I can try to resolve this?

Charley · January 30, 2009, 1:00pm

For what it's worth, this is a similar thread:

http://forums.nvidia.com/index.php?showtopic=86915&pid=492314&mode=threaded&start=#entry492314

crroush · January 30, 2009, 3:36pm

Thanks for the link, not sure it helps though =(.

If I were that close to the expected value, I wouldn’t complain so much. For comparison, here are the results on my laptop, which has a GeForce 8700M GT:

memcopy: 42.67

kernel: 54.50

non-streamed: 123.37 (97.17 expected)

4 streams: 62.76 (65.17 expected with compute capability 1.1 or later)

I would expect something similar to the expected results on the Tesla, and I certainly wouldn’t expect the non-streamed version to outperform the multi-stream version.

crroush · January 30, 2009, 4:21pm

We have determined the cause of this problem: a custom PCIe card in the system is somehow causing problems in the PCIe switch. After removing the card, we get the transfer rates that we expect:

running on: Tesla C1060

memcopy: 20.31

kernel: 25.14

non-streamed: 45.33 (45.45 expected)

4 streams: 28.14 (30.22 expected with compute capability 1.1 or later)

Test PASSED
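
For reference, the "expected" figures in the three outputs posted in this thread are consistent with a simple model: the non-streamed estimate is memcopy + kernel, and the 4-stream estimate is kernel + memcopy / 4, on the assumption that copies overlap with kernel execution. The sketch below is not taken from the sample's source; the formulas are inferred from the posted numbers, and the function names and run labels are made up for illustration.

```python
# Sketch of the "expected" timing model implied by the simpleStreams output.
# Assumption (inferred from the posted numbers, not from the sample source):
#   expected non-streamed = memcopy + kernel
#   expected streamed     = kernel + memcopy / nstreams

def expected_non_streamed(memcopy, kernel):
    """Copy and kernel run back to back with no overlap."""
    return memcopy + kernel

def expected_streamed(memcopy, kernel, nstreams=4):
    """Copies overlap with kernels, leaving one chunk's copy time exposed."""
    return kernel + memcopy / nstreams

# (hypothetical label, memcopy, kernel, measured non-streamed, measured 4 streams)
# The numbers are copied from the outputs posted in this thread.
runs = [
    ("Tesla C1060, custom PCIe card installed", 48.44, 53.73, 45.35, 266.74),
    ("GeForce 8700M GT laptop", 42.67, 54.50, 123.37, 62.76),
    ("Tesla C1060, custom PCIe card removed", 20.31, 25.14, 45.33, 28.14),
]

for label, memcopy, kernel, serial, streamed in runs:
    exp_serial = expected_non_streamed(memcopy, kernel)
    exp_streamed = expected_streamed(memcopy, kernel)
    print(f"{label}: non-streamed {serial:.2f} vs {exp_serial:.2f} expected, "
          f"4 streams {streamed:.2f} vs {exp_streamed:.2f} expected "
          f"({streamed / exp_streamed:.2f}x)")
```

Under this model, the 4-stream run with the custom card installed takes roughly four times the predicted time, while the laptop run and the run with the card removed both land slightly below the prediction.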

thesquiff · February 12, 2009, 2:42pm

I’ve also encountered a similar problem.

I’ve been running code on a GTX280 while using a Quadro FX570 to drive the computer display. In this configuration the streamed code was vastly slower.

However, once I switched my desktop display back to the GTX280, the streamed version ran as expected.

Anyone like to offer a more detailed explanation? Surely this needs a fix.
