Multi-GPU performance

I'm working on porting an existing application to run on multiple GPUs. It already works on a single GPU, but I have a 9800 GTX and a GTX 280 in this PC at work, so I might as well see whether multi-GPU can help speed things up.

I've pretty much copied the multiGPU example in the SDK, but I get (way) worse performance.

The code used to run on the GTX 280 alone, and the profiler reported 430 ms. As a first try, I've split the workload in half between the GTX 280 and the 9800 GTX.
The profiler now reports 1.5 s for half the work being done on the GTX 280. I've also tried limiting the number of GPUs used to one, basically doing all the work on the GTX 280 but still spawning host threads, and the computation time rose, as you'd expect, to around 3 s.

I don't get why this is so much slower than the non-threaded version where all the work is done on the GTX 280. Any ideas?

I'm using constant memory and textures defined in another file, in case that could cause problems… each host thread populates both.
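For reference, a boiled-down version of the pattern I'm using, one host thread per device, each taking half the data (the kernel, sizes, and pointers here are placeholders, not my actual code):

```cpp
// One host thread per GPU; each thread binds to its device and
// processes its half of the array (simplified sketch).
#include <cuda_runtime.h>
#include <pthread.h>

__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // stand-in for the real work
}

struct GpuPlan { int device; float *hostPtr; int n; };

static void *gpuThread(void *arg)
{
    GpuPlan *p = (GpuPlan *)arg;
    cudaSetDevice(p->device);     // bind this host thread to one GPU

    float *d = 0;
    cudaMalloc((void **)&d, p->n * sizeof(float));
    cudaMemcpy(d, p->hostPtr, p->n * sizeof(float), cudaMemcpyHostToDevice);

    myKernel<<<(p->n + 255) / 256, 256>>>(d, p->n);

    cudaMemcpy(p->hostPtr, d, p->n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return 0;
}

int main()
{
    const int N = 1 << 20;
    float *h = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) h[i] = (float)i;

    // Split the array 50/50 between device 0 and device 1.
    GpuPlan plan[2] = { { 0, h,         N / 2 },
                        { 1, h + N / 2, N / 2 } };
    pthread_t tid[2];
    for (int g = 0; g < 2; ++g) pthread_create(&tid[g], 0, gpuThread, &plan[g]);
    for (int g = 0; g < 2; ++g) pthread_join(tid[g], 0);

    free(h);
    return 0;
}
```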

From my experience, the first thing to do is check what takes the most time. Use cutil timers, CUDA events, a hand-rolled clock, or any other means to measure which part of your code takes most of the time (setDevice, preparing the data, copying data to the device, the kernel, copying data back from the GPU). Once you know this, you'll be able to understand the problem better.
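For instance, something like this with CUDA events (just a sketch with a toy kernel; substitute your own kernel and buffers):

```cpp
// Time each phase (H2D copy, kernel, D2H copy) with CUDA events.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void toyKernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;      // stand-in for the real kernel
}

static float elapsedMs(cudaEvent_t a, cudaEvent_t b)
{
    float ms = 0.0f;
    cudaEventSynchronize(b);       // make sure b has completed
    cudaEventElapsedTime(&ms, a, b);
    return ms;
}

int main()
{
    const int n = 1 << 22;
    const size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes), *d = 0;
    for (int i = 0; i < n; ++i) h[i] = 0.0f;
    cudaMalloc((void **)&d, bytes);

    cudaEvent_t e0, e1, e2, e3;
    cudaEventCreate(&e0); cudaEventCreate(&e1);
    cudaEventCreate(&e2); cudaEventCreate(&e3);

    cudaEventRecord(e0, 0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // copy to device
    cudaEventRecord(e1, 0);
    toyKernel<<<(n + 255) / 256, 256>>>(d, n);         // kernel
    cudaEventRecord(e2, 0);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // copy back
    cudaEventRecord(e3, 0);

    printf("H2D %.2f ms, kernel %.2f ms, D2H %.2f ms\n",
           elapsedMs(e0, e1), elapsedMs(e1, e2), elapsedMs(e2, e3));

    cudaFree(d); free(h);
    return 0;
}
```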

eyal

The strength of a chain is equal to the strength of its weakest link.

One of your cards takes 1.5 s to complete its half of the work, and hence that is the time you see overall.

You need to load-balance correctly on a multi-GPU setup; otherwise you won't get the desired results (a rough static split is sketched below).

It happened to us when using the Personal Supercomputer with 4 Teslas. The final output was as slow as the little “nView” card installed in that box.

Fixing that resulted in sooper-dooper performance.
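To illustrate the load-balancing point, here is a rough sketch of a static split proportional to each GPU's throughput (the speed numbers are made up; in practice you'd measure them with a small warm-up batch on each card):

```cpp
// Split N work items across GPUs in proportion to measured throughput.
#include <stdio.h>

int main()
{
    const int N = 1000000;          // total work items
    const int numGpus = 2;
    // items/sec each GPU managed on a warm-up batch (made-up numbers)
    double speed[2] = { 3.0e6, 1.2e6 };

    double total = 0.0;
    for (int g = 0; g < numGpus; ++g) total += speed[g];

    int start = 0;
    for (int g = 0; g < numGpus; ++g) {
        // Last GPU takes the remainder so the counts sum to N exactly.
        int count = (g == numGpus - 1) ? (N - start)
                                       : (int)(N * speed[g] / total);
        printf("GPU %d: items %d..%d (%d items)\n",
               g, start, start + count - 1, count);
        start += count;
    }
    return 0;
}
```

With these numbers the faster card gets roughly 71% of the work, so both cards finish at about the same time instead of the slow one gating the result.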

First, have you seen the concurrent bandwidth test?

http://forums.nvidia.com/index.php?showtopic=86536

The bandwidth to each card is lower than the peak when both are used concurrently; something to keep in mind.

Also, are you creating and destroying threads over and over, or are you reusing threads? This can make a big difference.
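On the thread-reuse point: the idea is one long-lived worker per GPU that sets its device once and then loops over batches, so the CUDA context isn't torn down and recreated on every launch. A minimal sketch with pthreads (the kernel and batch count are placeholders):

```cpp
// Persistent worker per GPU: the thread binds to its device once,
// then reuses the same context and buffers for every batch.
#include <cuda_runtime.h>
#include <pthread.h>

__global__ void step(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;      // stand-in for the real work
}

struct Worker { int device; int batches; pthread_t tid; };

static void *run(void *arg)
{
    Worker *w = (Worker *)arg;
    cudaSetDevice(w->device);     // context created once, here

    const int n = 1 << 20;
    float *d = 0;
    cudaMalloc((void **)&d, n * sizeof(float));

    for (int b = 0; b < w->batches; ++b) {
        step<<<(n + 255) / 256, 256>>>(d, n);  // same thread, same context
        cudaDeviceSynchronize();
    }

    cudaFree(d);
    return 0;
}

int main()
{
    Worker w[2] = { { 0, 100 }, { 1, 100 } };
    for (int g = 0; g < 2; ++g) pthread_create(&w[g].tid, 0, run, &w[g]);
    for (int g = 0; g < 2; ++g) pthread_join(w[g].tid, 0);
    return 0;
}
```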

Thanks for the replies, guys.

As was to be expected, my mistake had nothing to do with the multi-GPU code! I had simply forgotten to initialize one of the constants in the single-GPU configuration, causing the whole kernel execution to bypass the costly branch in the code and thus finish much faster.
My face is indeed quite red.

Now I just need another GTX 280. The 9800 GTX is slower at doing half the work than the GTX 280 is at doing all of it, so this setup is pretty useless :)