More cores than GPUs

bbales2 · June 2, 2009, 6:15pm

Could someone give me any ideas about how bad it is to have many CPUs independently sending requests to one GPU?

I saw a thread around here where someone mentioned it was not recommended, but another guy responded and said his less demanding application worked fine in this situation.

Is this something that becomes very significant very quickly? I’m assuming we’re going to have to have our memory coalescing/shared memory problems out of the way before we even worry about these program switches, or is this a bad thing to assume?

Does this work at 2 cores/GPU where it doesn’t work at 8 cores/GPU?

Ben

gatoatigrado · June 2, 2009, 6:25pm

check out MisterAnderson42’s gpu worker
The Official NVIDIA Forums | NVIDIA

It depends on whether the CPUs are doing actual computation or just dispatching kernels. In the latter case, you probably won’t see much benefit (maybe even a slowdown) over one CPU core per GPU.

seibert · June 2, 2009, 6:32pm

Yeah, that was probably me. I’ve since learned that, at least for now, you should try not to have separate processes using the same GPU, or you might trip over some driver bugs that will be fixed in a future CUDA release at some point. However, you could easily have one thread controlling the GPU and several CPU threads requesting GPU work to be done by the GPU-controlling thread.
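The "one thread controls the GPU, other CPU threads request work" setup can be sketched as a single-consumer job queue. This is a minimal sketch in Python, not the API of the GPU worker linked above: `GpuDispatcher`, `submit` and `square` are hypothetical names, and `square` stands in for the real CUDA calls (kernel launch plus copy back). The point is only that exactly one thread ever executes GPU work, so only that thread would own the GPU context.

```python
import queue
import threading


class GpuDispatcher:
    """Hypothetical dispatcher: one dedicated thread runs every submitted job
    (in a real program, the CUDA calls), so only that thread would ever own
    the GPU context. Any number of CPU threads may call submit()."""

    _STOP = object()  # sentinel telling the GPU thread to exit

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, fn, *args):
        """Queue fn(*args) for the GPU thread and block until it finishes."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((fn, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value  # re-raise the GPU thread's exception in the caller
        return value

    def close(self):
        """Run any jobs already queued, then stop the GPU thread."""
        self._jobs.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            item = self._jobs.get()
            if item is self._STOP:
                return
            fn, args, reply = item
            try:
                reply.put((True, fn(*args)))
            except Exception as exc:
                reply.put((False, exc))


def square(x):
    return x * x  # stand-in for "launch a kernel and copy its result back"


gpu = GpuDispatcher()
results = [None] * 8


def cpu_worker(i):
    # Each CPU thread does its own work, then hands the GPU part to the dispatcher.
    results[i] = gpu.submit(square, i)


cpus = [threading.Thread(target=cpu_worker, args=(i,)) for i in range(8)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
gpu.close()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

A blocking `submit` keeps the sketch short; a real version would more likely return a handle the CPU thread can wait on later, so it can keep computing while its GPU request sits in the queue.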

tachyon_john · June 2, 2009, 7:06pm

Hi, you’ll likely benefit from reading Jim’s presentations on this topic as it pertains to our code NAMD:
http://www.ks.uiuc.edu/Research/gpu/files/2009-01-GPU-Phillips.pdf

bbales2 · June 2, 2009, 8:12pm

Tach: On Page 17, would it be true to say that the black bars overall have more GPUs than the dark or light grey ones (while all bars have the same number of cores running – I suppose you just left some cores on each node idle)?

Also (and this may be hard for you to even guess at), how intensive are your GPU calls? I mean intensive in the sense of a matrix multiplication being intense, so that if two CPUs were requesting MMs on an ideal GPU you’d get a 2x longer runtime (and any degradation past that would be an effect of sharing the GPU). At some lower intensity, you wouldn’t see a performance drop because the GPU would have plenty of free time. The paper says you guys have some conditionals that break up the multiprocessors a bit. On a scale of 0 → 10, what would you say your kernels are like?
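The intensity argument can be made concrete with a toy model. It rests on loud assumptions: each process loops over CPU work taking `cpu_time` and then one GPU call taking `gpu_time`, every process has its own core, and the GPU runs one call at a time with zero switching cost. `ideal_slowdown` is a hypothetical helper. Under those assumptions a process's step time cannot drop below max(cpu_time + gpu_time, k * gpu_time) when k processes share one GPU.

```python
def ideal_slowdown(k, cpu_time, gpu_time):
    """Optimistic per-process slowdown when k processes share one GPU.

    Toy model (assumption): each process loops 'CPU work for cpu_time, then
    one GPU call for gpu_time', each process has its own core, and the GPU
    serves one call at a time with no context-switch overhead.
    """
    alone = cpu_time + gpu_time            # step time with the GPU to itself
    shared = max(alone, k * gpu_time)      # GPU must fit k calls per step
    return shared / alone


# GPU-bound (the matrix-multiply case): slowdown grows linearly with k.
print(ideal_slowdown(2, 0.0, 1.0))  # 2.0
# GPU busy 25% of each step: two processes fit with no slowdown...
print(ideal_slowdown(2, 3.0, 1.0))  # 1.0
# ...but eight processes saturate the GPU.
print(ideal_slowdown(8, 3.0, 1.0))  # 2.0
```

In this model the GPU saturates once k exceeds (cpu_time + gpu_time) / gpu_time, which is one way to frame the 2 vs. 8 cores/GPU question. Real context switches only add cost on top, so measured step times should be no better than these bounds.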

How big are your kernels (per call)? From the earlier slides it looks like the GPUs take about a tenth of a second per step, but is that just a few calls? Or a few hundred?

I think I’m asking worthwhile questions here… Tell me if I’m barking up the wrong tree (or tell me if I need to read the paper more closely :]).

Gatoat & Seibert: I fear this, but yeah, I see the argument for setting up this way. I’ll probably try overloading the GPUs first anyway :).

Ben
