Scalability question

simonboots · June 6, 2009, 2:28pm

Hi everybody,

I’ve got two NVIDIA GPUs:

GeForce 8600M GT (4 multiprocessors, 0.75 GHz clock rate)
GeForce 8800 GT (14 multiprocessors, 1.5 GHz clock rate)

With these specs I’d expect a kernel with enough thread blocks to run about 7 times faster on the 8800 GT than on the 8600M GT (twice the clock rate times 3.5 times as many multiprocessors).
My tests show that it “only” runs about 5 times faster.
I’m timing only the kernel execution itself, without any memcpy() operations.

Any guesses why?

Thanks a lot!

Simon

seibert · June 6, 2009, 2:49pm

Depending on the ratio of memory reads/writes to arithmetic instructions in your kernel, you might be partially memory-bandwidth bound. What’s the ratio of memory bandwidths for the two GPUs? Your speedup will generally fall somewhere between this bandwidth ratio and the floating-point performance ratio you already calculated.

simonboots · June 6, 2009, 3:09pm

Well, according to Wikipedia the memory bandwidth is 57.6 GB/s (256-bit bus width) for the 8800 GT and 22.4 GB/s (128-bit bus width) for the 8600M GT.

The kernel does indeed have a lot of memory operations compared to arithmetic instructions.

Thanks! That makes sense!
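
To make the two bounds concrete, here is a minimal sketch that plugs in the figures quoted in this thread. The dictionary layout and the `ratio` helper are illustrative names only, and the observed value of 5.0 is the rough "about 5 times" from the first post, not a precise measurement.

```python
# Figures as quoted in the thread (clock in GHz, bandwidth in GB/s).
gpus = {
    "8600M GT": {"multiprocessors": 4, "clock_ghz": 0.75, "bandwidth_gbs": 22.4},
    "8800 GT": {"multiprocessors": 14, "clock_ghz": 1.5, "bandwidth_gbs": 57.6},
}


def ratio(key):
    """Ratio of the 8800 GT value to the 8600M GT value for one spec."""
    return gpus["8800 GT"][key] / gpus["8600M GT"][key]


# Upper bound: purely compute-bound kernel (multiprocessors x clock rate).
compute_ratio = ratio("multiprocessors") * ratio("clock_ghz")

# Lower bound: purely memory-bandwidth-bound kernel.
bandwidth_ratio = ratio("bandwidth_gbs")

# Rough figure from the first post ("about 5 times faster").
observed_speedup = 5.0

print(f"compute-bound limit:   {compute_ratio:.2f}x")
print(f"bandwidth-bound limit: {bandwidth_ratio:.2f}x")
print("observed within bounds:", bandwidth_ratio <= observed_speedup <= compute_ratio)
```

With these numbers the bandwidth ratio comes out at roughly 2.6 and the compute ratio at 7, so a measured speedup of about 5 sits between the two, as expected for a kernel that is partly limited by memory traffic.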

cvnguyen · June 6, 2009, 4:15pm

If the number of blocks in your kernel launches is not much larger than 14, forget about that theoretical speed-up: with so few blocks, the 8800 GT’s 14 multiprocessors can’t be kept evenly busy.
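
A rough way to see the effect of the block count is a toy "wave" model. It is only a sketch under strong assumptions: every block takes the same time, each multiprocessor runs `blocks_per_mp` blocks at once (set to 1 here purely for illustration), and clock rate and memory effects are ignored. The function names `waves` and `mp_speedup` are made up for this example.

```python
import math


def waves(num_blocks, num_multiprocessors, blocks_per_mp=1):
    """Number of rounds needed to run all blocks (toy model, equal block times)."""
    return math.ceil(num_blocks / (num_multiprocessors * blocks_per_mp))


def mp_speedup(num_blocks, small_mps=4, large_mps=14):
    """Speedup from the multiprocessor count alone, ignoring the clock rate."""
    return waves(num_blocks, small_mps) / waves(num_blocks, large_mps)


for blocks in (4, 8, 14, 16, 28, 1400):
    print(f"{blocks:5d} blocks -> {mp_speedup(blocks):.2f}x from multiprocessor count")
```

In this model the speedup from the multiprocessor count jumps around for small grids (for example 1x at 4 blocks, 4x at 14, and 2x at 16) and only settles at the ideal 14/4 = 3.5x once the grid is much larger than 14 blocks.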
