Missing some GFlops

allvano · December 3, 2007, 10:53am

Hello guys,

The G80 GTX is listed at around 518 GFlops (GeForce 8 series - Wikipedia), which I know is a theoretical value because the arithmetic unit is shared between the 8 SPs in an MP. But if I use the following calculation, I get 128 SPs * 2 (ops per clock) * 1.35 GHz = 345.6 GFlops.

What happens to the remaining 173 GFlops (518 - 345)? Again, I know that 518 is a theoretical value, but 173 is really a lot. Is this the PTX virtual machine overhead?

thanks,
jj

MisterAnderson42 · December 3, 2007, 2:55pm

The extra FLOPS come from the texture interpolators (which are presumably hardwired in silicon rather than run on the multiprocessors). I’m not sure exactly how the math works out to count the extra GFlops from them, though. Note that the CUDA programming guide only claims ~340 GFlops in Figure 1.

seibert · December 3, 2007, 3:45pm

I’m not sure if I understand you here, but the arithmetic units are not shared. Each multiprocessor has 8 ALUs (which NVIDIA calls “processors”), but one instruction decoder. So to use all 8 ALUs, you need them all to execute the same instruction (but of course, each ALU acts on different registers).
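A rough way to picture this, as a toy sketch in plain Python rather than anything resembling real hardware or CUDA code: one decoded instruction is applied to eight lanes, each with its own registers. The names `simd_step` and `LANES` are made up for illustration.

```python
# Toy model of one multiprocessor: one instruction decoder, 8 ALUs.
# Illustrative only; the names here are hypothetical, not an NVIDIA API.
import operator

LANES = 8  # ALUs ("processors") per multiprocessor, as described above


def simd_step(op, regs_a, regs_b):
    """Apply ONE decoded instruction `op` across all lanes.

    Each lane reads its own register values, so the data differ per lane
    while the instruction stays the same.
    """
    if len(regs_a) != LANES or len(regs_b) != LANES:
        raise ValueError("expected one register value per lane")
    return [op(a, b) for a, b in zip(regs_a, regs_b)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 10, 10, 10, 10, 10, 10, 10]
print(simd_step(operator.mul, a, b))  # same multiply, eight different operands
```

If the lanes needed different instructions, this model would have to call `simd_step` once per distinct instruction, which is the cost of having a single decoder.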

But yes, MisterAnderson’s explanation is correct. The marketing materials for the 8800 GTX include the computations done by the texture units in the total, which is highly optimistic unless you are doing nothing but texture math. The CUDA manual computes the Gflops based on just the ALUs (“processors”) being at full utilization.
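The ALU-only figure can be reproduced with a back-of-the-envelope calculation. The sketch below uses only the numbers quoted in this thread (128 processors, 2 flops per clock, 1.35 GHz, and the 518 GFlops marketing figure); the function name `alu_peak_gflops` is hypothetical.

```python
# Back-of-the-envelope peak for the ALUs only (texture units excluded).
# All figures are the ones quoted in this thread.


def alu_peak_gflops(num_sps, flops_per_clock, clock_ghz):
    """Peak GFlops = processors * flops per processor per clock * clock in GHz."""
    return num_sps * flops_per_clock * clock_ghz


alu_only = alu_peak_gflops(num_sps=128, flops_per_clock=2, clock_ghz=1.35)
marketing = 518.0  # figure quoted from Wikipedia in the first post

print(f"ALU-only peak: {alu_only:.1f} GFlops")
print(f"Unaccounted:   {marketing - alu_only:.1f} GFlops")
print(f"ALU share:     {alu_only / marketing:.0%}")
```

This is a ceiling that assumes every ALU issues a multiply-add on every clock; real kernels will land below it.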

allvano · December 4, 2007, 11:51am

Well, each MP contains 8 SP but only one instruction decoder. That is the reason why the same instruction is executed 8 times. But I’m not sure if the MADD and MUL units are included in each SP. I read somewhere that they are shared, but I’m not sure.

Anyway, the most important piece of information is that two arithmetic operations per clock are possible, though not for something like (a * B) * c, but rather for e.g. cos(a * B).

regards,

jj
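The theoretical peak depends heavily on how many floating-point operations one assumes each SP completes per clock. The sketch below tabulates a few cases using the 128 SPs and 1.35 GHz quoted earlier in the thread. The 3-flop row (a MADD plus a separate MUL in the same clock) is purely an assumption for illustration; nothing in this thread confirms that the hardware sustains it, although it happens to land close to the 518 figure.

```python
# How the assumed per-clock flop count changes the theoretical peak.
# NUM_SPS and CLOCK_GHZ are the values quoted in the thread.
NUM_SPS = 128
CLOCK_GHZ = 1.35

# Flops counted per SP per clock for each issue pattern.
# The last entry is an ASSUMPTION, not something confirmed in the thread.
FLOPS_PER_CLOCK = {
    "MUL only": 1,
    "MADD (a * b + c)": 2,
    "MADD + MUL (assumed)": 3,
}


def peak_table(num_sps=NUM_SPS, clock_ghz=CLOCK_GHZ):
    """Return {issue pattern: peak GFlops} for the given chip parameters."""
    return {
        name: num_sps * flops * clock_ghz
        for name, flops in FLOPS_PER_CLOCK.items()
    }


for name, gflops in peak_table().items():
    print(f"{name:<22} {gflops:6.1f} GFlops")
```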
