CUDA vs TESLA GFlop rating

The CUDA documents talked of 340 GFLOPS observed. The new Tesla documents talk of 518 GFLOPS.

How were these benchmarks obtained? On what calculations?

Is it because the Tesla board is overclocked?

Marketing material includes the FLOPS you can get from the texture interpolation hardware, which inflates the number and makes it look bigger :) Even the 340 GFLOPS number is inflated in my opinion, because it counts a MADD as two FLOPs. Depending on the calculations you are performing, not every operation will be a MADD, which puts the actual theoretical peak GFLOPS somewhere between 170 and 340, depending on the ratio of MADD instructions to other floating-point operations.
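The 170–340 range above can be sketched with a little arithmetic. The thread doesn't state the hardware figures, so this assumes the commonly quoted 8800 GTX-class numbers: 128 stream processors at a 1.35 GHz shader clock, each retiring one instruction per cycle.

```python
# Sketch: how the MADD ratio moves the achievable peak.
# ASSUMED specs (not stated in the thread): 128 SPs, 1.35 GHz shader clock.
SP_COUNT = 128
SHADER_CLOCK_GHZ = 1.35

# Billions of instructions per second across all SPs.
instr_rate_g = SP_COUNT * SHADER_CLOCK_GHZ

def effective_gflops(madd_fraction):
    """Peak GFLOPS when madd_fraction (0..1) of instructions are MADDs.
    A MADD counts as 2 FLOPs, any other FP instruction as 1."""
    return instr_rate_g * (1.0 + madd_fraction)

print(effective_gflops(0.0))  # no MADDs at all: ~172.8 GFLOPS
print(effective_gflops(1.0))  # pure MADD stream: ~345.6 GFLOPS
```

With these assumed specs the endpoints come out to roughly 173 and 346 GFLOPS, which is where the "somewhere between 170 and 340" range comes from.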

The Tesla GPU is identical to that in the high end Quadro cards.

It’s standard practice to count a multiply-add instruction as two flops. It is two floating-point operations after all, and we’re quoting peak performance here, not application performance.


I see on some forums that the reason the speed went up from 346 to 518 is that the new drivers can issue both a MADD and a MUL instruction in a single clock cycle (I don't know quite which computations that would be used in), and that for other computations, without either a MADD or a MUL, the achievable speed will be about 1/3 of the new peak, or 1/2 of the old peak.

Can you confirm?
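The 346 → 518 jump described above checks out arithmetically if you count the co-issued MUL as a third FLOP per cycle. Again this assumes the commonly quoted 8800 GTX-class specs (128 SPs, 1.35 GHz shader clock), which the thread itself doesn't spell out.

```python
# Rough check of the quoted peak figures under the dual-issue claim.
# ASSUMED specs (not from the thread): 128 SPs, 1.35 GHz shader clock.
SP_COUNT = 128
SHADER_CLOCK_GHZ = 1.35

madd_only = SP_COUNT * SHADER_CLOCK_GHZ * 2      # MADD = 2 FLOPs/cycle
madd_plus_mul = SP_COUNT * SHADER_CLOCK_GHZ * 3  # MADD + co-issued MUL = 3

print(madd_only)      # ~345.6 GFLOPS -> the "346" figure
print(madd_plus_mul)  # ~518.4 GFLOPS -> the "518" figure
```

Under these assumptions a stream of single-FLOP instructions runs at ~172.8 GFLOPS, which is indeed 1/3 of the new peak and 1/2 of the old one.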

[EDIT to add links where I read it]