GPU single and double precision FLOPs

Hi, does anyone know how Nvidia arrives at these figures for the Tesla C1060?

Single Precision floating point performance (peak): 933 GFLOPS
Double Precision floating point performance (peak): 78 GFLOPS

Is there a test program available for measuring these metrics?


The single precision peak is computed as 3 flops per clock [dual-issued multiply-add (2 flops) and multiply (1 flop)] * 240 [stream processors] * 1.3 GHz [shader clock] = 936 GFLOPS. I'm not sure how they get exactly 933; the clock is probably 1.296 GHz rather than 1.3 GHz, which would give 3 * 240 * 1.296 = 933.12 GFLOPS.

The double precision peak is computed as 2 flops per clock [multiply-add] * 30 [multiprocessors, each with one double precision unit] * 1.3 GHz [shader clock] = 78 GFLOPS.
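
The arithmetic above can be checked with a short script. This is only a sketch of the peak formula as described in this post (flops per clock per unit * number of units * clock); the function name peak_gflops is made up for illustration, and 1.296 GHz is the guessed clock mentioned above, not a figure taken from a datasheet.

```python
def peak_gflops(flops_per_clock_per_unit, units, clock_ghz):
    """Theoretical peak in GFLOPS: flops per clock per unit * units * clock in GHz."""
    return flops_per_clock_per_unit * units * clock_ghz


# Figures quoted in this thread for the Tesla C1060.
SP_FLOPS_PER_CLOCK = 3   # multiply-add (2 flops) + dual-issued multiply (1 flop)
STREAM_PROCESSORS = 240
DP_FLOPS_PER_CLOCK = 2   # multiply-add (2 flops)
MULTIPROCESSORS = 30

# 1.3 GHz is the rounded shader clock; 1.296 GHz is the guess from the post.
for clock_ghz in (1.3, 1.296):
    sp = peak_gflops(SP_FLOPS_PER_CLOCK, STREAM_PROCESSORS, clock_ghz)
    dp = peak_gflops(DP_FLOPS_PER_CLOCK, MULTIPROCESSORS, clock_ghz)
    print(f"{clock_ghz} GHz: SP {sp:.2f} GFLOPS, DP {dp:.2f} GFLOPS")

# Prints:
# 1.3 GHz: SP 936.00 GFLOPS, DP 78.00 GFLOPS
# 1.296 GHz: SP 933.12 GFLOPS, DP 77.76 GFLOPS
```

With the 1.296 GHz guess, both results round to the advertised 933 and 78.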

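These are theoretical peaks. Reaching them would require a kernel that does no memory access and not even any index calculations, only back-to-back multiply-add (and, for single precision, dual-issued multiply) instructions.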