Some background:

It is stated that the K20X has 2688 cores. It is also shown that there are 15 streaming multiprocessors (SMX), and that each SMX has 192 single-precision cores. However, 192 x 15 = 2880, not 2688. Am I missing something here?
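To make the mismatch concrete, here is a minimal sketch of the arithmetic in Python. It uses only the figures quoted from the slides; the helper name `implied_smx_count` is my own, and the "implied" SMX count is just an inference from the division, not something the slides state.

```python
# Figures as quoted in the slides (not independently verified here).
CORES_PER_SMX = 192        # single-precision CUDA cores per SMX
SMX_COUNT_QUOTED = 15      # SMX count as listed
TOTAL_CORES_QUOTED = 2688  # total core count as listed


def implied_smx_count(total_cores: int, cores_per_smx: int) -> float:
    """SMX count that would make the quoted totals consistent (hypothetical helper)."""
    return total_cores / cores_per_smx


product = CORES_PER_SMX * SMX_COUNT_QUOTED
implied = implied_smx_count(TOTAL_CORES_QUOTED, CORES_PER_SMX)

print(product)  # 2880
print(implied)  # 14.0
```

So the quoted core count would be consistent with 14 SMXs of 192 cores each rather than 15, if the per-SMX figure is right.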

Processor

Flops: 1.3 TFLOPS

Cores: 2688

Clock: 732 MHz

Streaming Multiprocessors (SMX)

15 SMXs

192 SP CUDA Cores*, 64 DP units per SMX

4 Warp Schedulers per SMX

32 threads/Warp

4 Warp Schedules/cycle; 2 Dispatches/schedule

Instr issue: 4 x 2 = 8 instr/cycle (http://tinyurl.com/c3drc6d, page 8)

MY CALCULATION of Computation Density (CD):

CD = f x N x #instr per cycle

CD = freq x (# of cores per SMX x # of SMX) x #instr per cycle

CD_sp = 0.732 GHz x (192 x 15 ) x (8 x 32 bus bandwidth / 32 bit precision) = 8432 GOPS ???

CD_dp = 0.732 GHz x (64 x 15 ) x (8 x 32 bus bandwidth / 64 bit precision) = 2811 GOPS ???
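For comparison, here is a rough sketch of the more conventional peak estimate, units x clock x flops per unit per cycle. It assumes each unit retires one fused multiply-add per cycle, counted as 2 flops. That factor of 2 is my assumption, not something taken from the slides, and the function name `peak_gflops` is made up for illustration. It is evaluated for both the 15 SMXs listed above and a hypothetical 14 (which is what 2688 / 192 works out to).

```python
CLOCK_GHZ = 0.732        # core clock quoted in the slides
SP_UNITS_PER_SMX = 192   # single-precision cores per SMX (quoted)
DP_UNITS_PER_SMX = 64    # double-precision units per SMX (quoted)
FLOPS_PER_FMA = 2        # ASSUMPTION: one fused multiply-add = 2 flops


def peak_gflops(units_per_smx: int, smx_count: int, clock_ghz: float,
                flops_per_unit_cycle: int = FLOPS_PER_FMA) -> float:
    """Theoretical peak in GFLOPS: units x clock (GHz) x flops per unit per cycle."""
    return units_per_smx * smx_count * clock_ghz * flops_per_unit_cycle


# 15 is the SMX count from the slides; 14 is a hypothetical count (2688 / 192).
for smx in (14, 15):
    sp = peak_gflops(SP_UNITS_PER_SMX, smx, CLOCK_GHZ)
    dp = peak_gflops(DP_UNITS_PER_SMX, smx, CLOCK_GHZ)
    print(f"{smx} SMX: SP {sp:.0f} GFLOPS, DP {dp:.0f} GFLOPS")
```

Under these assumptions, the 14-SMX double-precision estimate comes out at roughly 1.31 TFLOPS, close to the 1.3 TFLOPS listed above, which suggests that figure may be the double-precision peak.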

These results don't tally with NVIDIA's reported FLOPS. How did they get their numbers? Help needed!