Tokyo Tech said their TSUBAME supercomputer had reached a theoretical peak performance of nearly 170 Tflops by adding 170 Tesla S1070 systems, but there should be 680 Tflops of theoretical peak performance, since one Tesla S1070 delivers 4 Tflops! What is wrong with these numbers? Can anyone give a reasonable explanation?
Linpack is a double-precision benchmark, and the peak double-precision performance of the 100 series GPUs is roughly one eighth of the headline single-precision figure most people like to quote (each multiprocessor has eight single-precision stream processors but only one double-precision stream processor). The 4 Tflops per S1070 is a single-precision number, so multiplying it by 170 does not give you the figure that matters for Linpack.
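To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this thread (170 systems, 4 Tflops single precision each, and the one-eighth ratio), so treat the result as an estimate under those assumptions rather than an official number. The function name `added_peak_tflops` is just made up for this illustration.

```python
def added_peak_tflops(num_systems, sp_tflops_per_system, dp_to_sp_ratio):
    """Return (single-precision, double-precision) peak in Tflops
    contributed by the added GPU systems, given the quoted figures."""
    sp_total = num_systems * sp_tflops_per_system
    dp_total = sp_total * dp_to_sp_ratio
    return sp_total, dp_total


# Figures taken from the posts above; the 1/8 ratio is the approximate
# double-to-single precision ratio stated in the reply (an assumption,
# not a vendor specification).
sp_total, dp_total = added_peak_tflops(170, 4.0, 1 / 8)

print(f"Single-precision peak added: {sp_total} Tflops")
print(f"Double-precision peak added: {dp_total} Tflops")
```

With those inputs the single-precision total is the 680 Tflops from the question, while the double-precision total comes out far lower, which is why the quoted peak is nowhere near 680 Tflops.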