I wrote some interface code to benchmark CUFFT on a GTX8800 using benchFFT 3.1. I measured a peak of around 34-35 “benchFFT GFlops” for a 2D FFT on a 1024x1024 array.
But according to the following document, slide 27:
It says the G80 can achieve 52 benchFFT GFlops. How is this 52 GFlops figure derived?
(My code excludes the data transfers between host and device memory from the performance measurement, so only the computation time on the G80 is measured.)
Many thanks in advance!!