I am benchmarking the CUFFT code before incorporating it into a bigger project. My results show that an 8800GTX on a Mac actually outperforms a GTX280 on Linux. I have two GTX280s and get comparable results on both. My results are also in line with those of Naga Govindaraju (including the dip at log_2 N = 9), who unfortunately is not willing to release his code. (Yes, I did get V. Volkov's.)
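For reference, my timing harness looks roughly like this. It is a minimal sketch, not my exact code: the transform size, batch count, and iteration count shown here are placeholders, and I time only cufftExecC2C calls with CUDA events after a warm-up run so plan creation isn't included.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    const int logN  = 9;        // transform size exponent (placeholder)
    const int N     = 1 << logN;
    const int BATCH = 1024;     // transforms per exec call (placeholder)
    const int ITERS = 100;      // timing iterations (placeholder)

    cufftComplex *d_data;
    cudaMalloc(&d_data, sizeof(cufftComplex) * N * BATCH);

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, BATCH);

    // Warm-up so one-time setup cost isn't timed
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < ITERS; ++i)
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    // Standard 5*N*log2(N) flop count per complex transform
    double gflops = 5.0 * N * logN * BATCH * ITERS / (ms * 1e6);
    printf("log2(N)=%d batch=%d: %.3f ms total, %.1f GFLOPS\n",
           logN, BATCH, ms, gflops);

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}
```

The GFLOPS figure uses the conventional 5*N*log2(N) flop count, which is the same normalization Volkov and Govindaraju use, so the numbers are directly comparable.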
Can anybody explain why the 8800GTX is better than the GTX280 for larger FFTs? Memory bandwidth?
I was surprised.
Here are my results: