cuFFT time per FFT problem

Hi all,

I need to benchmark cuFFT performance. If I run only one 1024-point FFT, it takes around 70 us, but if I loop the FFT 100 or 1000 times, the average time per FFT is around 10-12 us.

From other people's benchmarking results, I learned that the time per FFT should be around 1 us or less.
Can you help me solve this problem if you know the cause? Thanks a lot.

My GPU is a C2050, which is a very powerful one.

Can any expert help with this? Thanks.

I tried running the same FFT twice: the first FFT took 70 us, but the second one took only 16 us.
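
A minimal sketch of how the three cases (first call, second call, average over a loop) could be timed separately. This is only an illustration under assumptions, not the exact code behind the numbers above: it assumes a single-precision complex-to-complex, in-place, 1024-point transform timed with CUDA events, and the helper names `ms_to_us_per_call` and `time_ffts_us` are hypothetical. It should build with something like `nvcc bench.cu -lcufft` (the file name is also just a placeholder).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: convert a total time in ms to microseconds per call.
static float ms_to_us_per_call(float total_ms, int calls) {
    return total_ms * 1000.0f / static_cast<float>(calls);
}

// Hypothetical helper: run `calls` back-to-back in-place forward FFTs and
// return the average time per FFT in microseconds, measured with CUDA events.
static float time_ffts_us(cufftHandle plan, cufftComplex* d, int calls,
                          cudaEvent_t start, cudaEvent_t stop) {
    cudaEventRecord(start, 0);
    for (int i = 0; i < calls; ++i)
        cufftExecC2C(plan, d, d, CUFFT_FORWARD);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all queued FFTs have finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    return ms_to_us_per_call(ms, calls);
}

int main() {
    const int N = 1024;  // assumed: one 1024-point C2C transform per call

    cufftComplex* d = nullptr;
    if (cudaMalloc(&d, sizeof(cufftComplex) * N) != cudaSuccess) return 1;
    cudaMemset(d, 0, sizeof(cufftComplex) * N);

    // Plan creation is kept outside the timed region.
    cufftHandle plan;
    if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS) return 1;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // The first call is reported on its own so any one-time cost is visible.
    printf("first call : %.1f us\n", time_ffts_us(plan, d, 1, start, stop));
    printf("second call: %.1f us\n", time_ffts_us(plan, d, 1, start, stop));
    printf("avg of 1000: %.1f us\n", time_ffts_us(plan, d, 1000, start, stop));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cufftDestroy(plan);
    cudaFree(d);
    return 0;
}
```

Reporting the first call separately from the loop average makes it easier to tell whether the 70 us is a one-time cost or the real cost of a single FFT.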

I understand that I need to run a large number of FFTs to see the full performance, but in my case I sometimes need to do only one FFT.

I would like to get good performance even when I only need to do one FFT. Can anyone suggest how? Thanks.