FFT algorithm implementation

I am doing a simple 1D FFT using the CUFFT library that ships with CUDA. I want to run a small (1k-point) FFT repeatedly over 1 million data points, i.e. 1k times. Is it possible to execute these small FFTs at the same time rather than sequentially? That is, can I run the same "cufftExec" routine on different sample values simultaneously?

I am using CUDA 2.2 with an 8400 GS on CentOS 5.


mits

You can use batch mode; please see page 6 of CUFFT_Library_2.3.pdf:

cufftResult cufftPlan1d( cufftHandle *plan, int nx, cufftType type, int batch );
creates a 1D FFT plan configuration for a specified signal size and data
type. The batch input parameter tells CUFFT how many 1D transforms to configure.
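
For illustration, here is a minimal sketch of how the batch parameter might be used for the case in the question. It assumes the input is single-precision complex, that the 1k transforms are stored back to back in one contiguous array, and that an in-place transform is acceptable. The name batched_fft and the NX/BATCH constants are made up for this example and do not come from the CUFFT manual.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX    1024   /* points per transform (the "1k" FFT) */
#define BATCH 1024   /* number of transforms; NX * BATCH is about 1M samples */

/* Hypothetical helper: runs BATCH forward FFTs of NX points each, in place,
   on the host array h_data (NX * BATCH elements, transforms back to back).
   Returns 0 on success, 1 on any CUDA or CUFFT failure. */
int batched_fft(cufftComplex *h_data)
{
    cufftComplex *d_data = NULL;
    cufftHandle plan;
    size_t bytes = sizeof(cufftComplex) * NX * BATCH;
    int status = 1;

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return 1;

    /* A single plan describes all BATCH transforms of length NX. */
    if (cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return 1;
    }

    /* One upload, one exec call covering every transform, one download. */
    if (cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS &&
        cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
        status = 0;

    cufftDestroy(plan);
    cudaFree(d_data);
    return status;
}
```

To try it, call batched_fft from main with a host array of NX * BATCH cufftComplex values and link against the CUFFT library (for example with -lcufft). Transform k then occupies elements k*NX through k*NX + NX - 1 of the array.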

Thanks for the help.

I used the batch option, but it is giving poorer performance than the earlier sequential implementation, since the data transfer time was lower in the previous case. Is there a way to reduce it?

Could you post your code to describe how you measure performance?
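
As a point of comparison, one possible way to measure this is to time the host-to-device copy, the FFT execution, and the device-to-host copy separately with CUDA events. The sketch below is only an assumption about how such a measurement could look, not the poster's code. The helper name time_batched_fft and its parameters are hypothetical, and plan creation is deliberately kept outside the timed region.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

/* Hypothetical helper: runs one batched in-place C2C FFT and reports, in
   milliseconds, the upload time, the FFT time, and the download time.
   Returns 0 on success, 1 on any CUDA or CUFFT failure. */
int time_batched_fft(cufftComplex *h_data, int nx, int batch,
                     float *ms_h2d, float *ms_fft, float *ms_d2h)
{
    cufftComplex *d_data = NULL;
    cufftHandle plan;
    cudaEvent_t ev[4];
    size_t bytes = sizeof(cufftComplex) * (size_t)nx * (size_t)batch;
    int i, ok;

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return 1;
    /* Plan creation is not timed here; it is a one-time setup cost. */
    if (cufftPlan1d(&plan, nx, CUFFT_C2C, batch) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return 1;
    }
    for (i = 0; i < 4; ++i)
        cudaEventCreate(&ev[i]);

    /* Events are recorded in the default stream, the same stream the
       copies and the FFT kernels run in, so they bracket each phase. */
    cudaEventRecord(ev[0], 0);
    ok = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    cudaEventRecord(ev[1], 0);
    ok = ok && cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS;
    cudaEventRecord(ev[2], 0);
    ok = ok && cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaEventRecord(ev[3], 0);
    cudaEventSynchronize(ev[3]);   /* wait until all timed work has finished */

    cudaEventElapsedTime(ms_h2d, ev[0], ev[1]);
    cudaEventElapsedTime(ms_fft, ev[1], ev[2]);
    cudaEventElapsedTime(ms_d2h, ev[2], ev[3]);

    for (i = 0; i < 4; ++i)
        cudaEventDestroy(ev[i]);
    cufftDestroy(plan);
    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

Calling it once with nx = 1024, batch = 1024 and once per transform with batch = 1 would show whether the difference comes from the copies or from the FFT itself. The first call in a process usually includes start-up overhead, so it is probably worth discarding one warm-up run before comparing numbers.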