I am doing a simple 1D FFT using the CUFFT library that ships with CUDA. I want to run a small (1k-point) FFT repeatedly over 1 million data points, i.e. 1k times. Is it possible to execute these small FFTs at the same time rather than sequentially? In other words, can I run the same "cufftExec" routine on different sets of samples simultaneously?
You can use batch mode; please see page 6 of CUFFT_Library_2.3.pdf:
cufftResult cufftPlan1d( cufftHandle *plan, int nx, cufftType type, int batch );
creates a 1D FFT plan configuration for a specified signal size and data
type. The batch input parameter tells CUFFT how many 1D transforms to configure.
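To make the batch idea concrete, here is a minimal sketch of how the 1k x 1k case could be set up. The helper name run_batched_fft and its return convention are hypothetical (not part of CUFFT). It assumes complex-to-complex data, an in-place transform, and that the transforms are stored back to back in one contiguous buffer, so transform i starts at element i * nx. The whole data set goes to the GPU in one copy and comes back in one copy.

```cuda
#include <stddef.h>
#include <cuda_runtime.h>
#include <cufft.h>

/* Hypothetical helper (not a CUFFT routine): runs `batch` independent forward
 * C2C FFTs of length `nx` on contiguous host data.
 * Assumed layout: transform i occupies elements [i*nx, (i+1)*nx).
 * Returns 0 on success, -1 on any CUDA or CUFFT error. */
int run_batched_fft(const cufftComplex *h_in, cufftComplex *h_out,
                    int nx, int batch)
{
    size_t bytes = sizeof(cufftComplex) * (size_t)nx * (size_t)batch;
    cufftComplex *d_data = NULL;
    cufftHandle plan;
    int status = -1;

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return -1;

    /* One plan that covers all `batch` transforms of size `nx`. */
    if (cufftPlan1d(&plan, nx, CUFFT_C2C, batch) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return -1;
    }

    /* One host-to-device copy, one exec call, one device-to-host copy. */
    if (cudaMemcpy(d_data, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS &&
        cudaMemcpy(h_out, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
        status = 0;
    }

    cufftDestroy(plan);
    cudaFree(d_data);
    return status;
}
```

For the case in the question this would be called as run_batched_fft(in, out, 1024, 1024), built with nvcc and linked against cufft (-lcufft). If the same sizes are used repeatedly, it is probably worth creating the plan and the device buffer once and reusing them rather than rebuilding them on every call as this sketch does.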
I used the batch option, but it is giving poorer performance than the earlier sequential implementation, because the data transfer time was lower in the previous case. Is there a way to reduce it?