CUFFT (and kernel) questions

Hi all, I’m new to CUDA programming.
As a first exercise I’m trying to port some code (not mine…) which has an FFT convolution part. In principle it should be easy, as it is written using fftw3. AFAIK the fftw3 and cufft implementations are quite similar (from the interface point of view).
I have only one question, about the cufftPlan1d function, which takes a “BATCH” number as its last argument… What exactly is this number?
I have another question regarding the kernel invocation… how can I choose N, T in <<<N, T>>> in an optimal way? Or… how can I set them to be chosen at runtime so that different GPUs (with different capabilities) have optimal settings?
Sorry if these questions look quite trivial to you ^__^.

d

Performing many transforms at once allows for greater speed, esp. for small FFT sizes.
The batch argument is a count of how many transforms are to be done. The input and output data are packed contiguously, one transform's buffer after another.
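To make the layout concrete, here is a minimal sketch of a batched plan. The function name batchedDcBins, the sizes, and the constant test signal are made up for illustration; only the cuFFT and CUDA runtime calls are real API. Also worth checking against your toolkit version: I believe newer cuFFT documentation steers batched transforms towards cufftPlanMany instead of the batch argument of cufftPlan1d.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs `batch` forward C2C transforms of length `nx` in
// one call and returns the real part of each transform's DC bin.
// Returns an empty vector if any CUDA/cuFFT call fails.
std::vector<float> batchedDcBins(int nx, int batch) {
    // Transform b occupies elements [b*nx, (b+1)*nx) of one contiguous buffer.
    std::vector<cufftComplex> host(static_cast<size_t>(nx) * batch);
    for (int b = 0; b < batch; ++b) {
        for (int i = 0; i < nx; ++i) {
            host[b * nx + i].x = static_cast<float>(b + 1);  // constant test signal
            host[b * nx + i].y = 0.0f;
        }
    }

    const size_t bytes = host.size() * sizeof(cufftComplex);
    cufftComplex *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    std::vector<float> dc;
    cufftHandle plan;
    if (cufftPlan1d(&plan, nx, CUFFT_C2C, batch) == CUFFT_SUCCESS) {
        // In-place execution: this single call performs all `batch` transforms.
        if (cufftExecC2C(plan, dev, dev, CUFFT_FORWARD) == CUFFT_SUCCESS &&
            cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            // The output uses the same packing, so bin 0 of transform b is at b*nx.
            for (int b = 0; b < batch; ++b) dc.push_back(host[b * nx].x);
        }
        cufftDestroy(plan);
    }
    cudaFree(dev);
    return dc;
}

int main() {
    const int nx = 256, batch = 8;  // illustrative sizes
    std::vector<float> dc = batchedDcBins(nx, batch);
    if (dc.empty()) { printf("CUDA/cuFFT call failed\n"); return 1; }
    for (int b = 0; b < batch; ++b)
        printf("transform %d: DC bin = %.1f\n", b, dc[b]);
    return 0;
}
```

Build with something like nvcc example.cu -lcufft. Since cuFFT does not normalize, the DC bin of transform b should come out close to nx * (b + 1) for this test signal, which is a quick way to confirm that each transform read its own slice of the buffer.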

My only advice for choosing the right number of threads for a particular FFT size is to benchmark on the target platform and choose with your application in mind. Batching up large numbers of FFTs may decrease the amortized time per FFT, but it adds latency and uses more memory. Only you can decide which trade-off suits your application.
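For your own kernels (cuFFT picks its launch configuration internally), one possible starting point before benchmarking is to ask the runtime for a block size at run time, so the value adapts to whichever GPU is present. The sketch below is only an assumption about what the convolution step might look like: the kernel complexMulScale and the helper suggestedBlockSize are hypothetical names, while cudaOccupancyMaxPotentialBlockSize and the event timing calls are CUDA runtime API. The suggested size maximizes theoretical occupancy, which is not guaranteed to be the fastest choice, so it is still worth timing a few alternatives.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical pointwise step of an FFT convolution: a[i] = a[i] * b[i] * scale.
// The grid-stride loop keeps the kernel correct for any <<<N, T>>> choice.
__global__ void complexMulScale(float2 *a, const float2 *b, int n, float scale) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        float2 x = a[i], y = b[i];
        a[i] = make_float2((x.x * y.x - x.y * y.y) * scale,
                           (x.x * y.y + x.y * y.x) * scale);
    }
}

// Asks the runtime for a block size that maximizes theoretical occupancy of
// complexMulScale on the current device. Returns 0 on error.
int suggestedBlockSize() {
    int minGrid = 0, block = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&minGrid, &block, complexMulScale, 0, 0)
            != cudaSuccess)
        return 0;
    return block;
}

int main() {
    const int n = 1 << 20;  // illustrative problem size
    float2 *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, n * sizeof(float2)) != cudaSuccess ||
        cudaMalloc(&b, n * sizeof(float2)) != cudaSuccess) return 1;
    cudaMemset(a, 0, n * sizeof(float2));
    cudaMemset(b, 0, n * sizeof(float2));

    const int block = suggestedBlockSize();
    if (block == 0) return 1;
    const int grid = (n + block - 1) / block;  // enough blocks to cover n elements

    // Time the launch with events; repeat with other block sizes to compare.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    complexMulScale<<<grid, block>>>(a, b, n, 1.0f / n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("grid=%d block=%d time=%.3f ms status=%s\n", grid, block, ms,
           cudaGetErrorString(cudaGetLastError()));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

A single timed launch includes one-off startup costs, so for a real comparison run each configuration several times and discard the first measurement.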