Multiple FFTs on Tesla at once

I intend to use a Tesla as the computing hardware for numerical simulations (MLFMA).
Part of the calculation is performing many 1D FFTs at once. The FFT sizes range from N = 2 to N_total = 2^14 (powers of 2).
At each level the simulation needs to run N_total/N FFTs.
How will CUFFT handle these operations? Will different FFTs run on the same SIMD unit, or on several in parallel? And how will it handle the small FFTs that are shorter than the number of cores in a SIMD unit (32 in the Fermi architecture)?
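To make the workload concrete, here is a small sketch that lists the FFT length N and the number of transforms N_total/N for each level. It is written in Python purely for illustration, and `level_plan` is a hypothetical helper, not part of CUFFT or any other library. Every level transforms the same N_total points in total. Only the way those points are split into transforms changes.

```python
N_TOTAL = 2 ** 14  # largest FFT size from the question (16384)


def level_plan(n_total):
    """Return (N, batch) pairs: for each power-of-two FFT length N from 2 up
    to n_total, the level needs n_total // N transforms of length N."""
    plan = []
    n = 2
    while n <= n_total:
        plan.append((n, n_total // n))
        n *= 2
    return plan


if __name__ == "__main__":
    for n, batch in level_plan(N_TOTAL):
        # batch * n == N_TOTAL at every level
        print(f"N = {n:5d} -> {batch:5d} FFTs")
```

With N_total = 2^14 this gives 14 levels, from 8192 transforms of length 2 down to a single transform of length 16384.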

Best,

CUFFT supports batching, so you can execute a batch of equally sized FFTs in a single call, but I’m pretty sure it doesn’t support batches that mix different FFT sizes. In any case, if the FFT you’re doing is smaller than the number of “SIMD cores” (i.e. the warp size), you might be better off doing it on the CPU, unless you need to do an absolutely massive number of them. In that case you’ll probably have to write your own hand-tuned code for it.
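Here is a minimal CPU-side sketch of running a uniform batch of small FFTs on the host, as suggested above. It is written in Python for readability and says nothing about how CUFFT works internally. `fft_radix2` and `fft_batched` are hypothetical helpers. The transforms are stored back to back, with transform b occupying elements b*N to (b+1)*N - 1. As far as I know, that is the contiguous layout a basic batched CUFFT plan expects (one created with `cufftPlan1d(&plan, N, CUFFT_C2C, batch)` and run with `cufftExecC2C`). Because the batch has to be uniform, an MLFMA code would need one such plan per level.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 FFT (forward, unnormalised, exp(-2*pi*i*j*k/n))
    of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_batched(data, n):
    """Apply fft_radix2 to len(data) // n transforms of length n stored
    back to back (transform b occupies data[b*n:(b+1)*n])."""
    assert len(data) % n == 0, "data must hold a whole number of transforms"
    out = []
    for start in range(0, len(data), n):
        out.extend(fft_radix2(data[start:start + n]))
    return out
```

For example, `fft_batched([1, 0, 0, 0] * 4, 4)` runs four length-4 transforms of a unit impulse and returns sixteen values, each equal to 1 (as complex numbers).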