I intend to use a Tesla card as the computing hardware for numerical simulations (MLFMA).
Part of the calculation is performing multiple 1D FFTs at once. The FFT sizes range from N=2 to N_total=2^14 (powers of 2).
At each level, the simulation needs to run N_total/N FFTs of size N.
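To make the workload concrete, here is a minimal sketch in Python that lists the FFT size and the number of FFTs at each level. The function name fft_levels is only for illustration, and I am assuming every power of 2 from 2 up to N_total is a level. Each (size, batch) pair is what I would expect to pass as the size and batch count of a batched plan (for example the nx and batch arguments of cufftPlan1d).

```python
def fft_levels(n_total):
    """Return a list of (fft_size, batch_count) pairs, smallest FFT first.

    Assumption: there is one level per power of two from 2 up to n_total,
    and each level runs n_total / fft_size independent 1D FFTs.
    """
    if n_total < 2 or n_total & (n_total - 1):
        raise ValueError("n_total must be a power of two >= 2")
    levels = []
    n = 2
    while n <= n_total:
        levels.append((n, n_total // n))  # (FFT size, number of FFTs)
        n *= 2
    return levels


if __name__ == "__main__":
    for n, batch in fft_levels(2**14):
        # Every level touches the same total number of points, n * batch.
        print(f"N = {n:5d}   batch = {batch:5d}   points = {n * batch}")
```

So the smallest level is 8192 FFTs of size 2 and the largest is a single FFT of size 16384, with the same total number of points at every level.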
How will CUFFT handle these operations? Will different FFTs run on the same SIMD unit (multiprocessor), or on several in parallel? And how will it handle the small FFTs that are shorter than the number of cores in one SIMD unit (32 in the Fermi architecture)?
Best,