I am using a GeForce 8400 GS on Windows for display and also have a Tesla C870 installed.
When I call cuFFT functions, which of the two GPUs does the computation run on, and how can I make sure the Tesla is used?

My other question: is there a comparison of cuFFT and FFTW that shows at what point cuFFT, with data transfer included, becomes faster than FFTW on the CPU?

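cuFFT runs on whichever device is current for the calling host thread, which defaults to device 0 if you never call cudaSetDevice. Use cudaGetDeviceCount and cudaGetDeviceProperties to find the Tesla device, then select it with cudaSetDevice.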
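A minimal sketch of that lookup, assuming the Tesla can be recognised by the substring "Tesla" in the device name reported by cudaGetDeviceProperties. The helper names pickDeviceByName and selectDeviceByName are made up for illustration. The CUDA part is guarded by __CUDACC__, so it is only compiled when the file is built with nvcc (for example as a .cu file).

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: index of the first name containing `needle`,
// or -1 if no name matches. Pure C++, no GPU required.
int pickDeviceByName(const std::vector<std::string>& names,
                     const std::string& needle) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].find(needle) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

#ifdef __CUDACC__  // only compiled when building with nvcc
#include <cuda_runtime.h>

// Hypothetical helper: lists the CUDA devices and makes the first one whose
// name contains `needle` current for this host thread.
// Returns the device index, or -1 on any failure.
int selectDeviceByName(const std::string& needle) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    std::vector<std::string> names;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
        std::printf("device %d: %s\n", dev, prop.name);
        names.push_back(prop.name);
    }

    const int dev = pickDeviceByName(names, needle);
    if (dev < 0 || cudaSetDevice(dev) != cudaSuccess) return -1;
    return dev;
}
#endif
```

Call selectDeviceByName("Tesla") once at startup, before creating any cuFFT plan or allocating device memory, and check that the result is not -1.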

I have some numbers for a batched 2D FFT here, but they don’t include data transfer. It wouldn’t be too hard to code up your own test. Don’t expect the GPU to outperform FFTW for small FFTs whose data sits in CPU memory: if it’s a small job that’s already on the CPU, you may be better off using FFTW.
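A rough sketch of such a test, assuming a single 1D complex-to-complex transform rather than the batched 2D case mentioned above. It times host-to-device copy, forward FFT, and device-to-host copy together using CUDA events. Plan creation is left out of the timed region, which is itself an assumption about what you want to measure. The names makeSignal and timeCufftWithTransfer are made up for illustration, and the CUDA part is only compiled when the file is built with nvcc and linked against cuFFT.

```cpp
#include <complex>
#include <vector>

// Hypothetical helper: n complex samples forming a simple ramp (0, 1, 2, ...),
// used only as arbitrary FFT input.
std::vector<std::complex<float>> makeSignal(int n) {
    std::vector<std::complex<float>> out(n);
    for (int i = 0; i < n; ++i) {
        out[i] = std::complex<float>(static_cast<float>(i), 0.0f);
    }
    return out;
}

#ifdef __CUDACC__  // only compiled when building with nvcc (link with -lcufft)
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: returns the elapsed milliseconds for
// upload + forward FFT + download, or -1.0f if any step fails.
float timeCufftWithTransfer(const std::vector<std::complex<float>>& host_in,
                            std::vector<std::complex<float>>& host_out) {
    const int n = static_cast<int>(host_in.size());
    const size_t bytes = host_in.size() * sizeof(cufftComplex);
    host_out.resize(host_in.size());

    cufftComplex* d_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&d_data), bytes) != cudaSuccess) {
        return -1.0f;
    }
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return -1.0f;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Timed region: copy in, transform in place, copy out.
    // std::complex<float> has the same layout as cufftComplex (two floats).
    bool ok = true;
    cudaEventRecord(start, 0);
    ok = ok && cudaMemcpy(d_data, host_in.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS;
    ok = ok && cudaMemcpy(host_out.data(), d_data, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cufftDestroy(plan);
    cudaFree(d_data);
    return ok ? ms : -1.0f;
}
#endif
```

Run it once as a warm-up, then average several runs over a range of sizes, and time FFTW on the same input for each size to see where the two curves cross on your machine.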