Does anyone know how cuFFT handles multiple GPUs? Does it split the work across multiple cards internally, or is the GPU determined by the cudaSetDevice() call of the thread that issues a cufftExec?
You’ll have to fan the work out to multiple GPUs yourself. I haven’t tried this yet (I haven’t gotten to the part of our code that needs FFTs), but I wouldn’t be surprised if cuFFT simply runs on whichever device the calling thread selected with cudaSetDevice(). I believe that’ll work, and it should be simple to test.
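For what it's worth, here is a rough, untested sketch of what that manual fan-out might look like. It assumes a batch of independent 1D complex transforms that can be divided between the cards, and that a plan created after cudaSetDevice(dev) executes on that device. The sizes, the zero-filled input, and the helper batches_for_device() are made up for illustration, and error checking is mostly left out.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper (not part of cuFFT): how many of `total` transforms
// go to device `dev` when spread as evenly as possible over `ndev` GPUs.
int batches_for_device(int total, int ndev, int dev) {
    return total / ndev + (dev < total % ndev ? 1 : 0);
}

int main() {
    const int N = 1024;    // points per transform (assumed)
    const int total = 64;  // number of independent transforms (assumed)

    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }

    std::vector<cufftHandle> plans(ndev);
    std::vector<cufftComplex*> data(ndev, nullptr);
    std::vector<int> batch(ndev, 0);

    // Issue the work: select each device, then allocate, plan and execute there.
    for (int dev = 0; dev < ndev; ++dev) {
        batch[dev] = batches_for_device(total, ndev, dev);
        if (batch[dev] == 0) continue;

        cudaSetDevice(dev);
        size_t bytes = sizeof(cufftComplex) * N * batch[dev];
        cudaMalloc((void**)&data[dev], bytes);
        cudaMemset(data[dev], 0, bytes);  // placeholder input (all zeros)

        if (cufftPlan1d(&plans[dev], N, CUFFT_C2C, batch[dev]) != CUFFT_SUCCESS) {
            std::printf("plan creation failed on device %d\n", dev);
            return 1;
        }
        // In-place forward transform; the call returns before the GPU finishes,
        // so the loop can move on and start work on the next card.
        cufftExecC2C(plans[dev], data[dev], data[dev], CUFFT_FORWARD);
    }

    // Wait for every device to finish, then clean up.
    for (int dev = 0; dev < ndev; ++dev) {
        if (batch[dev] == 0) continue;
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cufftDestroy(plans[dev]);
        cudaFree(data[dev]);
    }
    std::printf("ran %d transforms across %d device(s)\n", total, ndev);
    return 0;
}
```

This should build with something like `nvcc fanout.cu -lcufft`. A quick way to check the cudaSetDevice() theory is to run it under a profiler, or watch per-GPU utilization, and see whether each card actually gets its share.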