I wanted to use JIT compilation through NVRTC and nvJitLink in my multi-GPU application, which computes FFTs using the extended cuFFT API. JIT requires using the CUDA Driver API; however, the cuFFT documentation says that multi-GPU transforms are not compatible with any application that uses the CUDA Driver API.
I would like to know why that is. And is there any workaround?
I still do not understand why multi-GPU cuFFT plans should be incompatible with the CUDA Driver API; however, I was able to create an application using NVRTC, the cuLibrary API, and the CUDA Driver API, and it worked fine. But I must note that I worked only with primary contexts.
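For reference, here is a minimal sketch of the pipeline that worked for me: NVRTC compiles a kernel to PTX, and the cuLibrary API loads it through the driver API. The kernel and all names are illustrative, and error checking is trimmed for brevity, so treat this as a sketch rather than production code:

```cuda
// Sketch: compile a kernel with NVRTC, load the PTX via the cuLibrary API.
// Illustrative only; real code should check every return value.
#include <cuda.h>
#include <nvrtc.h>
#include <vector>

int main() {
    const char* src =
        "extern \"C\" __global__ void scale(float* x, float a) {"
        "    x[threadIdx.x] *= a;"
        "}";

    // Compile the source to PTX with NVRTC.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "scale.cu", 0, nullptr, nullptr);
    nvrtcCompileProgram(prog, 0, nullptr);
    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // Stick to the primary context rather than cuCtxCreate, so the
    // driver-API code coexists with runtime-API users such as cuFFT.
    cuInit(0);
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuDevicePrimaryCtxRetain(&ctx, dev);
    cuCtxSetCurrent(ctx);

    // Load the PTX through the driver API's library interface (CUDA 12+).
    CUlibrary lib;
    CUkernel kernel;
    cuLibraryLoadData(&lib, ptx.data(), nullptr, nullptr, 0,
                      nullptr, nullptr, 0);
    cuLibraryGetKernel(&kernel, lib, "scale");

    // ... launch with cuLaunchKernel((CUfunction)kernel, ...) ...

    cuLibraryUnload(lib);
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```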
Thanks for pointing this out. We will update the documentation - it is outdated.
As long as you use the primary context on each GPU, the cuFFT multi-GPU API will use it and work just fine. If you used your own contexts, cuFFT would still stick to the primary ones and switch contexts when executing (a performance overhead).
To handle multi-GPU plans from one CPU process, the library needs to ensure that it neither switches the current GPU (for the CUDA runtime API) nor disturbs the context stack (for the driver API).
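In practice this means retaining the primary context on every device instead of creating fresh contexts with `cuCtxCreate`. A minimal sketch of that pattern (hypothetical names, error handling reduced to a macro):

```cuda
// Sketch: retain the primary context on each GPU before creating a
// multi-GPU cuFFT plan, so cuFFT never has to switch contexts.
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

#define CU_CHECK(call)                                              \
    do {                                                            \
        CUresult err = (call);                                      \
        if (err != CUDA_SUCCESS) {                                  \
            const char* msg = nullptr;                              \
            cuGetErrorString(err, &msg);                            \
            fprintf(stderr, "%s failed: %s\n", #call, msg);         \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main() {
    CU_CHECK(cuInit(0));
    int deviceCount = 0;
    CU_CHECK(cuDeviceGetCount(&deviceCount));

    // Retain the primary context on every device; cuFFT multi-GPU
    // plans bind to the primary contexts, so this avoids the
    // context-switching overhead of user-created contexts.
    for (int i = 0; i < deviceCount; ++i) {
        CUdevice dev;
        CUcontext ctx;
        CU_CHECK(cuDeviceGet(&dev, i));
        CU_CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    }

    // ... create the multi-GPU plan with cufftCreate /
    // cufftXtSetGPUs / cufftMakePlan* here ...

    // Release the primary contexts when done.
    for (int i = 0; i < deviceCount; ++i) {
        CUdevice dev;
        CU_CHECK(cuDeviceGet(&dev, i));
        CU_CHECK(cuDevicePrimaryCtxRelease(dev));
    }
    return 0;
}
```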
Have you heard about or tried cuFFTMp? (NVIDIA cuFFTMp documentation — cuFFTMp 11.0.5 documentation). Its API is similar to (an extension of) the single-node multi-GPU API, but it allows multi-node calculations.
thank you for your reply. That’s great news for me.
I did, and I tried it out; it seemed to be even faster than multi-GPU cuFFT. However, using MPI adds another level of complexity and brings new problems to the program. cuFFTMp also imposes more restrictions on the supported hardware.
Once again thank you for your reply!