I’m very new to CUDA and GPU programming, so excuse my naive outlook on this impressive (and sometimes complicated) platform. I’m trying to port some existing code of mine to CUDA, but I’ve run into a big problem that seems to go against the whole point of parallel programming!
I have a function declared as __device__. I want it to perform an inverse FFT using cuFFT, but when I compile, I get an error saying that a host function (cufftExecR2C) cannot be called from a device function. The problem is, my code would be absolutely ugly if I moved cufftExecR2C out into a host function (I’m not even sure how to do it), because each time it would mean thousands of parallel computations, then a memory transfer to the host, then the FFT, then a memory transfer back to the device. This seems pointless unless I can do the FFTs on the device, calling cuFFT functions from device code.
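To make the problem concrete, here is a minimal sketch (function and variable names are just placeholders) of the pattern the compiler rejects:

```cuda
#include <cufft.h>

// Hypothetical sketch of what I am trying to do: call a cuFFT
// execution function from inside a __device__ function.
__device__ void do_fft(cufftHandle plan, cufftReal *in, cufftComplex *out)
{
    // nvcc rejects this line, because cufftExecR2C is a host-only API:
    // "calling a __host__ function from a __device__ function is not allowed"
    cufftExecR2C(plan, in, out);
}
```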
Just to give some context, what I want to do is:
- calculate a massive Jacobian matrix on GPU
- each element of the Jacobian requires several FFTs
- I want the FFTs to be called from device code, with no memory transfers back to the host.
Many thanks for your replies.