Hi All!
We put a GTX 260 in my workstation in the hope of doing some parallel numerical integration, and after doing some research I have a slew of questions.
The three kinds of integrators I hope to test run as follows.
Pseudo-Spectral: Each vector is 2^8 elements long, maybe 2^9 depending on solution stability.
Foreach timestep
[*] FFT vector
[*] math and iFFT
[*] FFT again
[*] math and iFFT
[*] FFT again
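To show what I mean by "math" between transforms, here's a rough NumPy sketch of one timestep in that FFT/math/iFFT pattern. The exponential multipliers are just placeholder operators (a generic split-step form), not my actual equations:

```python
import numpy as np

def pseudo_spectral_step(u, k, dt):
    """One timestep of the FFT -> math -> iFFT pattern.
    The exp() factors below are placeholder linear/nonlinear
    operators, not a specific physical equation."""
    u_hat = np.fft.fft(u)                     # FFT vector
    u_hat *= np.exp(-1j * k**2 * dt / 2)      # math in spectral space
    u = np.fft.ifft(u_hat)                    # iFFT
    u *= np.exp(-1j * np.abs(u)**2 * dt)      # math in physical space
    u_hat = np.fft.fft(u)                     # FFT again
    u_hat *= np.exp(-1j * k**2 * dt / 2)      # math in spectral space
    return np.fft.ifft(u_hat)                 # iFFT
```

So each timestep is only a handful of 256-point transforms with small elementwise operations squeezed in between.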
Crank-Nicolson:
Foreach timestep
[*] Shift and add some vectors
[*] Invert a matrix (in a clever way)
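For concreteness, here's a minimal NumPy sketch of one Crank-Nicolson step for a generic linear system du/dt = A u (an assumed stand-in for my actual problem). The "clever" inverse for a tridiagonal A would be a Thomas-algorithm solve; a dense solve stands in here:

```python
import numpy as np

def crank_nicolson_step(u, A, dt):
    """One Crank-Nicolson step for du/dt = A u:
    solve (I - dt/2 A) u_new = (I + dt/2 A) u_old.
    For a tridiagonal A, np.linalg.solve would be replaced
    by a cheap tridiagonal (Thomas) solve."""
    I = np.eye(u.size)
    rhs = (I + 0.5 * dt * A) @ u          # the shift-and-add part
    return np.linalg.solve(I - 0.5 * dt * A, rhs)
```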
Vanilla RK4:
Foreach timestep
[*] Shift and add some vectors
[*] multiply some stuff
[*] Shift and add some vectors
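And the RK4 part is just the textbook scheme; in NumPy terms, one step looks like this (f is whatever right-hand side I'm integrating):

```python
import numpy as np

def rk4_step(f, u, t, dt):
    """Classic RK4: four evaluations of f, then a weighted
    shift-and-add to advance u by one timestep."""
    k1 = f(t, u)
    k2 = f(t + dt / 2, u + dt / 2 * k1)
    k3 = f(t + dt / 2, u + dt / 2 * k2)
    k4 = f(t + dt, u + dt * k3)
    return u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```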
So, I basically need to run these integrators 2000 (or many more) times across different parameters and return results I can then analyze in Matlab. CUFFT is really good at doing LONG vectors many, many times, but I have a relatively short vector (256), and I need to fiddle with the data in between each FFT.
I had planned on running each integration as its own thread on the GPU, but CUFFT wants to spread a single FFT across the whole chip. I don’t think doing a CUFFT on a 256-element vector will give me much improvement over just running it on the CPU. However, if I can run all these integrators as single threads, I imagine I can get a speedup of roughly (constant × number of GPU cores).
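One alternative I've been wondering about (I may be wrong about how well this maps to CUFFT, which I believe supports batched 1-D plans via the batch argument of cufftPlan1d): instead of one 256-point FFT per thread, stack all 2000 parameter runs as rows of one array and transform every row in a single batched call. In NumPy terms:

```python
import numpy as np

# One integration per row: batch the short FFTs instead of
# looping over them one at a time (CUFFT's batched 1-D plans
# do the analogous thing on the GPU).
batch, n = 2000, 256
rng = np.random.default_rng(0)
u = rng.standard_normal((batch, n))   # 2000 runs of length 256
u_hat = np.fft.fft(u, axis=1)         # 2000 FFTs in one call
```

The elementwise "math" steps between transforms would then also run across the whole (2000, 256) array at once, which seems like a better fit for the hardware than 2000 independent single-threaded FFTs.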
Am I going to have to code a simple single-threaded FFT implementation myself? Is it possible to use an FFT implementation from the GNU Scientific Library (GSL) as a linked-in library that runs on the GPU?
Thanks a bunch! I’m a super CUDA newbie, so any help is awesome.
Max