Hi All!

We got a GTX260 in my workstation to hopefully do some parallel numerical integrations and after researching, I have a slew of questions.

The three kinds of integrators I hope to test run as such.

Pseudo-Spectral: Each vector is 2^8 long. Maybe 2^9 depending on solution stability.

Foreach timestep

FFT vector

math and iFFT

FFT again

math and iFFT

FFT again

Crank-Nicolson:

Foreach timestep

Shift and add some vectors

Inverse a matrix (in a clever way)

Vanilla RK4:

Foreach timestep

Shift and add some vectors

multiply some stuff

Shift and add some vectors

So, I basically need to run these integrators 2000 (or many more) times across different parameters and give back results I can then analyze in Matlab. CUFFT is really good at doing LONG vectors many many times, but I have a relatively short vector (256) and I need to fiddle with stuff inbetween each FFT.

I had planned on running each integration as its own thread on the GPU, but the CUFFT wants to do the FFT across the whole chip. I don’t think doing CUFFT for a 256 long vector will give me much improvement over just running it on the CPU. However if I can run all these integrators as single threads, I imagine I can get a (constant*multiple of GPU cores) speedup.

Am I going to have to code a simple single-threaded FFT implementation? Is it possible to use an FFT implementation from the GNU Scientific Library (GSL) as a linked in library that gets run on the GPU?

thanks a bunch! I’m super CUDA newbie, so any help is awesome.

Max