Big arrays with CUDA?

Hi. I have a problem. I’m trying to do FFTs with CUDA, and it seems very promising, but I can’t run an FFT on big arrays. For me, the limit for a 3D FFT was 128x128x64 complex points. How can I do FFTs on CUDA with very big arrays, like 512x512x512? Is it possible? Does anyone know how?
Thank you.

I’m gonna raise this issue to the top of the queue! Hopefully someone will answer :rolleyes:

I’m a complete noob to GPU programming and I’m mostly programming in python at the moment. I managed to get Bogdan Opanchuk’s pyfft demo at

up and running on my Dell Latitude D830 laptop. I can increase the size of a 2D FFT to 2048 x 2048 before the GPU (an NVIDIA Quadro 140M, if I remember correctly) runs out of memory. However, what should I do if I want to FFT bigger arrays (4096 x 4096 and counting)? There doesn’t seem to be any advice on the Net about this situation, although it must be fairly common. I know that I can probably shuffle the data from CPU memory to GPU memory one row or column at a time and do the FFT by transforming all rows, followed by all columns. But I would think that all this memory management would slow things down, and that I’m just better off using the numpy built-in FFT routines, which sit on top of FFTPACK if I remember correctly. I’d appreciate any advice that people might have.
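The row-then-column idea can be sketched roughly as follows. This is only an illustration: numpy's `np.fft.fft` stands in for whatever batched GPU FFT call would really be used, and `chunked_fft2` and `rows_per_chunk` are made-up names, not part of pyfft or CUFFT. Each chunk is the piece that would be copied to the card, transformed, and copied back.

```python
import numpy as np

def chunked_fft2(a, rows_per_chunk):
    """2D FFT done as 1D FFTs over chunks of rows, then chunks of columns.

    np.fft.fft is a CPU stand-in for a batched GPU FFT (an assumption of
    this sketch); only one chunk would need to sit in GPU memory at a time.
    """
    a = np.asarray(a)
    out = np.empty(a.shape, dtype=np.complex128)

    # Pass 1: transform every row, a block of rows at a time.
    for start in range(0, a.shape[0], rows_per_chunk):
        stop = start + rows_per_chunk
        out[start:stop, :] = np.fft.fft(a[start:stop, :], axis=1)

    # Pass 2: transform every column, a block of columns at a time.
    for start in range(0, a.shape[1], rows_per_chunk):
        stop = start + rows_per_chunk
        out[:, start:stop] = np.fft.fft(out[:, start:stop], axis=0)

    return out
```

With a small random array, the result can be compared against `np.fft.fft2` using `np.allclose`. Whether this beats plain numpy on a given card depends on the transfer cost, which this sketch does not model.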

From a quick check, your Dell Latitude has a Quadro NVS 140M with 256 MB of memory.

4096 x 4096 x 8 bytes (one single-precision complex point) = 128 MB, so it’s not surprising you can’t run FFTs that big on a 256 MB card, given that some temporary storage is required as well.
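The same arithmetic for the other sizes mentioned in this thread, as a small sketch. It counts only the data array at 8 bytes per single-precision complex point; any workspace the FFT library allocates comes on top, and `fft_buffer_bytes` is just an illustrative helper name.

```python
def fft_buffer_bytes(shape, bytes_per_point=8):
    """Bytes needed to hold one complex array of the given shape.

    bytes_per_point=8 assumes single-precision complex (2 x float32).
    Temporary storage used by the FFT itself is not included.
    """
    points = 1
    for dim in shape:
        points *= dim
    return points * bytes_per_point

MIB = 1024 ** 2
for shape in [(128, 128, 64), (2048, 2048), (4096, 4096), (512, 512, 512)]:
    print(shape, fft_buffer_bytes(shape) / MIB, "MiB")
```

By this count a 512x512x512 array needs 1 GiB for the data alone, so on a small card it would have to be processed in pieces.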

I’m not an FFT expert, but presumably it is possible to decompose big FFTs into several smaller ones. Since the FFT is pretty compute-intensive, you should still be able to get a good speedup even with the extra transfers (especially if you can overlap them with the computation). Sounds like a nice project to layer on top of CUFFT?
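One standard way to do that decomposition is the Cooley-Tukey "four-step" split: a length n1*n2 transform becomes n1 transforms of length n2, a twiddle-factor multiply, and n2 transforms of length n1. Here is a minimal sketch in numpy, with `np.fft.fft` again standing in for the small FFTs that would actually run on the GPU; `four_step_fft` is a hypothetical name, not a CUFFT or pyfft function.

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """Length n1*n2 FFT built from smaller FFTs (Cooley-Tukey four-step).

    The small FFTs are done with np.fft.fft here; in a real setup each
    batch of small FFTs is what would be sent to the GPU (assumption).
    """
    n = n1 * n2
    x = np.asarray(x, dtype=np.complex128)
    if x.shape != (n,):
        raise ValueError("x must be 1D with length n1 * n2")

    # View the input as a[j2, j1] = x[j1 + n1 * j2].
    a = x.reshape(n2, n1)

    # Step 1: n1 FFTs of length n2, one down each column.
    b = np.fft.fft(a, axis=0)

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j1 * k2 / n).
    k2 = np.arange(n2).reshape(n2, 1)
    j1 = np.arange(n1).reshape(1, n1)
    b = b * np.exp(-2j * np.pi * k2 * j1 / n)

    # Step 3: n2 FFTs of length n1, one along each row.
    c = np.fft.fft(b, axis=1)

    # Step 4: transpose so that output index k = n2 * k1 + k2.
    return c.T.reshape(n)
```

The result can be checked against `np.fft.fft` on random input with `np.allclose`. Overlapping the transfers with computation would need asynchronous copies and streams, which this sketch leaves out.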