CUFFT memory usage

I have a question about CUFFT's memory usage. I am trying to do a 4D FFT on a dataset of size 512 x 512 x 16 x 16. Since CUFFT has no direct support for 4D FFTs, I run a batch of 1D FFTs four times and reorder the data between passes. To reorder, I use a temporary buffer of the same size. Storing 512 x 512 x 16 x 16 single-precision complex values requires about 536 MB, so with the temporary buffer I would use about 1072 MB in total. My GPU has 1536 MB of memory, so this should fit (nothing else is in GPU memory). However, when I run the code I get an “out of memory” error. If I reduce the size to 512 x 512 x 16 x 10, it works. I have tried rebooting the machine and powering it off completely to make sure the memory is cleared, but it does not help.
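As a quick check of the arithmetic above, here is a minimal host-side sketch. The helper names `buffer_bytes` and `to_mib` are made up for illustration, and the code assumes 8 bytes per complex element (two 4-byte floats, the layout of `cufftComplex`).

```cpp
#include <cstdint>

// Bytes needed for one nx*ny*nz*nt array of single-precision complex values.
// Assumes two 4-byte floats per element (hypothetical helper, not a CUFFT call).
constexpr std::uint64_t buffer_bytes(std::uint64_t nx, std::uint64_t ny,
                                     std::uint64_t nz, std::uint64_t nt) {
    return nx * ny * nz * nt * 2 * sizeof(float);
}

// Convert bytes to whole MiB (1 MiB = 1024 * 1024 bytes), rounding down.
constexpr std::uint64_t to_mib(std::uint64_t bytes) {
    return bytes / (1024 * 1024);
}

// One buffer for the failing case: 536,870,912 bytes, i.e. 512 MiB.
static_assert(buffer_bytes(512, 512, 16, 16) == 536870912ULL, "one buffer");
// Data plus temporary buffer: 1024 MiB.
static_assert(to_mib(2 * buffer_bytes(512, 512, 16, 16)) == 1024, "two buffers");
// Data plus temporary buffer for the case that works: 640 MiB.
static_assert(to_mib(2 * buffer_bytes(512, 512, 16, 10)) == 640, "working case");
```

So the "536 MB" figure is 512 MiB, and the two buffers together take 1024 MiB. If the card's 1536 MB is counted in MiB, which seems likely, that leaves at most 512 MiB for everything else, and the working 16 x 10 case needs only 640 MiB for its two buffers.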

Do the CUFFT functions allocate additional memory for the 1D FFTs? Since I have already allocated the temporary buffer, I run all the 1D FFTs out of place, so I would not expect any more memory to be needed.

My GPU is an Nvidia GTX 480 with 1536 MB of memory, and I use CUDA 3.0.

You may find the answer by checking the available memory with cudaMemGetInfo after each API call…
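A minimal sketch of that suggestion follows. It uses the sizes from the question (a batch of 512 x 16 x 16 transforms of length 512). The `report` helper and the labels are made up for illustration, and this is not the original poster's code. It prints free device memory after each allocation and after plan creation, so any memory the plan reserves shows up as a drop between two lines.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Print free and total device memory with a label (hypothetical helper).
static void report(const char* label) {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        std::printf("%s: cudaMemGetInfo failed: %s\n", label, cudaGetErrorString(err));
        return;
    }
    std::printf("%-20s free %7.1f MiB of %7.1f MiB\n", label,
                free_bytes / 1048576.0, total_bytes / 1048576.0);
}

int main() {
    const int nx = 512;               // length of each 1D transform
    const int batch = 512 * 16 * 16;  // number of 1D transforms per pass
    const size_t bytes = sizeof(cufftComplex) * (size_t)nx * (size_t)batch;

    report("start");  // the first runtime call also creates the CUDA context

    cufftComplex* data = NULL;
    cufftComplex* temp = NULL;
    if (cudaMalloc((void**)&data, bytes) != cudaSuccess) {
        std::printf("allocating data failed\n");
        return 1;
    }
    report("after data");

    if (cudaMalloc((void**)&temp, bytes) != cudaSuccess) {
        std::printf("allocating temp failed\n");
        cudaFree(data);
        return 1;
    }
    report("after temp");

    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, nx, CUFFT_C2C, batch);
    report("after cufftPlan1d");

    if (res == CUFFT_SUCCESS) {
        // Out-of-place forward transform; the input values are irrelevant here.
        res = cufftExecC2C(plan, data, temp, CUFFT_FORWARD);
        cudaDeviceSynchronize();  // on toolkits before CUDA 4.0, use cudaThreadSynchronize()
        report("after cufftExecC2C");
        cufftDestroy(plan);
    }
    if (res != CUFFT_SUCCESS) {
        std::printf("CUFFT call failed with code %d\n", (int)res);
    }

    cudaFree(temp);
    cudaFree(data);
    return 0;
}
```

Build with something like `nvcc memcheck.cu -lcufft` (the file name is arbitrary). If the free memory reported before the first allocation is already well below the card's nominal size, or if it drops when the plan is created, that would account for the gap between the expected usage and the "out of memory" error.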