My program runs on a Quadro FX 5600 with 1.5 GB of graphics memory, and it needs to perform a 3D FFT over 3 float channels. The program ran fine with 128^3 input. However, it fails with 256^3 input, I think due to a lack of memory.
The program needs 3 real input channels, each of size 256^3. I use R2C to transform the data to the Fourier domain, process it, and convert it back to the spatial domain with a C2R FFT. The error occurs while it performs the CUDA FFT. So my questions are:
What might cause CUFFT to fail? It runs fine with 128^3 input, and I tested the FFT with 256^3 input in a separate program, and that works.
What is the real memory usage for these 3 channel FFTs? How can I calculate or measure this amount of memory?
"The heuristics in CUFFT are somewhat complicated, so it’s hard to predict how much temporary storage the library will use.
There are cases where it uses none, and there are cases when it can use up to 3x the size of the transform. It depends on the transform size and the particular FFT algorithm needed for that size (and that maps best to the HW). Even an in-place FFT might use some temporary storage depending on the signal size."
To be sure, you could use cuMemGetInfo() to get the amount of free memory before and after the CUFFT calls.
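For the buffers themselves (as opposed to CUFFT's internal workspace), the sizes can be worked out by hand. Below is a rough sketch in plain C, assuming single-precision data, out-of-place transforms with separate real and complex buffers per channel, and the usual R2C layout where the last dimension shrinks to nz/2 + 1 complex values. The helper names (real_bytes, complex_bytes, worst_case_bytes) are made up for this post and are not part of CUFFT; the 3x factor is simply the upper bound quoted above, applied to one transform at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one real (float) volume of nx*ny*nz samples. */
static size_t real_bytes(size_t nx, size_t ny, size_t nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * nz * sizeof(float);
}

/* Bytes for the R2C result of one volume: the last dimension holds
   nz/2 + 1 complex values, each made of two floats. */
static size_t complex_bytes(size_t nx, size_t ny, size_t nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * (nz / 2 + 1) * 2 * sizeof(float);
}

/* Rough upper bound (an assumption, not a CUFFT guarantee):
   all channels resident as real + complex buffers, plus temporary
   storage of up to 3x the size of ONE transform while it executes. */
static size_t worst_case_bytes(size_t nx, size_t ny, size_t nz,
                               size_t channels) {
    size_t data = channels * (real_bytes(nx, ny, nz) +
                              complex_bytes(nx, ny, nz));
    size_t workspace = 3 * complex_bytes(nx, ny, nz);
    return data + workspace;
}
```

For 256^3 and 3 channels this gives 67,108,864 bytes per real volume, 67,633,152 bytes per complex volume, and 607,125,504 bytes (about 579 MiB) for the worst-case estimate. Under these assumptions the FFT buffers alone stay well below 1.5 GB, so it is worth checking what else the program allocates.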
I posted my reply to Linh Ha before reading your post. The comment about the heuristics is somewhat unsatisfactory. I would like to use CUFFT in production code, where I calculate nx, ny, nz based on other data, and having things fail unpredictably is not an option.
Does this make any sense to the authors of CUFFT?
NX   NY   NZ   C2C   Avg rel error   Max rel error
256  256  32   no    2.06e-02        3.92e+01
256  256  32   yes   2.10e-07        1.19e-06
256  256  31   no    9.18e-07        8.70e-06
256  256  33   no    3.20e-07        1.98e-06
These results do not make any sense. The error is |V - inverse_fft(fft(V))|,
with the inverse FFT result scaled by 1./(nx*ny*nz).
C2C (yes/no) means complex-complex-complex / real-complex-real.
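For anyone who wants to reproduce the check, here is a rough sketch in plain C of how such a round-trip error could be computed on the host once the data has been copied back. It is not the code that produced the numbers above. It assumes that w holds the unscaled output of the inverse transform, that the scale factor is one over the total number of samples, and that "relative" means dividing each difference by |V| and skipping samples where V is exactly zero. The names RelError and roundtrip_rel_error are made up for this post.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double avg;  /* mean relative error over the non-zero reference samples */
    double max;  /* largest relative error seen */
} RelError;

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* v: original samples, w: unscaled inverse-FFT output, n = nx*ny*nz. */
static RelError roundtrip_rel_error(const float *v, const float *w, size_t n) {
    assert(v != NULL && w != NULL && n > 0);
    const double scale = 1.0 / (double)n;   /* unnormalized FFT round trip */
    double sum = 0.0, max = 0.0;
    size_t counted = 0;
    for (size_t i = 0; i < n; ++i) {
        double ref = abs_d((double)v[i]);
        if (ref == 0.0) continue;            /* avoid dividing by zero */
        double rel = abs_d((double)v[i] - (double)w[i] * scale) / ref;
        sum += rel;
        if (rel > max) max = rel;
        ++counted;
    }
    RelError e;
    e.avg = counted ? sum / (double)counted : 0.0;
    e.max = max;
    return e;
}
```

Accumulating in double keeps the error measurement itself from adding noise at the 1e-7 level that a healthy single-precision round trip shows.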
Why should nz=32 fail for r-c-r transforms but be OK for c-c-c transforms,
and why should r-c-r then not fail for oddballs like nz=31 or nz=33?
The memory needed for nz=32 r-c-r is 16 MB; even with up to 3x
workspace there is plenty of room left on my lowly 256 MB card and