cufft error (?)

ARom_nsk · September 27, 2011, 3:28am

Hello everybody!

I am facing the following problem:

Here is the code

For dimensions Nxm=Nym=Nzm <= 511 everything works fine.

For dimensions Nxm=Nym=Nzm=512, cufftExecC2C returns CUFFT_EXEC_FAILED.

For dimensions Nxm=Nym=Nzm=513 everything works fine again.

CUDA 4.0, Tesla C2050

On a Tesla C2070 everything works fine for me.

Any ideas?

vectraproject · February 28, 2012, 2:08pm

I have the same 512x512x512 error with CUDA 4.0 on a GTX 560 Ti, and probably on a GTX 285 as well.

I have the same code and definitely no idea of the origin of the problem…

pasoleatis · February 28, 2012, 3:27pm

The size is too large. Are you sure it really works for Nxm=513? I was able to go up to 400x400x400 doubles only. Please check the free memory before the cudaMalloc, after the cudaMalloc, and after creating the plan.

L_F · February 28, 2012, 3:50pm

cuFFT uses different algorithms for a power-of-2 size [512] and for other sizes [511 or 513]. Probably the power-of-2 algorithm needs more memory.

vectraproject · February 28, 2012, 4:51pm

The code I use may be different, but I understood C2C to be the single-precision (32-bit float) transform.

My GTX 560 Ti and 285 have 2 GB of RAM, which is apparently enough.
I can run computations on 508^3, and I can allocate a cufftComplex volume of 508^3 plus a single-float volume of 508x508x350, so I expected I could run a single 512^3 FFT.

But, as L_F said, if cuFFT requires more memory for the faster 512 computation, I may hit the memory ceiling…
I knew FFT computations are much faster for sizes of 2^n, but I did not expect that to come at the expense of significantly more memory.
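
As a rough check of the figures above, the raw buffer sizes can be worked out by hand. The following is a minimal sketch in plain C++, not the code from this thread: it assumes 8 bytes per single-precision complex element (two 32-bit floats, as in cufftComplex) and 4 bytes per float, and `volume_bytes` and `to_mib` are hypothetical helper names. It counts only the data buffers, not whatever work area cuFFT allocates for a plan.

```cpp
#include <cstdint>

// Assumed element sizes: a single-precision complex value is two 32-bit
// floats (8 bytes); a plain single-precision float is 4 bytes.
constexpr std::uint64_t kComplexBytes = 8;
constexpr std::uint64_t kFloatBytes = 4;

// Hypothetical helper: bytes needed to store an nx*ny*nz volume whose
// elements are `elem` bytes each.
constexpr std::uint64_t volume_bytes(std::uint64_t nx, std::uint64_t ny,
                                     std::uint64_t nz, std::uint64_t elem) {
    return nx * ny * nz * elem;
}

// Hypothetical helper: convert bytes to MiB (2^20 bytes) so the result can
// be compared with the memory size of a card.
constexpr double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// A 512^3 complex volume is exactly 1 GiB.
static_assert(volume_bytes(512, 512, 512, kComplexBytes) == 1073741824ULL,
              "512^3 complex volume");

// A 508^3 complex volume plus a 508x508x350 float volume, as in the post.
static_assert(volume_bytes(508, 508, 508, kComplexBytes) +
                  volume_bytes(508, 508, 350, kFloatBytes) == 1410061696ULL,
              "508^3 complex + 508x508x350 float");
```

Under these assumptions a 512^3 complex volume is exactly 1024 MiB, while the 508^3 complex volume and the 508x508x350 float volume together come to about 1345 MiB. The 512^3 data alone is therefore smaller than what could already be allocated at 508^3, which would be consistent with the failure coming from memory needed on top of the data buffer rather than from the buffer itself.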

I brought my code to a Tesla C2050 a few minutes ago… It has 3 GB, runs headless (another graphics adapter is installed for the display), uses CUDA 4.1 and gcc 4.5, and my code compiles and runs fine without modification.

Some figures:

GTX 560 Ti:
508^3 single float, CUFFT_INVERSE: 2.3 s
512^3: unknown

Tesla C2050:
508^3: 0.7 s
512^3: 0.12 s

Tesla rocks…

pasoleatis · February 28, 2012, 7:35pm

Do check how much memory is used, and check that the transform is actually performed.

vectraproject · February 29, 2012, 1:53pm

Hi,

I have intermittent access to the PC with the Tesla C2050: I just managed to run two backward single-float transforms, 508^3 and 512^3, and check the resulting volumes (reconstructed image of a cell). I confirm the reconstruction is correct and the timings are correct. There may be differences compared to single-float FFTW, but they are not obvious to the eye.

I did not (yet) use the NVIDIA timer, but I ran a series of 200 in-place transforms, which confirms exactly the aforementioned computing times. I also noted that the Tesla card was working in “adaptive” mode and not “performance”: power consumption at the plug was about 300 W for the whole “big” Xeon PC, the GPU core temperature was below 65 °C, and the Tesla fan was barely audible.

Please note that the timings only include the in-place Fourier computation on data already in GPU memory: transfers and plan creation are excluded.
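
The averaging over a series of runs described above can be sketched as follows. This is an illustrative plain C++ helper, not the code used in this thread: `mean_seconds` is a hypothetical name, and it assumes the callable passed in blocks until the work has really finished (for GPU work that means the call must wait for the device before returning, otherwise only the launch is timed).

```cpp
#include <chrono>

// Hypothetical helper: call `work` `runs` times in a row and return the
// mean wall-clock time per call, in seconds.
// Assumption: `work` does not return until its computation is complete.
template <typename F>
double mean_seconds(F work, int runs) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

Calling `mean_seconds(transform, 200)` with a callable that runs one in-place transform would give the per-transform time for a series of 200 runs. Setup steps such as transfers and plan creation stay outside the callable so they are not counted.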

Next time I have access to the PC, I’ll check the amount of memory required by cuFFT, and I’ll also check how much memory I can reserve during the transform. I am considering having my department buy one for me, so I need to be sure of my requirements.

vectraproject · March 5, 2012, 12:22am

Hi,

I forgot to mention I was using CUDA 4.0 / gcc 4.4 / Debian stable.
I switched to CUDA 4.1 / latest dev drivers / gcc 4.5 / Debian testing, and now I can finally compute my Fourier transform in 512x512x512 single float.

It takes… 0.3 s: really fast!!
In comparison, FFTW 3.3.0 takes 0.72 seconds in exhaustive wisdom mode on 3 cores of a Core i7 2600K.

(edited: was 1.60s with fftw 3.2.2)
