CUFFT error: 3D batched C2R transforms, with simple test code

Hi,

I’m having problems executing 3D batched C2R transforms with CUFFT under some circumstances. I have written some simple code to reproduce the problem.

The goal is to compute 2000 transforms of size 14x14x256. I get a CUFFT_EXEC_FAILED error every time cufftExecC2R is executed, but if I change the ‘z’ dimension from 256 to 258 everything runs fine.

The test code is as follows; nothing else is executed beforehand.

#include <iostream>
#include <cuda_runtime.h>
#include <cufft.h>

cufftHandle plan;

int x = 14, y = 14, z = 256;
int fftDims[] = {x, y, z};

cufftComplex* idata;
cufftReal* odata;

// Element counts per transform: complex input is x*y*(z/2+1), real output is x*y*z.
int idataEls = x * y * (z / 2 + 1);
int odataEls = x * y * z;
int batch = 2000;

// Allocate the complex input for all batches.
cudaMalloc(&idata, idataEls * sizeof(cufftComplex) * batch);
cudaError_t cErr = cudaThreadSynchronize();
if (cErr != cudaSuccess) {
    std::cout << "Error allocating gpu memory\n";
    exit(-1);
}

// Allocate the real output for all batches.
cudaMalloc(&odata, odataEls * sizeof(cufftReal) * batch);
cErr = cudaThreadSynchronize();
if (cErr != cudaSuccess) {
    std::cout << "Error allocating gpu memory\n";
    exit(-1);
}

std::cout << "Memory used: "
          << idataEls * sizeof(cufftComplex) * batch + odataEls * sizeof(cufftReal) * batch
          << " bytes\n";

// Fill the input with a dummy byte pattern.
cudaMemset(idata, 2, idataEls * sizeof(cufftComplex) * batch);
cErr = cudaThreadSynchronize();
if (cErr != cudaSuccess) {
    std::cout << "Error with cudaMemset\n";
    exit(-1);
}

// Batched 3D C2R plan with the default (contiguous) data layout.
cufftResult fftError = cufftPlanMany(&plan, 3, fftDims, NULL, 1, 0, NULL, 1, 0, CUFFT_C2R, batch);
cErr = cudaThreadSynchronize();
if (fftError != CUFFT_SUCCESS || cErr != cudaSuccess) {
    std::cout << "Error creating gpu FFT plan\n";
    exit(-1);
}

fftError = cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_FFTW_ALL);
cErr = cudaThreadSynchronize();
if (fftError != CUFFT_SUCCESS || cErr != cudaSuccess) {
    std::cout << "Error setting gpu FFT plan compatibility\n";
    exit(-1);
}

// This call fails with CUFFT_EXEC_FAILED for z=256 but succeeds for z=258.
fftError = cufftExecC2R(plan, idata, odata);
cErr = cudaThreadSynchronize();
if (fftError != CUFFT_SUCCESS || cErr != cudaSuccess) {
    std::cout << "Error executing gpu FFT plan\n";
    exit(-1);
}

I am working with a GeForce GTX 470 card (1248 MB of RAM) and CUDA 4.0.

Regards.

Can you check whether an in-place transform works (after disabling the allocation for odata):

fftError = cufftExecC2R(plan, idata, (cufftReal*)idata);

Checked.

It uses half the memory, but the behavior remains the same: it fails for z=256 and works for z=258.

What if you first try batch=1 with no CUFFT_COMPATIBILITY_FFTW_ALL, then change batch back to 2000? I have iterative code in both 2D and 3D that works without problems, but I cannot see the error in your code.

I’ve tried disabling FFTW compatibility:

For batch=1 it runs fine for every size, but then it isn’t a batched operation, just a single 3D transform.
For batch=2000 the behavior remains the same, working only for z=258.

I don’t think it’s related to FFTW compatibility: with batch=1 and FFTW compatibility enabled, it also works fine for every size.

Note that the CUFFT library uses some temporary workspace, which is allocated at planning time, and its size varies with the size of the transform. For the problem size you are trying (14x14x256), the temporary space is almost as large as the input data, so altogether it fills the 1.2 GB of memory. However, you mentioned that the in-place transform still fails despite the additional free GPU memory (assuming one of the cudaMallocs was removed).
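If a newer CUFFT is available, the planner’s workspace requirement can also be queried up front. A minimal sketch using cufftEstimateMany, which belongs to the workspace-query API of later CUFFT releases and may not be available in the CUFFT shipped with CUDA 4.0:

// Ask CUFFT for a workspace estimate for the same batched 3D C2R transform,
// using the default (contiguous) layout, before committing any memory.
size_t workSize = 0;
cufftResult estErr = cufftEstimateMany(3, fftDims,
                                       NULL, 1, 0,   // default input layout
                                       NULL, 1, 0,   // default output layout
                                       CUFFT_C2R, batch, &workSize);
if (estErr == CUFFT_SUCCESS)
    std::cout << "Estimated CUFFT workspace: " << workSize << " bytes\n";

With that estimate you can compare against cudaMemGetInfo before planning instead of relying on the return code of cufftPlanMany.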

You can work around the failure by splitting the work into two half-size batches and calling the execute function twice, as follows:
cufftResult fftError = cufftPlanMany(&plan, 3, fftDims, NULL, 1, 0, NULL, 1, 0, CUFFT_C2R, batch/2);
fftError = cufftExecC2R(plan, idata, odata);
// advance idata and odata to point at the second half, then call again
fftError = cufftExecC2R(plan, idata, odata);
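A minimal self-contained sketch of that workaround, assuming the buffers and element counts (idata, odata, idataEls, odataEls, fftDims, batch) from the original test code and the default contiguous layout:

// Split-batch workaround: one half-size plan reused for both halves.
cufftHandle halfPlan;
int half = batch / 2;                                   // assumes batch is even
cufftResult err = cufftPlanMany(&halfPlan, 3, fftDims,
                                NULL, 1, 0,             // default input layout
                                NULL, 1, 0,             // default output layout
                                CUFFT_C2R, half);
if (err == CUFFT_SUCCESS) {
    // First half of the batch.
    err = cufftExecC2R(halfPlan, idata, odata);
    // Second half: advance each pointer by half a batch worth of elements.
    err = cufftExecC2R(halfPlan,
                       idata + (size_t)idataEls * half,
                       odata + (size_t)odataEls * half);
    cufftDestroy(halfPlan);
}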

In any case, you should file a bug for this to be tracked by NVIDIA.

For some of my cases the program will not run even though it appears that everything fits in GPU memory.

Does it run for some other values of batch?

That is right. It’s related to the amount of free memory. I’ve run some experiments with cudaMemGetInfo(…) and in-place transforms, and I got these results:

batch → 2000
Total memory: 1309081600 bytes
idata: 404544000 bytes
Free memory before planning: 775577600 bytes
Free memory after planning: 775577600 bytes
cufftExecC2R fails to execute, but the planning call returns CUFFT_SUCCESS instead of CUFFT_ALLOC_FAILED.

batch → 1800
Total memory: 1309081600 bytes
idata: 364089600 bytes
Free memory before planning: 816209920 bytes
Free memory after planning: 93609984 bytes
Planning with this batch size returns CUFFT_SUCCESS, and cufftExecC2R also executes fine. The temporary space is almost twice the size of idata, but it works.
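For reference, a sketch of how these numbers can be gathered: sample free memory with cudaMemGetInfo around the planning call (using the plan, fftDims, and batch variables from the test code above):

size_t freeBefore = 0, freeAfter = 0, total = 0;
cudaMemGetInfo(&freeBefore, &total);
cufftResult planErr = cufftPlanMany(&plan, 3, fftDims,
                                    NULL, 1, 0, NULL, 1, 0,
                                    CUFFT_C2R, batch);
cudaMemGetInfo(&freeAfter, &total);
std::cout << "Free memory before planning: " << freeBefore << " bytes\n"
          << "Free memory after planning:  " << freeAfter  << " bytes\n"
          << "Workspace reserved:          " << (freeBefore - freeAfter) << " bytes\n";
// When freeBefore == freeAfter the planner has reserved nothing, which is
// exactly the case where cufftExecC2R later fails with CUFFT_EXEC_FAILED.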

So it seems cufftPlanMany returns success even though it failed to allocate its workspace…

I think this solves my problem, but it looks like there is some kind of bug in cufftPlanMany.

Thank you very much for your support.

What about 2001? :) I guess there is a limit on the batch size.
Free memory before planning: 775577600 bytes
Free memory after planning: 775577600 bytes
Nothing happens here for 2000.

Right, nothing happens with batch = 2000. cufftPlanMany should return an error because the workspace is never allocated, but it returns success instead.

With batch = 2001 I got this:

idata: 404746272 bytes
Free memory before planning: 737284096 bytes
Free memory after planning: 737284096 bytes

There is not enough free space for the planner to allocate the memory it needs (roughly twice the size of idata).