CUFFT array size

Hi all,

I have to run multiple simulations simultaneously. For the simulations I need a 1D IFFT to transform the data from the frequency domain back to the time domain.

For up to 2800 simultaneous simulations, the algorithm works perfectly. Each simulation contains 4000 data points, so the 1D array passed to CUFFT contains 2800*4000 = 11,200,000 elements.

If I use, for example, 2801 simulations, I get the following error:

CUFFT ERROR: Unable to execute plan

Here is the CUFFT code:

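#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NY 4000
#define NX 1
#define NRANK 2
#define BATCH 2800	/* number of simultaneous simulations; 2801 fails */

/* d_result: device input (frequency domain), d_erg: device output (time domain) */
int inverseFFT(cufftComplex *d_result, cufftReal *d_erg)
{
	cufftHandle planBackw;
	int n[NRANK] = {NY, NX};

	/* catch errors left over from the earlier device allocations (not shown) */
	if (cudaGetLastError() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to allocate\n");
		return 1;
	}

	/* NULL inembed/onembed: basic contiguous layout, stride and dist are ignored */
	if (cufftPlanMany(&planBackw, NRANK, n,
					  NULL, 1, 0,
					  NULL, 1, 0,
					  CUFFT_C2R, BATCH) != CUFFT_SUCCESS)
	{
		fprintf(stderr, "CUFFT Error: Unable to create plan\n");
		return 1;
	}

	if (cufftSetCompatibilityMode(planBackw, CUFFT_COMPATIBILITY_NATIVE) != CUFFT_SUCCESS)
	{
		fprintf(stderr, "CUFFT Error: Unable to set compatibility mode to native\n");
		cufftDestroy(planBackw);
		return 1;
	}

	if (cufftExecC2R(planBackw, d_result, d_erg) != CUFFT_SUCCESS)
	{
		fprintf(stderr, "CUFFT Error: Unable to execute plan\n");
		cufftDestroy(planBackw);
		return 1;
	}

	/* wait for the transform to finish before the results are used */
	if (cudaDeviceSynchronize() != cudaSuccess)
	{
		fprintf(stderr, "Cuda error: Failed to synchronize\n");
		cufftDestroy(planBackw);
		return 1;
	}

	cufftDestroy(planBackw);
	return 0;
}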

Does this mean that the size limit for a 1D array has been reached? What is the maximum size of a 1D array for CUFFT? I thought it was 8,000,000, but 2800*4000 is already more than 8,000,000 and the algorithm works fine.

If 2801 simulations really do exceed the maximum 1D array size, is it possible to compute the 1D IFFT of the data as a 2D array instead? Thank you in advance for the help.
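A quick way to check these numbers is to apply the C2R data-layout rule from the cuFFT documentation: per transform, the real (output) side holds the product of all n[i], while the complex (input) side reduces only the last dimension to n[rank-1]/2 + 1. The sketch below is plain C with hypothetical helper names and assumes single precision (8 bytes per cufftComplex, 4 per cufftReal); it ignores the plan's own work area, which CUFFT allocates internally.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed single-precision sizes: cufftComplex = 2 floats, cufftReal = 1 float. */
#define BYTES_COMPLEX 8u
#define BYTES_REAL    4u

/* Real (output) elements per transform: product of all dimensions. */
uint64_t c2r_real_elems(const int *n, int rank)
{
    uint64_t total = 1;
    assert(rank >= 1);
    for (int i = 0; i < rank; i++)
        total *= (uint64_t)n[i];
    return total;
}

/* Complex (input) elements per transform for C2R: only the last
   dimension is reduced to n[rank-1]/2 + 1. */
uint64_t c2r_complex_elems(const int *n, int rank)
{
    uint64_t total = 1;
    assert(rank >= 1);
    for (int i = 0; i < rank - 1; i++)
        total *= (uint64_t)n[i];
    return total * (uint64_t)(n[rank - 1] / 2 + 1);
}

/* Bytes of input plus output for `batch` transforms (plan work area not included). */
uint64_t c2r_io_bytes(const int *n, int rank, uint64_t batch)
{
    return batch * (c2r_complex_elems(n, rank) * BYTES_COMPLEX
                    + c2r_real_elems(n, rank) * BYTES_REAL);
}
```

For the posted plan (n = {4000, 1}) each transform reads 4000 complex values and writes 4000 real ones, so 2800 simulations already use 11,200,000 elements on each side, above the 8,000,000 figure, and one more simulation adds only 48,000 bytes of input plus output. A rank-1 plan with n = {4000} would instead expect 2001 complex inputs per transform, so which layout matches d_result depends on how that data was produced.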

When you create the plans, CUFFT also makes some internal allocations (work areas) that we do not see. Assuming there is enough memory, you could divide your data into two or more equal parts, keep them in an array of pointers such as cufftComplex *d_in[n]; cufftReal *d_out[n];, create the plan with the smaller batch size, and then execute it sequentially:

cufftExecC2R(plan, d_in[0], d_out[0]);
cufftExecC2R(plan, d_in[1], d_out[1]);
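The same split can also be expressed as offsets into one contiguous device buffer instead of separate allocations. The sketch below is plain C that only computes the bookkeeping; the Chunk type and split_batch function are hypothetical names. Each chunk's count is the batch size its plan must be created with, so an uneven split (2801 into 1401 + 1400) needs one plan per distinct chunk size.

```c
#include <assert.h>
#include <stddef.h>

/* One slice of the full batch of simulations (hypothetical type). */
typedef struct {
    size_t first_sim;   /* index of the first simulation in this chunk */
    size_t count;       /* number of simulations = batch size of its plan */
    size_t in_offset;   /* offset into the complex input buffer, in elements */
    size_t out_offset;  /* offset into the real output buffer, in elements */
} Chunk;

/* Split `total` simulations into `nchunks` nearly equal chunks.
   in_per_sim / out_per_sim are elements per transform on each side.
   Returns the number of non-empty chunks written to `out`. */
size_t split_batch(size_t total, size_t nchunks,
                   size_t in_per_sim, size_t out_per_sim, Chunk *out)
{
    assert(nchunks > 0);
    size_t base = total / nchunks, extra = total % nchunks;
    size_t first = 0, written = 0;
    for (size_t c = 0; c < nchunks; c++) {
        size_t count = base + (c < extra ? 1 : 0);  /* spread the remainder */
        if (count == 0)
            continue;
        out[written].first_sim = first;
        out[written].count = count;
        out[written].in_offset = first * in_per_sim;
        out[written].out_offset = first * out_per_sim;
        /* host code would then run cufftExecC2R with a plan of batch `count`
           on d_in + in_offset and d_out + out_offset */
        first += count;
        written++;
    }
    return written;
}
```

With the posted layout (4000 complex in, 4000 real out per simulation), split_batch(2801, 2, 4000, 4000, chunks) yields a first chunk of 1401 simulations starting at offset 0 and a second of 1400 starting at element offset 1401*4000.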

If memory is not an issue, you can run them in parallel using streams, but I think that in this case you need a separate plan for each stream (each plan bound to its stream with cufftSetStream).
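A minimal sketch of the bookkeeping for that setup, assuming one plan per stream and a round-robin assignment of chunks (a hypothetical scheme, not something CUFFT requires). In real host code each stream would come from cudaStreamCreate, each plan from cufftPlanMany followed by cufftSetStream, and each chunk would be launched with cufftExecC2R; the plain-C part below only decides which stream, and therefore which plan, handles which chunk, so it runs without a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Round-robin: chunk c goes to stream (and plan) c % nstreams. */
size_t stream_for_chunk(size_t chunk, size_t nstreams)
{
    assert(nstreams > 0);
    return chunk % nstreams;
}

/* Fill counts[s] with the number of chunks queued on stream s.
   `counts` must have room for nstreams entries. */
void chunks_per_stream(size_t nchunks, size_t nstreams, size_t *counts)
{
    for (size_t s = 0; s < nstreams; s++)
        counts[s] = 0;
    for (size_t c = 0; c < nchunks; c++)
        counts[stream_for_chunk(c, nstreams)]++;
}
```

Chunks queued on the same stream run one after another, so they can share that stream's plan and its work area; chunks on different streams may overlap, which is why they use different plans.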

By the way, the limit for a 1D transform is 128 million elements, so maybe 2801 simulations need too much memory. Is 2801 a prime number?
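The primality question can be answered directly by trial division. For context, the cuFFT documentation says its algorithms are most optimized for transform lengths of the form 2^a * 3^b * 5^c * 7^d; that guidance concerns the transform length, here 4000, rather than the batch count. The helper name factorize is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Print the prime factorization of n by trial division and return the
   number of prime factors counted with multiplicity (1 means n is prime). */
int factorize(unsigned n)
{
    int count = 0;
    assert(n >= 2);
    printf("%u =", n);
    for (unsigned p = 2; p * p <= n; p++) {
        while (n % p == 0) {
            printf(" %u", p);
            n /= p;
            count++;
        }
    }
    if (n > 1) {        /* whatever remains is itself prime */
        printf(" %u", n);
        count++;
    }
    printf("\n");
    return count;
}
```

factorize(2801) returns 1, so 2801 is prime; factorize(4000) prints 4000 = 2 2 2 2 2 5 5 5, i.e. 2^5 * 5^3, a length of the favorable form; and 2401 = 7^4.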