Limit of CUFFT 1D with Batch

I couldn’t find any topic, either on the internet or in this forum, that solved my problem, so I’m asking here.

I want to do a 1D FFT on 2D data, row by row. Assuming that one row always has 2048 complex floating-point elements, I got these results:

With a batch size of 8192, 16384, or 32768 everything is fine. With a batch size of 2^16 (65536), however, execution fails with error code 6 (CUFFT_EXEC_FAILED).

This is my simple code snippet:

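[codebox]cufftHandle plan;

// One 1D C2C plan: transform length = width, number of transforms (batch) = height
cufftSafeCall(cufftPlan1d(&plan, width, CUFFT_C2C, height));

// In-place forward transform over all rows
cufftSafeCall(cufftExecC2C(plan, (cufftComplex *)idevData, (cufftComplex *)idevData, CUFFT_FORWARD));[/codebox]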


I tried to force a smaller batch size by passing height/2, but that didn’t work either. With height/4 it works.

Does this have something to do with the memory available on the GPU?

I copy some data to the GPU first, which gives the following memory usage (in MB):

[codebox]batch size | available before copy | available after copy | used
8192       | 1147.11               | 1012.73              | 134.38
16384      | 1147.11               |  878.48              | 268.63
32768      | 1124.63               |  587.50              | 537.13
65536      | 1412.41               |  338.28              | 1074.14[/codebox]

There is only a little memory left in the last case, so that could be the problem.

Interestingly, the plan itself seems to use no memory. I print the usage right before and right after creating the plan, and the numbers are identical.

To solve this, I want to find out how to choose a batch size that still lets the 1D FFT run when only a little memory is left. Is this even possible, or should I just treat a height of 2^15 as the maximum and process anything bigger in a loop? And what should I do if the hardware changes and more or less memory is available?
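One way to avoid hard-coding 2^15 would be to derive the batch size at run time from the free memory that cudaMemGetInfo reports. The sketch below is plain C and only does the arithmetic. `max_batch_rows` and `num_chunks` are hypothetical helper names, and the idea that one call needs roughly as much scratch memory as the rows it transforms is an assumption, not documented cuFFT behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest number of rows to hand to one FFT call.
   ASSUMPTION: the call needs scratch memory about as large as the rows it
   transforms, so a chunk must fit into the currently free memory.
   safety_percent (e.g. 90) leaves some headroom for other allocations. */
static size_t max_batch_rows(size_t free_bytes, size_t width,
                             size_t elem_bytes, unsigned safety_percent)
{
    assert(width > 0 && elem_bytes > 0 && safety_percent <= 100);
    size_t row_bytes = width * elem_bytes;
    size_t usable = free_bytes / 100 * safety_percent;
    return usable / row_bytes;   /* 0 means not even one row fits */
}

/* Number of calls needed to cover `height` rows with `batch` rows per call. */
static size_t num_chunks(size_t height, size_t batch)
{
    assert(batch > 0);
    return (height + batch - 1) / batch;
}
```

With about 338 MB free, a width of 2048, 8 bytes per element and a 90 % margin, `max_batch_rows(338000000, 2048, 8, 90)` gives 18566 rows per call, so 65536 rows would need 4 calls. Rounding the batch down to 16384 also gives 4 calls and keeps all chunks the same size, so a single plan could be reused with an offset pointer into idevData.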


I think I found a solution to this problem.

I printed some more information about memory use and found that each cufftExecC2C call uses additional memory. That is a bit surprising, because I’m running an in-place FFT, but for some reason it needs twice as much memory as the data itself.

Here are the values I measured:

[codebox]batch size | available before cufftExec | available after cufftExec | used
8192       | 1265.119232                | 1130.868736               | 134.250496
16384      | 1130.868736                |  862.433280               | 268.435456
32768      |  862.367744                |  325.496832               | 536.870912[/codebox]
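As a quick cross-check, the memory consumed by each call is almost exactly the size of the data being transformed. The short C sketch below computes that size; `fft_buffer_bytes` is a hypothetical helper name, and the 8 bytes per element assume single-precision complex values (two 32-bit floats, as in cufftComplex).

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of `batch` rows of `width` single-precision complex values.
   ASSUMPTION: 8 bytes per element (two 32-bit floats, as in cufftComplex). */
static size_t fft_buffer_bytes(size_t batch, size_t width)
{
    const size_t complex_bytes = 8;
    assert(width > 0);
    return batch * width * complex_bytes;
}
```

For a width of 2048 this gives 268435456 bytes for 16384 rows and 536870912 bytes for 32768 rows, which matches the measured usage for those two batch sizes. For 8192 rows it gives 134217728 bytes, slightly less than the measured value.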

Hope this will help someone else.

Best regards