I couldn’t find a related topic, either on the internet or in this forum, that solved my problem, so I’m asking directly.
I want to do a 1D FFT on 2D data, row by row. Each row always has 2048 single-precision complex elements, and I get these results:
With a batch size of 8192, 16384 or 32768 everything is OK. With a batch size of 2^16 (65536), execution fails with error code 6 (CUFFT_EXEC_FAILED).
This is my simple code snippet:
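[codebox]cufftHandle plan;
// width = 2048 points per row, height = number of rows (the batch size)
cufftSafeCall(cufftPlan1d(&plan, width, CUFFT_C2C, height));
// in-place forward transform of all rows
cufftSafeCall(cufftExecC2C(plan, (cufftComplex *)idevData, (cufftComplex *)idevData, CUFFT_FORWARD));
...[/codebox]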
I tried to force a smaller batch size by passing height/2, but that didn’t work either. With height/4 it works.
Does this have something to do with memory available on the gpu?
I copy the data to the GPU first, which gives the following memory usage (in MB):
[codebox]batch size | available before copy | available after copy | used
8192       | 1147.11               | 1012.73              | 134.38
16384      | 1147.11               | 878.48               | 268.63
32768      | 1124.63               | 587.50               | 537.13
65536      | 1412.41               | 338.28               | 1074.14[/codebox]
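As a rough sanity check on the "used" column, the size of the input can be computed by hand. This is only a sketch: `fftDataBytes` and `toMB` are hypothetical helper names, and it assumes that one complex element is two 32-bit floats (8 bytes) and that the table uses 1 MB = 10^6 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by `rows` rows of `width` single-precision complex samples.
// Assumption: one complex sample is two 32-bit floats, i.e. 8 bytes.
std::size_t fftDataBytes(std::size_t rows, std::size_t width) {
    assert(width > 0);
    const std::size_t bytesPerSample = 2 * sizeof(float);
    return rows * width * bytesPerSample;
}

// Convert bytes to MB, assuming 1 MB = 10^6 bytes.
double toMB(std::size_t bytes) {
    return static_cast<double>(bytes) / 1e6;
}
```

For 8192 rows of 2048 elements this gives 134,217,728 bytes, about 134.2 MB against the 134.38 MB measured. For 65536 rows it gives 1,073,741,824 bytes, about 1073.7 MB against 1074.14 MB measured. So the "used" column is essentially just the input data.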
There is only a little memory left in the last case, so that could be the problem.
Interestingly, creating the plan does not seem to use any memory. I print the usage right before and right after creating the plan, and the numbers do not differ.
To solve this, I want to find out how to determine a batch size that still lets me run a 1D FFT when only a little memory is left. Is this even possible, or should I just treat a height of 2^15 as the maximum and process anything bigger in a loop? And what should I do if the hardware changes and more or less memory is available?
I think I found a solution for this problem.
I printed some more information about memory usage and found that each cufftExecC2C call uses additional memory. That is a bit surprising, because I’m running an in-place transform, yet the call needs about as much extra memory as the data itself, so twice the data size in total.
Here are the values I measured (in MB):
[codebox]batch size | available before cufftExec | available after cufftExec | used by cufftExec
8192       | 1265.119232                | 1130.868736               | 134.250496
16384      | 1130.868736                | 862.43328                 | 268.435456
32768      | 862.367744                 | 325.496832                | 536.870912[/codebox]
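Based on these measurements, a batch size could be derived from the free memory at run time. The sketch below is not taken from the cuFFT documentation. It assumes that one exec call needs about as much free memory as the rows it transforms, which is only what the table above suggests. `maxRowsPerExec`, `numChunks` and `reserveBytes` are hypothetical names; the free byte count could come from `cudaMemGetInfo`.

```cpp
#include <cassert>
#include <cstddef>

// Largest number of rows one exec call could transform, ASSUMING the call
// needs roughly as much scratch memory as the rows it processes
// (an observation from the measurements, not a documented guarantee).
// freeBytes:    free device memory after the input has been copied
// reserveBytes: hypothetical safety margin that should stay free
std::size_t maxRowsPerExec(std::size_t freeBytes, std::size_t width,
                           std::size_t reserveBytes) {
    assert(width > 0);
    const std::size_t bytesPerRow = width * 2 * sizeof(float); // complex float
    if (freeBytes <= reserveBytes) return 0;
    return (freeBytes - reserveBytes) / bytesPerRow;
}

// Number of exec calls needed to cover `height` rows (ceiling division).
std::size_t numChunks(std::size_t height, std::size_t rowsPerExec) {
    assert(rowsPerExec > 0);
    return (height + rowsPerExec - 1) / rowsPerExec;
}
```

With the 338.28 MB that were free after copying 65536 rows, this gives at most 20646 rows per call. That would explain why height/2 (32768 rows) failed while height/4 (16384 rows) worked. Each chunk would then need a plan created with the chunk's row count as its batch size, and the data pointer advanced by rows * width elements per call. A smaller last chunk would need its own plan.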
Hope this will help someone else.