When using cufft with callback with store operation result buffer size must be equal to FFT plan size

yehonatans68sw4 · June 9, 2024, 7:12pm

Hi,

I am using FFT with a store callback operation in batch mode. My requirement is for the store callback to crop the result and save only the cropped data, making the output buffer 1/8th the size of the FFT plan.

However, when I run the FFT with my callback, I encounter error number 6, despite not accessing the output buffer. I suspect the output buffer might be used as a work buffer and not only as an output buffer, requiring it to be the size of the FFT plan multiplied by the batch size.

Could you please confirm if this is correct and advise on how I can resolve this issue?

Please review the attached toy example of my use case:
brokenCallbackCuFFT.txt (11.0 KB)

yehonatans68sw4 · June 10, 2024, 5:08am

Another attempt was to use cufftXtMakePlanMany (cufftHandle plan, int rank, long long int *n, long long int *inembed, long long int istride, long long int idist, cudaDataType inputtype, long long int *onembed, long long int ostride, long long int odist, cudaDataType outputtype, long long int batch, size_t *workSize, cudaDataType executiontype);
with the onembed[0:2] dimensions smaller than the n dimensions, but it still did not work.

In addition, I tried to configure another plan with the inembed[0:2] dimensions smaller than the n dimensions. The full FFT plan was still executed, not only the sub-plan…

I should add that there was no CUDA error when I used cufftXtMakePlanMany with those attributes.

yehonatans68sw4 · June 12, 2024, 1:06pm

Hi, are there any updates?

dejvbayer · June 13, 2024, 7:30pm

Hi, I think you are right that the output array is used for temporary results. You cannot make it smaller than n[0] * n[1] * n[2] * batch elements, even the documentation says (link: cuFFT):

Note that the size of each dimension of the transform should be less than or equal to the inembed and onembed values for the corresponding dimension, that is n[i] ≤ inembed[i], n[i] ≤ onembed[i], where i ∈ {0, …, rank − 1}.

I tried to do something similar myself without success.

David
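
To make the sizes concrete, here is a small host-side sketch of the bookkeeping a cropping store callback would need. It is only a model of the index arithmetic, not cuFFT code: `CropShape`, `fullElements`, `croppedElements` and `croppedOffset` are hypothetical names, and it assumes a contiguous batched 3D layout where the crop keeps the low-index corner of each dimension (the thread does not say which region is kept). The buffer passed to cuFFT as output would still need `fullElements` entries, as described above. Whether the cropped values can safely be written to a second, smaller buffer from inside the callback is something I have not verified.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape descriptor: full FFT extents and kept (cropped) extents.
struct CropShape {
    std::size_t n[3];     // full FFT size per dimension, slowest-varying first
    std::size_t keep[3];  // cropped size per dimension, keep[i] <= n[i]
};

// Number of elements in the full batched output: n[0] * n[1] * n[2] * batch.
inline std::size_t fullElements(const CropShape& s, std::size_t batch) {
    return s.n[0] * s.n[1] * s.n[2] * batch;
}

// Number of elements that survive the crop across the whole batch.
inline std::size_t croppedElements(const CropShape& s, std::size_t batch) {
    return s.keep[0] * s.keep[1] * s.keep[2] * batch;
}

// Maps a linear element offset in the full batched output to the matching
// offset in a densely packed cropped buffer. Returns false if the element
// lies outside the kept region (assumed here to be the low-index corner).
inline bool croppedOffset(const CropShape& s, std::size_t offset,
                          std::size_t& out) {
    for (int d = 0; d < 3; ++d) assert(s.keep[d] <= s.n[d] && s.n[d] > 0);
    const std::size_t plane = s.n[1] * s.n[2];
    const std::size_t volume = s.n[0] * plane;
    const std::size_t b = offset / volume;         // batch index
    const std::size_t r = offset % volume;         // offset inside one volume
    const std::size_t z = r / plane;
    const std::size_t y = (r % plane) / s.n[2];
    const std::size_t x = r % s.n[2];
    if (z >= s.keep[0] || y >= s.keep[1] || x >= s.keep[2]) return false;
    out = ((b * s.keep[0] + z) * s.keep[1] + y) * s.keep[2] + x;
    return true;
}
```

For example, with an 8×8×8 transform cropped to 4×4×4, the cropped buffer is 1/8th of the full one, which matches the ratio in the original question.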

Curefab · June 13, 2024, 7:43pm

As a general hint: if you want to crop the FFT output by a lot, also consider a cropped DFT (matrix multiplication) as an alternative. It lends itself better to cropping, but has higher computational complexity (with simpler math).
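
As a rough illustration of the cropped DFT idea, here is a 1D host-side sketch that evaluates only the first `keep` output bins directly. This is the same as multiplying the input by a `keep` × N slice of the DFT matrix. The function name `croppedDft` is hypothetical, and the sign convention is assumed to be the usual unnormalized forward transform. The cost is O(N · keep) rather than O(N log N), so it only pays off when `keep` is small.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Directly evaluates the first `keep` bins of an N-point forward DFT:
//   X[k] = sum_j x[j] * exp(-2*pi*i*j*k / N),  k = 0 .. keep-1
// No normalization is applied.
inline std::vector<std::complex<double>> croppedDft(
        const std::vector<std::complex<double>>& x, std::size_t keep) {
    const std::size_t n = x.size();
    assert(keep <= n);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> out(keep);
    for (std::size_t k = 0; k < keep; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n so the angle stays small and accurate.
            const double angle = -2.0 * pi *
                static_cast<double>((j * k) % n) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}
```

Calling `croppedDft(x, 2)` on an 8-sample input returns only bins 0 and 1. A multi-dimensional version would apply the same slice along each axis.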

yehonatans68sw4 · June 13, 2024, 7:57pm

Hi,

Thanks for your response about the work buffer. Unfortunately, matrix multiplication is too costly a solution.

Are there any restrictions on using my own callback to read input data and perform zero padding before the FFT transformation?
This would eliminate the need for zero padding in the global memory.
Additionally, I would like to confirm that the input data for the FFT transformation is read-only (when FFT is not in-place) and will not be affected by the FFT process.

dejvbayer · June 13, 2024, 8:11pm

Hi, the documentation says that all out-of-place transformations except for C2R preserve the input data. So, I think there should be no restrictions on the load callbacks. However, I am pretty sure that the output buffer must be at least the size of the full FFT.

David
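
For the zero-padding idea, the index logic of such a load callback could look roughly like the host-side model below. It is plain C++ rather than device code, `PadInfo` and `loadZeroPadded` are hypothetical names, and it assumes a 1D batched layout where only the valid samples are stored back to back. A cuFFT load callback is handed an element offset into the input, and this sketch mimics that. I have not checked whether cuFFT still expects the input pointer itself to refer to a full-size allocation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical 1D layout: each batch entry stores `valid` samples that are
// logically padded with zeros up to the FFT length `n`.
struct PadInfo {
    std::size_t valid;  // samples actually stored per batch entry
    std::size_t n;      // FFT length per batch entry, n >= valid
};

// Host model of a zero-padding load: `offset` indexes the padded, batched
// input that the FFT would see; `compact` holds only the valid samples.
inline std::complex<float> loadZeroPadded(
        const std::vector<std::complex<float>>& compact,
        std::size_t offset, const PadInfo& info) {
    assert(info.n > 0 && info.valid <= info.n);
    const std::size_t b = offset / info.n;  // batch index
    const std::size_t i = offset % info.n;  // position inside the padded signal
    if (i >= info.valid) {
        return std::complex<float>(0.0f, 0.0f);  // padded region reads as zero
    }
    return compact[b * info.valid + i];
}
```

With `valid = 2` and `n = 4`, offsets 0 and 1 read the first two stored samples, offsets 2 and 3 read zeros, and offset 4 starts the second batch entry, so the padding never has to exist in global memory.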

