Question about cuFFT library

jfdeg256 · August 13, 2021, 9:11am

Hi,

For years I’ve been using cuFFT to speed up my signal processing application, and since I always did multiple contiguous 1D FFTs, cufftPlan1d totally fulfilled my needs.

Now I need to do something a bit more tricky. My data are stored in a 3D matrix of size 512x512x16, and I need to perform:

512x16 contiguous FFTs of size 512 along the first dimension => I can use cufftPlan1d like I always did for that.
Tricky part: 512x16x32 non-contiguous (stride 32) FFTs of size 16 along the second dimension, but stored in a contiguous way (see below).

In the cuFFT documentation, there’s cufftPlanMany, some of whose input parameters are far from crystal clear to me.

So my questions are:

Can I perform these 2 operations using a single 2D FFT plan? (I doubt it.)

At least, can I use cufftPlan1d and then cufftPlanMany to do this? If yes, I suppose I won’t be able to batch all my FFTs across dimension 2, because the batch would be on dimension 1 and dimension 3. Should I batch on dimension 1 and use streams for the loop on dimension 3?

Thanks.

Jeff

mnicely · August 13, 2021, 5:36pm

In the cuFFT documentation, there’s cufftPlanMany, some of whose input parameters are far from crystal clear to me.

The inputs are consistent with FFTW’s advanced interface; see “Advanced Complex DFTs” in the FFTW 3.3.10 documentation.
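
One way to read the stride and distance parameters is as an addressing rule: for a 1D advanced layout, the cuFFT documentation gives the location of element x of transform b as input[b * idist + x * istride]. The CPU-only sketch below just evaluates that rule. The Layout1D struct and element_index function are hypothetical names made up for illustration; they are not part of cuFFT or FFTW.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not a cuFFT type): it mirrors the 1D
// advanced-layout values n, istride, idist and batch of cufftPlanMany.
struct Layout1D {
    int n;       // transform length
    int stride;  // step between consecutive elements of one transform
    int dist;    // step between the first elements of consecutive transforms
    int batch;   // number of transforms
};

// Linear index of element x of transform b, following the documented
// rule input[b * idist + x * istride] for 1D advanced layouts.
std::size_t element_index(const Layout1D& l, int b, int x) {
    assert(b >= 0 && b < l.batch);
    assert(x >= 0 && x < l.n);
    return static_cast<std::size_t>(b) * l.dist
         + static_cast<std::size_t>(x) * l.stride;
}
```

With stride 1 and dist equal to n this reduces to the usual contiguous batch (what cufftPlan1d covers); a stride greater than 1 with dist 1 describes interleaved transforms, which is probably the case the strided FFTs in the question fall under.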

Can I perform these 2 operations using a single 2D FFT plan? (I doubt it.)

No

IIUC, it looks like your sizes are consistent; therefore you should be able to create a single cufftPlanMany plan and launch chunks in multiple streams to allow as much overlap as possible and saturate the GPU.
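
As a rough illustration of the chunking idea, here is a CPU-only sketch of how transforms along dimension 2 could be split into one strided batch per index of dimension 3. It assumes an n1 x n2 x n3 array stored with the first index fastest, which the thread does not state, and the Chunk struct and both functions are hypothetical helpers, not cuFFT API. Each chunk maps onto the cufftPlanMany values n = n2, istride = n1, idist = 1, batch = n1, with the data pointer advanced by the chunk's base offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one chunk (not a cuFFT type): a base
// offset plus the 1D advanced-layout values n, stride, dist and batch.
struct Chunk {
    std::size_t base;  // offset of the chunk's first element
    int n;             // transform length
    int stride;        // step between elements of one transform
    int dist;          // step between the starts of consecutive transforms
    int batch;         // number of transforms in this chunk
};

// ASSUMPTION: element (i, j, k) of an n1 x n2 x n3 array is stored at
// i + n1 * (j + n2 * k), i.e. the first index varies fastest.
// Transforms along dimension 2 are split into n3 chunks, one per k;
// inside a chunk the batch runs over dimension 1.
std::vector<Chunk> chunks_along_dim2(int n1, int n2, int n3) {
    assert(n1 > 0 && n2 > 0 && n3 > 0);
    std::vector<Chunk> out;
    for (int k = 0; k < n3; ++k) {
        std::size_t base = static_cast<std::size_t>(k) * n1 * n2;
        out.push_back(Chunk{base, n2, n1, 1, n1});
    }
    return out;
}

// Returns true if the chunks together address every one of the
// `total` array elements exactly once (no gaps, no overlaps).
bool covers_exactly_once(const std::vector<Chunk>& chunks, std::size_t total) {
    std::vector<int> hits(total, 0);
    for (const Chunk& c : chunks)
        for (int b = 0; b < c.batch; ++b)
            for (int x = 0; x < c.n; ++x) {
                std::size_t idx = c.base
                    + static_cast<std::size_t>(b) * c.dist
                    + static_cast<std::size_t>(x) * c.stride;
                if (idx >= total) return false;
                ++hits[idx];
            }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

Because every chunk has the same n, stride, dist and batch and differs only in its base offset, the chunks are independent of each other, which is what makes it possible to issue them in separate streams (cufftSetStream associates a plan with a stream). Note also that, according to the cuFFT documentation, the stride and distance arguments are ignored when inembed is NULL, so a strided plan needs a non-NULL inembed.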

In the future, it’s much easier to help if you can provide a small reproducer.

jfdeg256 · August 16, 2021, 6:53am

Thanks, I’ll share my code in case others might be interested.
