I have a 3D data set of sinogram data (a series of medical image projections viewed along a different dimension). Say my projection data is [x_dim y_dim z_dim] with the sinogram data along the x_dim dimension; I then need to perform y_dim * z_dim 1D FFTs with N = x_dim on the 3D data set. The CUFFT library does not appear to support this type of functionality, other than by streaming and using cufftPlan1d().
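To make the layout concrete, here is a minimal CPU-side sketch of the indexing I have in mind. It assumes x is the fastest-varying dimension in a dense (non-pitched) buffer. The `BatchLayout` struct and both helper functions are hypothetical names for illustration only; they are not part of CUFFT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a batch of 1D transforms over a dense volume.
struct BatchLayout {
    std::size_t n;       // length of each 1D transform (x_dim)
    std::size_t batch;   // number of transforms (y_dim * z_dim)
    std::size_t stride;  // elements between consecutive samples of one row
    std::size_t dist;    // elements between the first samples of adjacent rows
};

// Assumption: x varies fastest, then y, then z, with no padding between rows.
inline BatchLayout sinogram_batch(std::size_t x_dim, std::size_t y_dim,
                                  std::size_t z_dim) {
    assert(x_dim > 0 && y_dim > 0 && z_dim > 0);
    return BatchLayout{x_dim, y_dim * z_dim, 1, x_dim};
}

// Linear element offset of sample x in the sinogram row at (y, z).
inline std::size_t sample_offset(const BatchLayout& layout, std::size_t y_dim,
                                 std::size_t x, std::size_t y, std::size_t z) {
    const std::size_t row = z * y_dim + y;  // index of the 1D transform
    return row * layout.dist + x * layout.stride;
}
```

For an 8 x 4 x 3 volume, `sinogram_batch(8, 4, 3)` describes 12 transforms of length 8, and the row at (y = 2, z = 1) starts at element 48. What I am looking for is a way to hand a description like this to the library in a single call.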

After reading the CUDA FFT documentation and reviewing FFTW's documentation, it appears that for a 3D volume cufftPlan3d() and cufftPlanMany() perform a separable transform along each and every one of the dimensions. So for my purposes these cannot be used.

It also seems that none of these functions use cudaMemcpy[N]D (where N is 2 or 3), i.e. pitched 2D or 3D memory. cufftPlanMany() and cufftPlan3d() perform separable transforms along every one of the N dimensions. This appears to necessitate performing 1D transforms over the 2D or 3D data set and then rearranging the data into 2D or 3D pitched memory if further processing is needed where pitched (or strided) access would be desirable.

The requirement for padding, unless it is disabled using cufftSetCompatibilityMode(), necessitates allocating memory and then either copying the data onto padded boundaries or rearranging it in GPU memory. Either approach seems inefficient for large data sets. What is the best way to handle this: (1) pad on copy, or (2) rearrange/realign in GPU memory?

Is my assessment of the CUDA FFT library correct?

If so:

Could future revisions of the FFT library provide 2D, 3D, and N-D transforms on 2D and 3D data sets that use pitched memory, with a way to specify the dimensions along which the transforms are performed?

It appears I need to do y * z linear (non-pitched, non-strided) copies, create y * z streams, perform the FFT, filter, and inverse FFT on each, and then put the results into pitched memory and continue on.
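The "put into pitched memory" step amounts to the following repacking, shown here as a CPU-only sketch rather than actual device code. `Complex32`, `padded_row_elems`, and `repack_rows` are hypothetical names, and the alignment value is an assumption; on the GPU the real pitch would be whatever cudaMallocPitch() returns (in bytes, so it would need dividing by the element size).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an 8-byte single-precision complex element.
struct Complex32 { float re, im; };

// Round a row length up to a multiple of align_elems elements.
// Assumption: the alignment is expressed in elements, not bytes.
inline std::size_t padded_row_elems(std::size_t x_dim, std::size_t align_elems) {
    assert(x_dim > 0 && align_elems > 0);
    return (x_dim + align_elems - 1) / align_elems * align_elems;
}

// Copy `rows` dense rows of length x_dim into a buffer whose rows start
// row_elems apart; the padding at the end of each row is zero-filled.
inline std::vector<Complex32> repack_rows(const std::vector<Complex32>& dense,
                                          std::size_t x_dim, std::size_t rows,
                                          std::size_t row_elems) {
    assert(row_elems >= x_dim);
    assert(dense.size() == x_dim * rows);
    std::vector<Complex32> padded(row_elems * rows, Complex32{0.0f, 0.0f});
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t x = 0; x < x_dim; ++x) {
            padded[r * row_elems + x] = dense[r * x_dim + x];
        }
    }
    return padded;
}
```

Here `rows` would be y_dim * z_dim. Doing this as a separate pass over the whole volume after the transforms is the extra data movement I would like to avoid.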

This library seems like a square peg in a round hole where pitched memory is concerned. Or can I use pitched memory with it?