Request for enhancement - cuFFT for more than 3 dimensions

I need to do 4-dimensional FFTs, and I am currently using the cuFFTW wrapper to do that, since FFTW supports arbitrary dimensions. However, according to the documentation, cuFFT only supports up to 3 dimensions. I would like to move to cuFFT for a possible performance gain (this is currently a bottleneck in my program), so I would like to request the missing functionality: arbitrary dimensions in cuFFT.

Alternatively, does anyone know if there is currently a way to “hack” together a 4D transform that would have good performance?

Requests for enhancement should be filed as bugs, using the information provided in the sticky post at the top of the CUDA programming forum.

If the 3 dimensions that you are currently exposing are large enough (i.e. the dimension sizes are big enough), it's unlikely that going to 4D (vs. issuing a number of 3D transforms) would yield any significant performance benefit.
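To illustrate why a set of 3D transforms can stand in for a 4D one: the multidimensional DFT is separable, so a 4D transform equals a 3D transform over three of the axes followed by a 1D transform along the remaining axis. The sketch below checks this numerically with NumPy rather than cuFFT, purely to show the decomposition; the function name fft4d_via_3d and the array sizes are made up for the example. With cuFFT, I would expect the two steps to map to a batched 3D plan and a strided, batched 1D plan (e.g. via cufftPlanMany), but that mapping is an assumption and not something tested here.

```python
import numpy as np

def fft4d_via_3d(x):
    """4D FFT of a complex array as batched 3D FFTs plus one 1D pass."""
    assert x.ndim == 4
    # Step 1: independent 3D FFTs over the last three axes, one per index
    # along axis 0 (the analogue of a batched 3D plan).
    y = np.fft.fftn(x, axes=(1, 2, 3))
    # Step 2: 1D FFTs along the remaining axis 0, one per (i1, i2, i3)
    # position (the analogue of a strided, batched 1D plan).
    return np.fft.fft(y, axis=0)

# Small arbitrary test volume; the sizes are illustrative only.
shape = (4, 6, 5, 8)
rng = np.random.default_rng(0)
x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

reference = np.fft.fftn(x)  # direct 4D transform over all axes
max_err = np.max(np.abs(fft4d_via_3d(x) - reference))
print("max abs difference:", max_err)
```

The difference should be at the level of floating-point rounding. Whether the two-pass version is faster or slower than a single 4D plan on the GPU is a separate question that only a benchmark can answer.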

OK thanks, I’ll do that.

I’m not issuing a number of 3D transforms, I’m using the 4D cuFFTW (wrapper to FFTW) to do it in one go, but I suspect that this is slower than cuFFT. I can try what you said instead.

The cufftw “wrapper” is not a wrapper for FFTW (i.e. it does not wrap the FFTW libraries); it wraps CUFFT with an FFTW-style interface.

cufftw does not support the 4D transforms possible with fftw:

https://docs.nvidia.com/cuda/cufft/index.html#fftw-supported-interface

You can do multiple 3D transforms with that interface/wrapper.

fftw_plan_dft() with 4D does work in cuFFTW, I’m pretty sure. However, I’m concerned about possible performance loss from using cuFFTW rather than cuFFT. Do you have any idea if there is any significant performance hit from using cuFFTW? If not, I’m probably happy to keep using fftw_plan_dft(). Otherwise, I could try many 3D transforms with cuFFT, without using the interface.
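For the "many 3D transforms with cuFFT" route, the main thing to get right would be the memory layout of the two passes. Below is a rough sketch, in NumPy rather than cuFFT, of the stride/dist/batch values I would expect for a row-major n0 x n1 x n2 x n3 array, assuming the usual advanced-layout addressing where element k of batch b sits at b * dist + k * stride. The names plan_layouts and run_passes are hypothetical helpers for this example; the code only emulates the addressing on a flat buffer and compares the result against a direct 4D FFT.

```python
import numpy as np

def plan_layouts(n0, n1, n2, n3):
    """Stride/dist/batch values for the two passes over a row-major 4D array."""
    inner = n1 * n2 * n3  # elements in one contiguous 3D sub-volume
    pass_3d = dict(n=(n1, n2, n3), stride=1, dist=inner, batch=n0)
    pass_1d = dict(n=(n0,), stride=inner, dist=1, batch=inner)
    return pass_3d, pass_1d

def run_passes(flat, n0, n1, n2, n3):
    """Apply both passes to a flat buffer, using only the layout values."""
    p3, p1 = plan_layouts(n0, n1, n2, n3)
    out = flat.copy()
    for b in range(p3["batch"]):  # one contiguous 3D block per batch entry
        start = b * p3["dist"]
        block = out[start:start + p3["dist"]].reshape(p3["n"])
        out[start:start + p3["dist"]] = np.fft.fftn(block).ravel()
    for b in range(p1["batch"]):  # one strided 1D signal per batch entry
        idx = b * p1["dist"] + np.arange(n0) * p1["stride"]
        out[idx] = np.fft.fft(out[idx])
    return out

# Small arbitrary sizes, chosen only to keep the check quick.
shape = (3, 4, 2, 5)
rng = np.random.default_rng(1)
x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

result = run_passes(x.ravel(), *shape).reshape(shape)
layouts_ok = np.allclose(result, np.fft.fftn(x))
print("layout values reproduce the 4D FFT:", layouts_ok)
```

The 1D pass is the strided one (stride n1*n2*n3, dist 1), so it is the pass most likely to suffer from non-contiguous memory access on a GPU; that would be the first thing to look at when profiling.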

Yes, I believe I was reading that compatibility table incorrectly. Sorry for the confusion.

I’m not aware of any significant performance issues with cuFFTW vs. CUFFT for 1D, 2D, or 3D transforms. I’m not sure how the n-dimensional transforms are implemented under the hood. Profiling the code might be instructive, and/or developing comparative benchmarks. I’m not aware of any published benchmarks for 4D transforms.