CUFFT and memory coalescing

Hello everyone,

At the moment I’m writing a CUDA implementation that requires an FFT (using the CUFFT library), and I’m trying to coalesce memory accesses on my Fermi GPU.

To perform a CUFFT transform, one has to store the data in a variable of type “cufftDoubleComplex”. But that means the data is stored as an “array of structs (AoS)” [for example: myVariable[N].x].

According to the webinar “Global Memory Usage and Strategy” by Justin Luitjens (07/12/2011), one should avoid AoS and prefer a “structure of arrays (SoA)” [for example: myVariable.x[N]].
The CUFFT functions can presumably deal with the AoS layout (?), but I’d also like to use the cufftDoubleComplex variable for other operations.

The question is: when I use a cufftDoubleComplex variable and perform ordinary operations on it (like parallelized multiplications), is the memory access coalesced? In which cases is it or isn’t it, and why?

Many thanks for any answers!

Greetings from Germany!



I am not sure I understand your point. The CUFFT library uses complex numbers in which the real and imaginary parts sit next to each other in memory, so when you access the real part of a complex number, the imaginary part can be loaded very fast as well. You should therefore always operate on both parts at the same time. An array of complex numbers is equivalent in memory to an array of real numbers of twice the length, in which the elements at even indices (0, 2, 4, ...) are the real parts and the elements at odd indices (1, 3, 5, ...) are the imaginary parts. If the array on which the FFT is done is large, the layout should not have much impact on overall performance, since the FFTs themselves are quite heavy calculations. Maybe using texture memory can make some difference.
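To make the layout argument a bit more concrete, here is a small host-side C++ sketch (no GPU needed) that counts how many 128-byte memory segments one warp would touch for different access patterns. It is only a simplified model under a few assumptions: Complex16 is a hypothetical stand-in with the same interleaved layout as cufftDoubleComplex (two adjacent doubles, x = real, y = imaginary), the arrays start on a 128-byte boundary, thread t of the warp accesses element t, and memory is fetched in 128-byte segments. The function names are made up for illustration and are not part of CUDA or CUFFT.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for cufftDoubleComplex: two adjacent doubles,
// x = real part, y = imaginary part (interleaved "AoS" layout).
struct Complex16 { double x, y; };
static_assert(sizeof(Complex16) == 2 * sizeof(double), "no padding expected");
static_assert(offsetof(Complex16, y) == sizeof(double), "y follows x directly");

constexpr int kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;  // assumed size of one memory segment

struct WarpAccess {
    std::size_t segments;    // distinct 128-byte segments touched by the warp
    std::size_t bytes_used;  // bytes the warp actually asked for
};

// Model: thread t reads bytes_per_thread bytes starting at byte t * stride_bytes.
WarpAccess model_warp_access(std::size_t stride_bytes, std::size_t bytes_per_thread) {
    std::set<std::size_t> touched;
    for (int t = 0; t < kWarpSize; ++t) {
        std::size_t first = static_cast<std::size_t>(t) * stride_bytes;
        std::size_t last = first + bytes_per_thread - 1;
        for (std::size_t s = first / kSegmentBytes; s <= last / kSegmentBytes; ++s) {
            touched.insert(s);
        }
    }
    return {touched.size(), kWarpSize * bytes_per_thread};
}

// Fraction of the fetched bytes that the warp really needed (1.0 = nothing wasted).
double efficiency(const WarpAccess& a) {
    return static_cast<double>(a.bytes_used) /
           static_cast<double>(a.segments * kSegmentBytes);
}
```

In this model, model_warp_access(sizeof(Complex16), sizeof(Complex16)) (each thread reads its whole complex element) touches 4 segments with efficiency 1.0, so nothing is wasted. model_warp_access(sizeof(Complex16), sizeof(double)) (each thread reads only .x) still touches 4 segments, but the efficiency drops to 0.5, because the unused imaginary parts are fetched as well. model_warp_access(sizeof(double), sizeof(double)) (a separate SoA array of doubles) touches 2 segments with efficiency 1.0. So, as far as this model goes, the interleaved type only costs bandwidth when a kernel uses just one of the two parts.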

I am running a code that solves a time-dependent partial differential equation in k-space. The program runs many iterations, each consisting of some FFTs and some simple multiplications. If you find something that speeds up your code, please post it here; I am interested as well.