At the moment I'm writing a CUDA implementation which requires an FFT (using the CUFFT library), and I'm trying to coalesce the memory accesses on my Fermi GPU.
To perform a CUFFT transformation, one has to store the data in a variable of type "cufftDoubleComplex". But that means the data is saved as an "array of structs" (AoS) [for example: myVariable[N].x].
According to the webinar "Global Memory Usage and Strategy" by Justin Luitjens (07/12/2011), one should avoid AoS and instead use a "structure of arrays" (SoA) [for example: myVariable.x[N]].
Presumably the CUFFT functions can deal with this (?), but I'd also like to use the cufftDoubleComplex variable to perform other operations.
The question is: when I use a cufftDoubleComplex variable and perform normal operations with it (like parallelized multiplications), are the memory accesses coalesced, and when (or when not), and why?
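To make it concrete, this is the kind of operation I mean (a sketch, the names d_a, d_b, d_out and the element-wise multiplication are just my example, not CUFFT API):

```cuda
#include <cufft.h>

// Sketch of an element-wise complex multiplication on AoS data:
// out[i] = a[i] * b[i], with the data stored as cufftDoubleComplex.
__global__ void pointwiseMul(const cufftDoubleComplex *a,
                             const cufftDoubleComplex *b,
                             cufftDoubleComplex *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftDoubleComplex va = a[i];   // loads .x and .y together
        cufftDoubleComplex vb = b[i];
        out[i].x = va.x * vb.x - va.y * vb.y;
        out[i].y = va.x * vb.y + va.y * vb.x;
    }
}
```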
Many thanks for any answers!
Greetings from Germany!
I am not sure I understand your point. The cufft library uses complex numbers in which the real and imaginary parts lie right next to each other in memory, so once you access the real part of a complex number, the imaginary part can be loaded very quickly; you should therefore always operate on both parts at the same time. An array of complex numbers is equivalent in memory to an array of twice as many real numbers in which real and imaginary parts alternate, starting with the real part. If the matrix the FFT is performed on is large, the layout should not have much impact on overall performance, since the FFTs themselves are quite heavy computations. Maybe using texture memory can make some difference.
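To illustrate what I mean: cufftDoubleComplex is a typedef of the built-in double2 vector type, which is 16-byte aligned, so loading the whole struct fetches both parts at once. A sketch (the kernel and its names are just an example of mine):

```cuda
#include <cufft.h>

// cufftDoubleComplex is double2: .x = real, .y = imaginary, 16-byte
// aligned. An array of N complex values is laid out in memory exactly
// like an array of 2*N doubles with real and imaginary parts interleaved.
__global__ void scale(cufftDoubleComplex *data, double s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Loading the struct fetches real and imaginary parts in a
        // single 16-byte access; consecutive threads touch consecutive
        // 16-byte slots, so the warp's global memory accesses coalesce.
        cufftDoubleComplex v = data[i];
        v.x *= s;
        v.y *= s;
        data[i] = v;
    }
}
```

So as long as each thread reads the whole element (instead of only the .x of every element in one pass and only the .y in another), the AoS layout is not a problem here.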
I am running a code which solves a time-dependent partial differential equation in k space. The program performs many iterations, each consisting of some FFTs and some simple multiplications. If you find something that speeds up your code, please post it here; I am interested as well.
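For reference, the skeleton of my iteration loop looks roughly like this (a simplified 1-D sketch; applyStep stands for whatever pointwise k-space operator the PDE needs and is hypothetical here):

```cuda
#include <cufft.h>

// Hypothetical, PDE-specific k-space multiply (not shown).
__global__ void applyStep(cufftDoubleComplex *f, int n);

// Time stepping: transform to k space, multiply, transform back.
// d_field is a device array of n cufftDoubleComplex values.
void evolve(cufftDoubleComplex *d_field, int n, int steps)
{
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_Z2Z, 1);   // double-precision C2C plan

    for (int t = 0; t < steps; ++t) {
        cufftExecZ2Z(plan, d_field, d_field, CUFFT_FORWARD);  // in place
        applyStep<<<(n + 255) / 256, 256>>>(d_field, n);      // k-space step
        cufftExecZ2Z(plan, d_field, d_field, CUFFT_INVERSE);
        // note: CUFFT_INVERSE is unnormalized, so a division by n
        // has to happen somewhere in the step as well
    }
    cufftDestroy(plan);
}
```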