CUDA FFT example

I need information about the FFT algorithm implemented in the CUDA SDK sample (FFT2D). I know the theory behind Fourier transforms and the DFT, but I can't figure out what the code is doing (I don't need to modify it, I just need to understand it). It seems the data is padded to a multiple of 512 (Cooley-Tukey should be faster that way), but all the SpPreprocess and Modulate/Normalize steps just confuse me. No paper is attached to the SDK example, so can you please point me to something that explains the algorithm thoroughly?
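The padding and the Modulate/Normalize steps are easier to follow with FFT convolution stripped down to its core. Below is a minimal 1D sketch in plain Python (not the SDK code, and all names are hypothetical). Two points carry over to the sample: linear convolution through a DFT needs a transform length of at least `len(data) + len(kernel) - 1`, otherwise the circular convolution wraps around; and cuFFT's inverse transform is unnormalized, so someone has to divide by the transform size. That division is presumably what "Normalize" does, while "Modulate" is the pointwise product of the two spectra. `next_pow2` is an assumed size rule; the sample uses its own size-snapping rule (the 512 multiples you noticed).

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft_unnormalized(a):
    """Inverse FFT WITHOUT the 1/n factor (cuFFT's inverse behaves the same way)."""
    return [v.conjugate() for v in fft([v.conjugate() for v in a])]

def next_pow2(n):
    """Hypothetical size rule: smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def fft_convolve(data, kernel):
    # Pad to at least len(data) + len(kernel) - 1 so the circular
    # convolution computed via the DFT does not wrap around.
    out_len = len(data) + len(kernel) - 1
    n = next_pow2(out_len)
    d = fft([complex(v) for v in data] + [0j] * (n - len(data)))
    k = fft([complex(v) for v in kernel] + [0j] * (n - len(kernel)))
    # "Modulate": pointwise product of spectra; "normalize": scale by 1/n.
    spectrum = [a * b / n for a, b in zip(d, k)]
    result = ifft_unnormalized(spectrum)
    return [v.real for v in result[:out_len]]

def direct_convolve(data, kernel):
    """Reference O(N*K) linear convolution."""
    out = [0.0] * (len(data) + len(kernel) - 1)
    for i, x in enumerate(data):
        for j, h in enumerate(kernel):
            out[i + j] += x * h
    return out
```

For example, `fft_convolve([1, 2, 3], [1, 1])` should agree with `direct_convolve([1, 2, 3], [1, 1])`, which is `[1, 3, 5, 3]`, up to rounding error.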

There is a paper:

It may help, though it's still kind of confusing.

I already saw that document, but it doesn't explain all of the code. The problem is that after the FFT is performed, some strange pre- and post-processing of the signal is done together with the pointwise multiplication. Why is that necessary? It also uses odd twiddle factors (they seem to be spread across the x values).
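If I read the sample correctly, the pre/post-processing is the standard trick for getting the spectrum of a real signal of length N out of a complex FFT of length N/2: pack the even samples into the real part and the odd samples into the imaginary part, run the half-size complex FFT, then untangle the two spectra and recombine them with the twiddle factor exp(-2πik/N). That factor changes with the frequency index k, which would explain twiddles "spread across the x values" (in 2D this would happen along the packed x dimension). Going back before the inverse FFT needs the mirror-image step. Fusing both with the multiplication is presumably just to save passes over memory. This is a 1D sketch with a naive DFT and hypothetical function names, not the sample's kernels.

```python
import cmath

def dft(a, sign=-1):
    """Naive O(n^2) DFT; sign=+1 gives the unnormalized inverse."""
    n = len(a)
    return [sum(a[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_fft_via_half(x):
    """Half spectrum X[0..N/2] of a real signal x (len N, N even)
    computed with a single N/2-point complex DFT ("post-processing")."""
    n = len(x)
    m = n // 2
    # Even samples -> real part, odd samples -> imaginary part.
    z = [complex(x[2 * t], x[2 * t + 1]) for t in range(m)]
    zf = dft(z)
    out = []
    for k in range(m + 1):
        zk = zf[k % m]
        zc = zf[(m - k) % m].conjugate()
        e = (zk + zc) / 2          # DFT of the even samples
        o = (zk - zc) / 2j         # DFT of the odd samples
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle depends on k
        out.append(e + w * o)
    return out

def real_ifft_via_half(xh, n):
    """Inverse of the above ("pre-processing" + N/2-point inverse DFT):
    rebuild the real signal of length n from X[0..n/2]."""
    m = n // 2
    zf = []
    for k in range(m):
        xk = xh[k]
        xc = xh[m - k].conjugate()
        e = (xk + xc) / 2
        o = (xk - xc) * cmath.exp(2j * cmath.pi * k / n) / 2
        zf.append(e + 1j * o)      # re-pack even/odd spectra
    z = [v / m for v in dft(zf, sign=+1)]
    x = []
    for v in z:
        x.extend([v.real, v.imag])  # unpack even/odd samples
    return x
```

The half spectrum is all you need for a real signal because the other half is its complex conjugate mirror, which is why only N/2 + 1 bins come out.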

I simply cannot understand the code, and I suppose a LOT of people can't either (I asked on many forums, but nobody could answer).

Well, I'm interested too. I don't really understand point 2) on the 5th slide of the PDF. Do we fold the border pixels over onto the opposite border?
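I can't tell exactly what that slide says, but in FFT convolution the "folding" usually refers to the kernel: its center tap is moved to index 0 and the taps that would sit at negative offsets wrap around to the end of the buffer. The image borders are a separate issue; as far as I can tell the sample pads them by clamping (repeating the edge pixels), so the wrap-around of the circular convolution reads edge values instead of pixels from the opposite side. Below is a 1D sketch of both ideas under those assumptions, with hypothetical names; it is not the sample's padding code. `size` must be at least `len(data) + len(kernel) - 1`.

```python
def pad_kernel(kernel, center, size):
    """Put kernel tap `center` at index 0; taps left of the center wrap
    around to the end of the buffer (the 'fold')."""
    out = [0.0] * size
    for t, h in enumerate(kernel):
        out[(t - center) % size] = h
    return out

def pad_data_clamp(data, kernel_len, center, size):
    """Extend data to `size` samples by repeating edge values. Indices just
    past the end repeat the last sample; the last (kernel_len - 1 - center)
    indices stand for positions before the start (reached by wrap-around)
    and repeat the first sample."""
    n = len(data)
    before = kernel_len - 1 - center
    out = []
    for i in range(size):
        if i < n:
            out.append(data[i])
        elif i >= size - before:
            out.append(data[0])
        else:
            out.append(data[-1])
    return out

def circular_convolve(a, b):
    """What a DFT-based convolution computes: out[i] = sum_j a[j] * b[(i - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def clamped_convolve(data, kernel, center):
    """Reference: out[i] = sum_t kernel[t] * data[clamp(i - t + center)]."""
    n = len(data)
    out = []
    for i in range(n):
        acc = 0.0
        for t, h in enumerate(kernel):
            j = min(max(i - t + center, 0), n - 1)
            acc += h * data[j]
        out.append(acc)
    return out
```

The first `len(data)` outputs of `circular_convolve(pad_kernel(...), pad_data_clamp(...))` should match `clamped_convolve(...)`; any extra samples beyond that are padding and get discarded.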

To use it for images, do we have to linearize them? The random data they use in the sample is 1D.
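I believe the sample's random input is already an image: a flat array of height × width floats in row-major order, with pixel (x, y) at index `y * width + x` (worth confirming in the source). cuFFT, like C arrays, assumes row-major layout for multidimensional transforms, so an image stored that way needs no further linearization. The sketch below (plain Python, hypothetical names) shows that a 2D DFT on such a flat array is just 1D DFTs over every row followed by 1D DFTs over every column, and checks it against the direct 2D formula.

```python
import cmath

def dft_1d(a):
    """Naive 1D DFT."""
    n = len(a)
    return [sum(a[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dft_2d_flat(img, h, w):
    """2D DFT of an h x w image stored row-major in a flat list (img[y*w + x])."""
    # Pass 1: DFT of every row (each row is a contiguous slice).
    rows = []
    for y in range(h):
        rows.extend(dft_1d(img[y * w:(y + 1) * w]))
    # Pass 2: DFT of every column (elements w apart, i.e. stride w).
    out = [0j] * (h * w)
    for x in range(w):
        col = dft_1d(rows[x::w])
        for y in range(h):
            out[y * w + x] = col[y]
    return out

def dft_2d_direct(img, h, w):
    """Reference: direct evaluation of the 2D DFT sum."""
    out = []
    for v in range(h):
        for u in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += img[y * w + x] * cmath.exp(-2j * cmath.pi * (u * x / w + v * y / h))
            out.append(acc)
    return out
```

Because the 2D transform separates into row and column passes, the tricks from 1D (padding, packing a real signal into half-size complex data) can be applied along one dimension while the other dimension is handled by ordinary FFTs.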