The Apple OpenCL FFT code does not seem to compute an IFFT correctly. I've ported the Mac-specific calls in main.cpp, of course, and changed the #define for complexMul to an inline function (another known problem).

Performing a forward FFT works as it should; however, specifying an inverse FFT does not give the correct answer. In fact, the result appears to be a forward FFT with the real and imaginary components simply swapped.

Various FFT sizes were tested, including the size we actually need (256K) and a 16-point FFT, just to verify the problem.

Has anyone else seen this problem? Is this a known problem, and if so, where does the problem lie?

I am using the 3.0b SDK for my testing. (All drivers, the SDK, etc. were downloaded from the CUDA 3.0b post.)

The OS is Ubuntu 9.04 64-bit.

Thanks,

Dan

Hi dankarner,

Have you figured out this problem? I've run into exactly the same situation and would really appreciate it if you could enlighten me.

Thanks.

Best Regards

Actually, I did. It turns out it wasn't a problem at all. The data set I picked was effectively a ramp with similar real and imaginary components, which made the FFT of the signal look nearly identical to the IFFT, except with the real and imaginary parts switched. (For any input, the unnormalized IFFT equals the forward FFT with the real and imaginary parts swapped on both the input and the output, so when the two components are equal the only visible difference is the swap.) The reason the data didn't look correct, or more importantly didn't match the Matlab output, was that Matlab's ifft scales the result by 1/N, while this code, like cufft, does not. Once I scaled by the IFFT size to account for that factor, the answer was correct.

Of course, the problem now is that this code is 2-3 times slower than cufft, so I'm stuck deciding between slow portable code and fast non-portable code. Until Fermi comes out (and/or a non-beta SDK 3.0), the project's monetary and thermal budget reluctantly points to non-portable code. We'll see whether Fermi or an updated SDK changes that.