3D FFT problem

Hello,

I’m using CUDA to compute 3D FFTs for use in Python. I think I am getting an actual result, but it seems to be wrong.

When I run the FFT of this float32 matrix through NumPy and SciPy:

[[[ 2.  2.]
  [ 2.  2.]]

 [[ 2.  2.]
  [ 2.  2.]]]

I get this complex128 result:

[[[ 16.+0.j   0.+0.j]
  [  0.+0.j   0.+0.j]]

 [[  0.+0.j   0.+0.j]
  [  0.+0.j   0.+0.j]]]
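For reference, the NumPy result can be reproduced with a few lines. This is a minimal sketch; it only restates the computation described above. Because every entry is 2, the only non-zero bin is the DC term at index [0, 0, 0], which equals the sum of all 8 entries.

```python
import numpy as np

# The 2x2x2 test matrix of twos from the post.
a = np.full((2, 2, 2), 2.0, dtype=np.float32)

# numpy.fft.fftn transforms over all three axes by default.
F = np.fft.fftn(a)

# Only the DC bin should be non-zero, and it is the sum of all entries (8 * 2).
print(F[0, 0, 0].real)        # 16.0
print(np.count_nonzero(F))    # 1
```

Note that the output dtype depends on the NumPy version (older releases always return complex128; NumPy 2.x keeps float32 input in single precision), so compare values rather than dtypes.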

However, when I run it with CUDA, I get this complex64 result:

[[[ 12.+0.j   0.+0.j]
  [  4.+0.j   0.+0.j]]

 [[  4.+0.j   0.+0.j]
  [ -4.+0.j   0.+0.j]]]

On the CUDA side, the input is converted to complex and I do a C2C transform.
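Since the transform is C2C, the host buffer has to match cuFFT's layout before it is copied. `cufftComplex` stores each element as two 32-bit floats (real, imaginary), which matches NumPy's `complex64`, and a plan from `cufftPlan3d(plan, nx, ny, nz, CUFFT_C2C)` expects row-major data with `nz` varying fastest, which matches a C-contiguous NumPy array of shape `(nx, ny, nz)`. The sketch below assumes the conversion happens on the Python side; `to_cufft_complex` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def to_cufft_complex(a):
    """Hypothetical helper: return a C-contiguous complex64 copy of `a`.

    complex64 is two float32 values per element (real, imag), the same
    layout as cufftComplex; C order puts the last axis (nz) fastest.
    """
    return np.ascontiguousarray(a, dtype=np.complex64)

host = to_cufft_complex(np.full((2, 2, 2), 2.0, dtype=np.float32))
print(host.dtype, host.flags['C_CONTIGUOUS'], host.nbytes)  # complex64 True 64

# A Fortran-ordered input is copied into C order, so the raw bytes follow cuFFT's layout.
f = np.asfortranarray(np.arange(8, dtype=np.float32).reshape(2, 2, 2))
c = to_cufft_complex(f)
assert c.flags['C_CONTIGUOUS'] and np.array_equal(c.real, f)
```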

I’m not really sure why the results differ. The code runs without errors, and I’m fairly sure the right input gets onto the GPU; I just don’t know why the output is different.

When I run it with a similar matrix of all ones instead of twos, the resulting matrices are similar, with every element halved.
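The halving by itself is expected: the DFT is linear, so scaling every input element by 2 scales every output bin by 2, whether or not the input that reached the transform was the intended one. A quick NumPy check of that property, as a sketch:

```python
import numpy as np

ones = np.ones((2, 2, 2), dtype=np.float32)
twos = 2 * ones

# Linearity: FFT(2 * x) == 2 * FFT(x), so halving the input halves every output bin.
assert np.allclose(np.fft.fftn(twos), 2 * np.fft.fftn(ones))
```

So the ones-versus-twos comparison confirms the transform is linear but does not by itself locate the error.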

I have attached my test code in case it helps.
I’m running SDK 1.1 with the newest drivers on Linux with an 8800 GTS 320MB.

Thanks
cudatest.zip (1.45 KB)

Alright, I figured out why it’s giving a different answer. Instead of all ones (or all twos), the device is being sent all ones (or twos) except for zeros in the last row.

Like this:
array([[[ 1.+0.j,  1.+0.j],
        [ 1.+0.j,  1.+0.j]],

       [[ 1.+0.j,  1.+0.j],
        [ 0.+0.j,  0.+0.j]]], dtype=complex64)
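This explanation can be checked numerically. A minimal sketch: take the twos version of the array above (last row zeroed) and compare its FFT with the CUDA output reported at the top of the thread. If they match, the transform itself was correct and only the input was wrong.

```python
import numpy as np

# Twos everywhere except the last row [1, 1, :], which arrived on the device as zeros.
x = np.full((2, 2, 2), 2.0, dtype=np.complex64)
x[1, 1, :] = 0

# The CUDA result from the first post.
cuda_out = np.array([[[12, 0], [4, 0]],
                     [[4, 0], [-4, 0]]], dtype=np.complex64)

# The FFT of the truncated input reproduces the CUDA output exactly.
assert np.allclose(np.fft.fftn(x), cuda_out)
```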

Now, the array has the right values before I send it to the device, but right after I send it to the device, it loses a few values. I’m not sure why.

Any insight into this? Is it a problem with my code?

Thanks.

OK, a little more insight into the problem. I really hope someone can help.

So it turns out that when I send my array to the device, it really does keep its values; it was only when I returned the data to NumPy that something wasn’t being interpreted correctly.
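The thread doesn’t say exactly what went wrong on the NumPy side, so the following is only an illustration of one common way a returned buffer gets misread: decoding raw bytes with a dtype that doesn’t match the device data. `dev_result` is a stand-in for whatever the attached code copies back from the GPU.

```python
import numpy as np

# Stand-in for 8 cufftComplex values copied back from the device.
dev_result = np.ones((2, 2, 2), dtype=np.complex64)
raw = dev_result.tobytes()  # 64 bytes: 8 elements x (2 x float32)

# Decoding with the matching dtype and shape recovers the data.
good = np.frombuffer(raw, dtype=np.complex64).reshape(2, 2, 2)
assert np.array_equal(good, dev_result)

# Decoding the same bytes as complex128 yields half as many elements with garbage values.
bad = np.frombuffer(raw, dtype=np.complex128)
print(bad.shape)  # (4,)
```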

I fixed that, but now cufftExecC2C seems to be broken again. The data appears to stay consistent in device memory (when I copy my array to the device and back to the host, it looks fine), yet I get results as if a different array were in there.

Input array:

array([[[ 1.+0.j,  1.+0.j],
        [ 1.+0.j,  1.+0.j]],

       [[ 1.+0.j,  1.+0.j],
        [ 1.+0.j,  1.+0.j]]], dtype=complex64)

Output array after running cufft_FORWARD on the input array:

array([[[ 6.+0.j,  0.+0.j],
        [ 2.+0.j,  0.+0.j]],

       [[ 2.+0.j,  0.+0.j],
        [ 2.+0.j,  0.+0.j]]], dtype=complex64)

Output array after running cufft_INVERSE on the input array:

array([[[ 6.+0.j,  0.+0.j],
        [ 2.+0.j,  0.+0.j]],

       [[ 2.+0.j,  0.+0.j],
        [ 2.+0.j,  0.+0.j]]], dtype=complex64)

What the output array should be for FORWARD on the input:

array([[[ 8.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]],

       [[ 0.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]]])

What the output array should be for INVERSE on the input:

array([[[ 1.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]],

       [[ 0.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]]])
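One caveat on the expected INVERSE result: it follows NumPy’s convention, where `numpy.fft.ifftn` divides by the number of elements N. cuFFT and FFTW both compute unnormalized transforms, so for an all-ones 2x2x2 input their inverse puts 8 (not 1) in the DC bin. The sketch below uses NumPy’s `norm="forward"` option, which moves the 1/N factor to the forward transform and leaves the inverse unscaled, to mimic that convention.

```python
import numpy as np

x = np.ones((2, 2, 2), dtype=np.complex64)

fwd = np.fft.fftn(x)                               # unscaled forward: DC = 8
inv_numpy = np.fft.ifftn(x)                        # NumPy default: scaled by 1/N, DC = 1
inv_unscaled = np.fft.ifftn(x, norm="forward")     # no 1/N on the inverse, like cuFFT/FFTW

assert np.isclose(fwd[0, 0, 0], 8)
assert np.isclose(inv_numpy[0, 0, 0], 1)
assert np.isclose(inv_unscaled[0, 0, 0], 8)
```

When comparing GPU output against NumPy, either divide the cuFFT inverse by N or use the unscaled NumPy reference.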

Further, when I run a known-good inverse FFT on the result I get from my CUDA FFT, I get:

array([[[ 1.5+0.j,  1.5+0.j],
        [ 0.5+0.j,  0.5+0.j]],

       [[ 0.5+0.j,  0.5+0.j],
        [ 0.5+0.j,  0.5+0.j]]])
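Running the check in the other direction supports the idea that the transform is behaving correctly for the data it actually sees. A sketch: the forward FFT of the 1.5/0.5 array above reproduces the reported cufft_FORWARD output exactly, which points at the input (or how it is laid out or passed) rather than at the FFT arithmetic.

```python
import numpy as np

# Array recovered by inverting the CUDA output.
received = np.array([[[1.5, 1.5], [0.5, 0.5]],
                     [[0.5, 0.5], [0.5, 0.5]]], dtype=np.complex64)

# Reported cufft_FORWARD output for the all-ones input.
cuda_forward = np.array([[[6, 0], [2, 0]],
                         [[2, 0], [2, 0]]], dtype=np.complex64)

# A correct 3D FFT of `received` gives exactly the CUDA result.
assert np.allclose(np.fft.fftn(received), cuda_forward)
```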

Now, besides giving me the wrong result for both, cufft_FORWARD and cufft_INVERSE give me the same result.
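The fact that FORWARD and INVERSE agree is expected for this particular size and is not a symptom on its own. Along an axis of length 2 the twiddle factors are exp(±iπkn) = (-1)^(kn), which are identical for both signs, so an unnormalized forward and inverse 2x2x2 transform are the same for any input. A sketch with random complex data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 2, 2)) + 1j * rng.standard_normal((2, 2, 2))

fwd = np.fft.fftn(x)
inv_unscaled = np.fft.ifftn(x, norm="forward")  # inverse without NumPy's 1/N factor

# For length-2 axes the forward and unnormalized inverse DFTs coincide.
assert np.allclose(fwd, inv_unscaled)
```

A larger test size (for example 4x4x4) would make the forward/inverse direction distinguishable.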

Could this be a driver problem? Updating the drivers earlier did seem to get rid of plan errors and internal CUDA errors.

EDIT: OK, so I installed FFTW3 and it gives the same results as CUDA’s FFT, so I’m not really sure what’s going on.

Do NumPy and SciPy do something differently?
Are my arrays not set up correctly?
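One way to separate the two questions is to compare the GPU (or FFTW) output against a NumPy reference computed with the same normalization and from the exact array that was handed to the library. The sketch below assumes the output has already been read back into a NumPy array; `matches_numpy` is a hypothetical helper, and the tolerances are a guess suited to single precision.

```python
import numpy as np

def matches_numpy(gpu_out, host_in, inverse=False, rtol=1e-5, atol=1e-5):
    """Hypothetical helper: compare an unnormalized cuFFT/FFTW result with NumPy.

    cuFFT and FFTW leave the inverse unscaled, so the inverse reference uses
    norm="forward" to drop NumPy's default 1/N factor on ifftn.
    """
    x = np.asarray(host_in, dtype=np.complex128)
    ref = np.fft.ifftn(x, norm="forward") if inverse else np.fft.fftn(x)
    return np.allclose(gpu_out, ref, rtol=rtol, atol=atol)

cuda_forward = np.array([[[6, 0], [2, 0]],
                         [[2, 0], [2, 0]]], dtype=np.complex64)
received = np.array([[[1.5, 1.5], [0.5, 0.5]],
                     [[0.5, 0.5], [0.5, 0.5]]])

print(matches_numpy(cuda_forward, np.ones((2, 2, 2))))  # False: not the FFT of all ones
print(matches_numpy(cuda_forward, received))            # True: FFT of the array it received
```

If the check only passes against a different array than the one you intended to send, the problem is in how the array is built or passed, not in the FFT library.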

Thanks for any help.
cudatest.txt (6.57 KB)