My FFT-based CUDA program used to compile and run correctly on a Tesla C1060 with CUDA 3.1. When I got a Fermi C2050, I upgraded to CUDA 3.2.

Since the upgrade, the code no longer works on the Tesla C1060.

My program computes an FFT on the host with the FFTW library, copies the result to the GPU, and then runs a forward (CUFFT_FORWARD) and an inverse (CUFFT_INVERSE) CUFFT transform on the data. The inverse transform returns an error.

To get more information, I captured the cufftResult returned by the inverse execution:

```
// Inverse FFT
cufftHandle plan_backward;
CUFFT_SAFE_CALL(cufftPlan2d(&plan_backward, pix1, pix2, CUFFT_C2C));
// Keep the status instead of wrapping the call in CUFFT_SAFE_CALL
cufftResult a1 = cufftExecC2C(plan_backward, f3_d, out1_d, CUFFT_INVERSE);
printf("a1 = %d\n", (int)a1); // prints 9 on the C1060 with CUDA 3.2
// Destroy the CUFFT plan
CUFFT_SAFE_CALL(cufftDestroy(plan_backward));
```

So a1 is 9, which corresponds to CUFFT_UNALIGNED_DATA. What puzzles me is that the same code produces correct results when compiled for the Fermi C2050 with CUDA 3.2, and for the Tesla C1060 with CUDA 3.1.

I am not sure what changed between the two toolkit versions.

This is the build script I use for the C1060:

```
nvcc -g -G -pg -D_DEBUG -o ../obj/ao76_fft6 ../src/ao76_fft6.cu \
--host-compilation C -arch sm_13 \
--ptxas-options=-v -maxrregcount=32 -use_fast_math \
-I/usr/local/cuda/include \
-L/usr/local/cuda/lib64 -lcufft -lcuda \
-I$HOME/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
-L$HOME/NVIDIA_GPU_Computing_SDK/C/lib -lcutil_x86_64 \
-I/usr/include/ -L/usr/lib64/ -lm -lfftw3f
```

Any suggestions? Thanks in advance.