CUFFT error on CUDA 3.2: Tesla C1060 vs. Fermi C2050

My FFT-based CUDA program used to compile and run correctly on the Tesla C1060. I upgraded to CUDA 3.2 because I got a Fermi C2050.

After the upgrade, my code stopped working on the Tesla C1060.

My program does an FFT on the host using the fftw library, copies the results to the GPU, and then runs a CUFFT_FORWARD and a CUFFT_INVERSE transform on the data. The CUFFT_INVERSE call gives me an error.

I also captured the cufftResult return value to get more information about the CUFFT_INVERSE execution.

// Inverse FFT
	cufftHandle plan_backward;
	CUFFT_SAFE_CALL(cufftPlan2d(&plan_backward, pix1, pix2, CUFFT_C2C));

	// Keep the status as a cufftResult rather than casting it to int
	cufftResult a1 = cufftExecC2C(plan_backward, f3_d, out1_d, CUFFT_INVERSE);
	printf("a1 = %d\n", (int)a1); // gives 9

	// Destroy CUFFT context
	CUFFT_SAFE_CALL(cufftDestroy(plan_backward));

The result a1 is equal to 9, which indicates CUFFT_UNALIGNED_DATA. This surprises me, because the same code gives correct results when compiled for the Fermi C2050 with CUDA 3.2 and for the Tesla C1060 with CUDA 3.1.
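For anyone decoding the printed value, here is a minimal sketch of a lookup from the integer to the enumerator name. The function name cufft_result_name is hypothetical (it is not part of CUFFT), and the numeric values are assumed to match the cufftResult enum in the cufft.h shipped with CUFFT 3.2; later toolkits add more codes. It is written in plain C so it builds without the CUDA toolkit.

```c
/* Hypothetical helper, not part of CUFFT: maps the integer printed by
   printf("a1 = %d\n", a1) to the name of the cufftResult enumerator.
   ASSUMPTION: values mirror the cufftResult enum in cufft.h (CUFFT 3.2). */
const char *cufft_result_name(int code)
{
    switch (code) {
    case 0: return "CUFFT_SUCCESS";
    case 1: return "CUFFT_INVALID_PLAN";
    case 2: return "CUFFT_ALLOC_FAILED";
    case 3: return "CUFFT_INVALID_TYPE";
    case 4: return "CUFFT_INVALID_VALUE";
    case 5: return "CUFFT_INTERNAL_ERROR";
    case 6: return "CUFFT_EXEC_FAILED";
    case 7: return "CUFFT_SETUP_FAILED";
    case 8: return "CUFFT_INVALID_SIZE";
    case 9: return "CUFFT_UNALIGNED_DATA";
    default: return "unknown cufftResult value";
    }
}
```

It could be used as printf("a1 = %d (%s)\n", (int)a1, cufft_result_name((int)a1)); in place of the bare printf.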

I am not sure what changed.

My build script for the C1060 looks like this:

nvcc -g -G -pg -D_DEBUG -o ../obj/ao76_fft6 ../src/ao76_fft6.cu \
--host-compilation C -arch sm_13 \
--ptxas-options=-v -maxrregcount=32 -use_fast_math \
-I/usr/local/cuda/include \
-L/usr/local/cuda/lib64 -lcufft -lcuda \
-I/$HOME/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
-L/$HOME/NVIDIA_GPU_Computing_SDK/C/lib -lcutil_x86_64 \
-I/usr/include/ -L/usr/lib64/ -lm -lfftw3f

Any suggestions? Thanks in advance.

I also ran CUFFT with a batch size of 50 on both the Tesla C1060 and the Fermi C2050.

The inverse FFT fails on the C1060 due to CUFFT_UNALIGNED_DATA.

/* IFFT */
	cufftHandle plan_backward;

	/* Create a batched 2D plan */
	cufftPlanMany(&plan_backward, 2, rank, NULL, 1, 0, NULL, 1, 0, CUFFT_C2C, n);

	/* Execute the transform out-of-place and keep the status as a cufftResult */
	cufftResult a1 = cufftExecC2C(plan_backward, f3_d, out1_d, CUFFT_INVERSE);
	printf("a1 = %d\n", (int)a1);

	/* Destroy the CUFFT plan */
	cufftDestroy(plan_backward);

On the C1060 a1 = 9, and on the C2050 a1 = 0. Has something changed for the C1060 in CUDA 3.2?

The CUFFT_UNALIGNED_DATA error code was introduced in CUFFT 3.2 RC1 so that cufftExec*() fails gracefully when the pointers passed to CUFFT are not 256-byte aligned. In the same situation, CUFFT 3.1 would just mysteriously fail without explanation. CUFFT 3.0 and prior did not have this 256-byte alignment requirement.

Since this 256-byte alignment requirement was clearly far from ideal, CUFFT 3.2 RC2 (which should be released Real Soon Now) fixes the underlying issue. The alignment requirements are back down to what they were in the CUFFT 3.0 days, so you should no longer get the CUFFT_UNALIGNED_DATA error code once RC2 is released.

Thanks,

Cliff
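Until RC2 is available, one way to confirm that alignment is the trigger is to test the pointers before calling cufftExecC2C. The following is only a sketch under stated assumptions: the helper names is_256_byte_aligned and round_up_to_256 are hypothetical, and the 256-byte figure is taken from the reply above. A pointer formed by adding a byte offset to a larger allocation is the typical case that can fail this check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment that CUFFT 3.1 / 3.2 RC1 expects, per the reply above. */
#define CUFFT_RC1_ALIGNMENT 256u

/* Hypothetical helper: returns 1 if the address is a multiple of
   256 bytes, 0 otherwise. Only the address value is inspected, so it
   can be called on a device pointer from host code. */
int is_256_byte_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % CUFFT_RC1_ALIGNMENT) == 0;
}

/* Hypothetical helper: rounds a byte offset up to the next multiple of
   256, e.g. when carving several arrays out of one allocation. */
size_t round_up_to_256(size_t offset)
{
    size_t aligned = (offset + CUFFT_RC1_ALIGNMENT - 1)
                     / CUFFT_RC1_ALIGNMENT * CUFFT_RC1_ALIGNMENT;
    assert(aligned % CUFFT_RC1_ALIGNMENT == 0);
    return aligned;
}
```

In the code from the question, checking is_256_byte_aligned(f3_d) and is_256_byte_aligned(out1_d) just before the inverse transform would show whether either pointer is the one being rejected.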
