Why can't I add a non-zero number in the kernel?

In a kernel I want to add two numbers as follows:

double ttd=0.0;




If ttd=0.0, everything works well; if I set ttd=1, cufft throws an error as follows:

cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.2/cufft/src/cufft.cu, line 140


error: FFT Execute failed :

error: isign = 1

error: (N1, N2, N3) is (1, 224, 224)

cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.2/cufft/src/plan.cu, line 73


Does anyone know why? Can you tell me some possible reasons?

By the way, I didn’t run cufft in this function. I have found that whenever there is an error anywhere, cufft throws the error. So I have been using cufft as a debug tool, because deviceemu and cugdb don’t work in my multi-GPU, multithreaded program but cufft always works.

This is some sort of elaborate joke, right?

I found the reason: the id exceeds the bounds of the array. When adding zero, the data beyond the end of the array isn’t changed, so no error is reported. When adding a non-zero number, the out-of-bounds write changes memory that doesn’t belong to the array, and the error shows up.
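For anyone who hits the same thing, here is a minimal sketch of the usual guard against an out-of-bounds id. The original kernel was not posted, so the kernel name addOffset, the parameters data and n, and the sizes are all assumptions made for illustration; only ttd comes from the post. It also assumes a recent CUDA runtime (cudaDeviceSynchronize) and a GPU with double support.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not the original poster's code): adds ttd to each element.
__global__ void addOffset(double *data, int n, double ttd)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) {            // guard: threads past the end of the array do nothing
        data[id] += ttd;
    }
}

int main()
{
    const int n = 1000;      // deliberately not a multiple of the block size
    double *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(double));
    cudaMemset(d_data, 0, n * sizeof(double));   // all-zero bytes are 0.0 as double

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // rounds up: more threads than elements
    addOffset<<<blocks, threads>>>(d_data, n, 1.0);
    cudaDeviceSynchronize();

    // Copy back the last valid element to confirm the kernel touched it.
    double h_last = 0.0;
    cudaMemcpy(&h_last, d_data + (n - 1), sizeof(double), cudaMemcpyDeviceToHost);
    printf("last element = %f\n", h_last);

    cudaFree(d_data);
    return 0;
}
```

Because the grid size is rounded up, some threads get an id of n or more; without the `if (id < n)` check those threads would write past the end of the array, which is the kind of bug described above.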

But I am sure that cufft is always a good debug tool for finding such errors in my code. It is very funny!

So I hope someone can improve cufft and make it more stable. I would prefer cufft to be an FFT tool rather than a debug tool.

The zero case is undoubtedly never compiled into the executable kernel code - open64 has a very rigorous dead code removal algorithm.

It isn’t funny, it is just plain ignorant. The CUDA API includes a function cudaGetLastError() which will return the actual error from CUDA, as opposed to some version of it filtered through CUFFT. There are also API functions to turn the error codes into human-readable form. If you ever get around to reading the documentation for CUDA you will find that and many other interesting features.
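As a rough sketch of what that looks like in practice: the kernel scale and the helper checkCuda below are made-up names for illustration, and the code assumes a recent CUDA runtime. cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() are the actual runtime calls. Note that an out-of-bounds write is not guaranteed to raise an error at all, so this complements bounds checking rather than replacing it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to have something to launch.
__global__ void scale(float *data, int n, float factor)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) {
        data[id] *= factor;
    }
}

// Hypothetical helper: prints a readable message and returns false on error.
static bool checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1 << 16;
    float *d_data = NULL;
    if (!checkCuda(cudaMalloc((void **)&d_data, n * sizeof(float)), "cudaMalloc")) {
        return 1;
    }
    checkCuda(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset");

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // Errors in the launch itself (bad grid or block size, for example).
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel was running are reported at the next sync.
    ok = checkCuda(cudaDeviceSynchronize(), "kernel execution") && ok;

    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

Checking right after each launch points at the kernel that failed, instead of waiting for a later library call such as a CUFFT execute to trip over the pending error.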