cuFFT fails in CUDA/OpenGL Interoperability

I am trying to link an OpenGL texture to a CUDA array so that I can pass
my computed results to OpenGL and display them on the screen.
However, I found that cuFFT stops working after
cudaGraphicsGLRegisterImage(&… , …, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsNone) is called.

I am not sure whether this is a driver conflict, a library conflict, or some other mistake on my part.
Does anyone have a clue?
(I stepped through my code line by line and found that cudaGraphicsGLRegisterImage is the call after which cuFFT stops working.)

FYI, I am using CUDA 5.0 and OpenGL 4.3 on an NVIDIA GTX 460 SE.

I have found some situations in which cuFFT fails, and they might be related.

If any CUDA memory is allocated before the call to cudaGLSetGLDevice, cuFFT fails.

If a memory transfer in the code uses a mismatched format, cuFFT fails.
Example: cudaMemcpy(a, b, size, type), where a and b do not have the same element type, or size does not fit. In this case the code still compiles and reports no error, but it produces incorrect results and the cuFFT call fails.

I think my texture binding might have a similar format problem, which would explain why it causes cuFFT to fail.