Problem with 3D CUFFT

I ran a simple test that calls a 3D cufftExecD2Z() repeatedly. When the input data is of size 32x32x32 or larger, it always hangs at a certain kernel invocation. Has anyone seen a similar problem?
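For reference, here is a minimal sketch of this kind of test, not the exact code. It assumes an out-of-place transform on a cubic n x n x n volume with zero-filled input; the helper names d2z_output_elems and run_d2z_test are made up for illustration. It checks the return code of every call and prints the iteration index first, so a hang can be tied to a specific invocation.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>
#include <cufft.h>

// Number of complex outputs of an nx*ny*nz D2Z transform. Only the
// non-redundant half of the last dimension is stored: nz/2 + 1.
static size_t d2z_output_elems(int nx, int ny, int nz) {
    return (size_t)nx * ny * (nz / 2 + 1);
}

// Hypothetical helper: runs `iters` out-of-place D2Z transforms on an
// n*n*n volume. Returns 0 on success, -1 if setup fails, or the 1-based
// index of the first iteration that reports an error.
static int run_d2z_test(int n, int iters) {
    cufftDoubleReal *in = nullptr;
    cufftDoubleComplex *out = nullptr;
    cufftHandle plan;
    size_t in_bytes = sizeof(cufftDoubleReal) * (size_t)n * n * n;
    size_t out_bytes = sizeof(cufftDoubleComplex) * d2z_output_elems(n, n, n);

    if (cudaMalloc((void **)&in, in_bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&out, out_bytes) != cudaSuccess) {
        cudaFree(in);
        return -1;
    }
    cudaMemset(in, 0, in_bytes);  // assumption: zero input is enough to reproduce

    if (cufftPlan3d(&plan, n, n, n, CUFFT_D2Z) != CUFFT_SUCCESS) {
        cudaFree(in);
        cudaFree(out);
        return -1;
    }

    int failed = 0;
    for (int i = 0; i < iters; ++i) {
        // Print before the call so the last line shows where a hang occurs.
        fprintf(stderr, "iteration %d\n", i + 1);
        cufftResult r = cufftExecD2Z(plan, in, out);
        // Wait for the GPU so asynchronous failures surface at this iteration.
        cudaError_t e = cudaDeviceSynchronize();
        if (r != CUFFT_SUCCESS || e != cudaSuccess) {
            fprintf(stderr, "failed: cufft=%d cuda=%s\n",
                    (int)r, cudaGetErrorString(e));
            failed = i + 1;
            break;
        }
    }

    cufftDestroy(plan);
    cudaFree(in);
    cudaFree(out);
    return failed;
}
```

Calling run_d2z_test(32, 100) from main() and linking with -lcufft exercises the 32x32x32 case. Note that the output buffer holds n*n*(n/2+1) complex values, not n*n*n; an undersized output buffer is one thing worth ruling out.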