I have implemented an expression-template-based library that works for both CPU and GPU arrays, and now I want to add FFT functionality on the GPU via cuFFT.
I have this code fragment:
int len=10;
CudaMatrix<float> A_D(1,len);
A_D = ones<float>(1,len);
// Option 1
cufftHandle plan = DEVICE_FFT_PLAN_C2C(A_D.GetNumElements(),1);
// Option 2
cufftHandle plan;
if (cufftPlan1d(&plan, len, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
    fprintf(stderr, "CUFFT error: plan creation failed\n");
    getch();
    return 0;
}
// Option 3
cufftHandle plan;
// Option 4
DEVICE_FFT_PLAN_C2C(A_D.GetNumElements(),1);
// Option 5
// No code concerning options 1, 2 or 3
The code snippets under options 1, 2, 3 and 4 are mutually exclusive (only one of them is used at a time). Basically, each of them just creates a cuFFT plan. Option 5 means that none of the instructions under options 1, 2, 3 or 4 is present.
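As an aside, and only as a sketch under my own assumptions (it may or may not relate to the crash): a plan created with cufftPlan1d holds device resources until it is released, so the usual pattern pairs plan creation with cufftDestroy, e.g.:

```cuda
#include <cstdio>
#include <cufft.h>

int run_fft_example(int len)
{
    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, len, CUFFT_C2C, 1 /* batch */);
    if (res != CUFFT_SUCCESS) {
        fprintf(stderr, "CUFFT error: plan creation failed (code %d)\n", (int)res);
        return 0;
    }

    /* ... execute transforms with the plan (cufftExecC2C) ... */

    cufftDestroy(plan); /* release the plan's device resources when done */
    return 1;
}
```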
Options 1, 2 or 3
When I use the code snippets under options 1, 2 or 3, the compiled code crashes after I run it 1-2 times. In particular, it crashes at the ones instruction (which internally performs a kernel launch) with an unknown error.
Options 4 or 5
In these cases, the code does not crash and returns correct results.
It seems that the simple declaration or use of a plan on the stack, instead of on the heap (option 4), makes the code crash, although after a non-deterministic number of launches. Also, the declaration or use of the plan leads to an "anti-causal" error on a previous instruction (?).
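For what it's worth, an "anti-causal" error report is consistent with how the CUDA runtime behaves: kernel launches are asynchronous, so an error raised by one operation may only surface at a later (and seemingly unrelated) API call. A checking sketch along the following lines can help localize the real failure point (the gpuErrchk/gpuAssert helpers below are my own illustration, not part of the question's library):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

/* Hypothetical helper: checks the result of a CUDA runtime call
   and reports where it failed. */
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n",
                cudaGetErrorString(code), file, line);
        exit(code);
    }
}

/* After each kernel launch (such as the one inside ones<float>):
     kernel<<<grid, block>>>(...);
     gpuErrchk(cudaPeekAtLastError());   // catches launch-configuration errors
     gpuErrchk(cudaDeviceSynchronize()); // catches errors raised during execution
   With the synchronization in place, the error is reported at the call
   that actually caused it, rather than at some later instruction. */
```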
I’m using Visual Studio 2010 and CUDA 5.0.
Can anyone help with this "obscure" phenomenon?
Thanks.