CUDA GPU program does not work with recursive calls

Solved by raising the per-thread stack size limit before launching the kernel:

size_t stackSize = 511*1024;
cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, stackSize);
printf("cudaDeviceSetLimit returns: %s, size=%zu\n", cudaGetErrorString(err), stackSize);
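A minimal, self-contained sketch of the same fix, for anyone hitting this later. It assumes a toy recursive function, `sumRec` (hypothetical, standing in for ABaraDepRec()/BzBaraRec()), and a hypothetical helper `runRecursive`. The limit is set before the recursive kernel is launched. The value 511*1024 is simply the one used above, not a tuned figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical recursive function standing in for ABaraDepRec()/BzBaraRec().
// __host__ __device__ so the same code can also be checked on the CPU.
__host__ __device__ double sumRec(double x, int depth)
{
    if (depth <= 0)
        return 0.0;
    return x + sumRec(x, depth - 1);   // one more call level per step
}

// 1 block, 1 thread, as in the question.
__global__ void recKernel(double x, int depth, double* out)
{
    *out = sumRec(x, depth);
}

// Raises the per-thread stack limit, launches the kernel and reports errors.
// Returns 0 on success, 1 on any CUDA error.
int runRecursive(size_t stackBytes, double x, int depth, double* result)
{
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, stackBytes);
    printf("cudaDeviceSetLimit returns: %s, size=%zu\n",
           cudaGetErrorString(err), stackBytes);
    if (err != cudaSuccess)
        return 1;

    double* dOut = NULL;
    if (cudaMalloc(&dOut, sizeof(double)) != cudaSuccess)
        return 1;

    recKernel<<<1, 1>>>(x, depth, dOut);
    err = cudaGetLastError();              // errors in the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // errors raised while the kernel ran
    if (err == cudaSuccess)
        err = cudaMemcpy(result, dOut, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Checking cudaGetLastError() right after the launch catches configuration errors, while the cudaDeviceSynchronize() result is where an execution error such as "unspecified launch failure" shows up.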

=====

Hello,

I am trying to port a CPU program (magnetic field modelling) to a CUDA GPU.
For now I launch 1 block with 1 thread per block, just to see whether it works.
The flow is: main()->cudaLauncher()->kernel[team22_parallel()]->team22ObjectiveFunction()->evaluate()…
evaluate() calls some functions that in turn call ABaraDepRec() and BzBaraRec(), which are recursive functions whose parameters and return values are of type double.
The program compiles slowly, and with a warning (caused by the recursive functions):
“nvlink warning : Stack size for entry function ‘_Z15team22_parallelP13Team22DataSet’ cannot be statically determined”
When I run it, I get “unspecified launch failure”.
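The nvlink warning quoted above arises because the depth of a recursive call chain depends on run-time arguments, so the linker cannot compute a worst-case stack size for the kernel; the kernel then runs against the per-thread stack limit in effect at launch. Raising that limit (as in the solution at the top) is one fix. A different approach, not the one used here, is to rewrite the recursion as a loop so each thread has a fixed stack footprint. The sketch uses hypothetical functions `powRec`/`powIter` (repeated multiplication); whether ABaraDepRec()/BzBaraRec() convert as easily is an open assumption.

```cuda
#include <cuda_runtime.h>

// Recursive form: each level normally needs its own stack frame,
// so stack use depends on n at run time.
__host__ __device__ double powRec(double x, int n)
{
    return (n <= 0) ? 1.0 : x * powRec(x, n - 1);
}

// Iterative form: same multiplications, constant stack use per thread.
__host__ __device__ double powIter(double x, int n)
{
    double acc = 1.0;
    for (int i = 0; i < n; ++i)
        acc *= x;
    return acc;
}

// Writes both results so they can be compared on the host.
__global__ void powKernel(double x, int n, double* out)
{
    out[0] = powRec(x, n);
    out[1] = powIter(x, n);
}
```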

The program has device functions that use dynamic allocation (malloc), so I had to use -arch=sm_20.
The compile command is:
nvcc -arch=sm_20 --relocatable-device-code true main.cu elliptic.cu
I run it with:
./a.out
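Device-side malloc() draws from a heap whose size is fixed before launch (cudaLimitMallocHeapSize) and returns NULL when that heap is exhausted; dereferencing such a pointer is another way to get "unspecified launch failure". A hedged sketch, assuming a hypothetical kernel `allocKernel` that allocates a scratch array of doubles and a hypothetical helper `tryDeviceAlloc`:

```cuda
#include <cuda_runtime.h>

// Allocates n doubles on the device heap and reports whether it worked.
__global__ void allocKernel(size_t n, int* ok)
{
    double* buf = (double*)malloc(n * sizeof(double));
    if (buf == NULL) {          // heap exhausted: never dereference
        *ok = 0;
        return;
    }
    for (size_t i = 0; i < n; ++i)
        buf[i] = (double)i;
    *ok = 1;
    free(buf);
}

// Sets the device heap size, runs allocKernel, and returns 1 if malloc
// succeeded, 0 if it returned NULL, -1 on a CUDA error.
// The heap size can only be changed before the first launch of a kernel
// that calls malloc, so this helper is meant to be called once.
int tryDeviceAlloc(size_t heapBytes, size_t n)
{
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes) != cudaSuccess)
        return -1;
    int* dOk = NULL;
    if (cudaMalloc(&dOk, sizeof(int)) != cudaSuccess)
        return -1;

    allocKernel<<<1, 1>>>(n, dOk);
    int ok = -1;
    if (cudaDeviceSynchronize() != cudaSuccess ||
        cudaMemcpy(&ok, dOk, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        ok = -1;

    cudaFree(dOk);
    return ok;
}
```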

Device 0: “Tesla M2070”. CUDA Driver Version / Runtime Version 5.0 / 5.0.
CUDA Capability Major/Minor version number: 2.0
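To see which stack limit the recursive kernel was running against before the fix, the current limits can be queried alongside the device's compute capability. This is a sketch with a hypothetical helper name, `printDeviceLimits`. No particular output values are assumed, since defaults can vary with device and driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints device name, compute capability and the current per-thread
// stack and device-heap limits. Returns 0 on success, 1 on error.
int printDeviceLimits(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return 1;
    if (cudaSetDevice(dev) != cudaSuccess) return 1;

    size_t stackBytes = 0, heapBytes = 0;
    if (cudaDeviceGetLimit(&stackBytes, cudaLimitStackSize) != cudaSuccess) return 1;
    if (cudaDeviceGetLimit(&heapBytes, cudaLimitMallocHeapSize) != cudaSuccess) return 1;

    printf("Device %d: \"%s\", compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);
    printf("stack per thread: %zu bytes, device heap: %zu bytes\n",
           stackBytes, heapBytes);
    return 0;
}
```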

Please help. I appreciate any idea. Thank you!
