stackSize = 511 * 1024;
err = cudaDeviceSetLimit(cudaLimitStackSize, stackSize);
printf("cudaDeviceSetLimit returns: %s, size=%d\n", cudaGetErrorString(err), stackSize);
I am trying to port a CPU program (magnetic field modelling) to a CUDA GPU.
For now I launch 1 block with 1 thread per block, just to see whether it works.
The flow is: main()->cudaLauncher()->kernel[team22_parallel()]->team22ObjectiveFunction()->evaluate()…
evaluate() calls some functions that in turn call ABaraDepRec() and BzBaraRec(), which are recursive functions (with parameters and return values of type double).
The program compiles slowly and produces a warning (because of the recursive functions):
“nvlink warning : Stack size for entry function ‘_Z15team22_parallelP13Team22DataSet’ cannot be statically determined”
When I run it, I get “unspecified launch failure”.
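For anyone who wants to reproduce the pattern without my full code, here is a minimal standalone sketch: it sets the stack limit, reads it back, launches one thread that calls a recursive device function, and checks for errors both after the launch and after synchronization. The names recursiveSum(), testKernel() and launchAndCheck() are hypothetical placeholders, not functions from my program, and the recursion body is only a stand-in for ABaraDepRec()/BzBaraRec().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a recursive device function with double
// parameters and a double return value (not the real ABaraDepRec()).
__device__ double recursiveSum(double x, int depth)
{
    if (depth == 0) return x;
    return x + recursiveSum(x * 0.5, depth - 1);
}

// Hypothetical kernel: a single thread runs the recursion and stores the result.
__global__ void testKernel(double *out, int depth)
{
    *out = recursiveSum(1.0, depth);
}

// Sets the per-thread stack limit, launches 1 block of 1 thread,
// and returns the first error encountered (cudaSuccess if none).
cudaError_t launchAndCheck(int depth, double *result)
{
    size_t stackSize = 511 * 1024;
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, stackSize);
    if (err != cudaSuccess) return err;

    // Read the limit back to see what the runtime actually applied.
    size_t actual = 0;
    err = cudaDeviceGetLimit(&actual, cudaLimitStackSize);
    if (err != cudaSuccess) return err;
    printf("stack limit per thread: %lu bytes\n", (unsigned long)actual);

    double *dOut = NULL;
    err = cudaMalloc((void **)&dOut, sizeof(double));
    if (err != cudaSuccess) return err;

    testKernel<<<1, 1>>>(dOut, depth);

    // Errors in the launch itself (bad configuration, etc.).
    err = cudaGetLastError();
    // Errors raised while the kernel runs show up at synchronization.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(result, dOut, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    return err;
}

int main()
{
    double result = 0.0;
    cudaError_t err = launchAndCheck(100, &result);
    printf("launchAndCheck returns: %s, result=%f\n",
           cudaGetErrorString(err), result);
    return err == cudaSuccess ? 0 : 1;
}
```

This sketch should build with the same kind of command as above (for example nvcc -arch=sm_20 on a single .cu file). It only shows where an error would first be reported; it does not by itself explain the failure in my real code.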
The program has functions that use dynamic allocation (malloc) so I had to use -arch=sm_20.
The compiler command is:
nvcc -arch=sm_20 --relocatable-device-code true main.cu elliptic.cu
I run it on:
Device 0: “Tesla M2070”. CUDA Driver Version / Runtime Version 5.0 / 5.0.
CUDA Capability Major/Minor version number: 2.0
Please help. I appreciate any idea. Thank you!