I have a small recursive device function that runs fine when built in “Release”.
But when run in “Debug”, CUDA complains that “an illegal instruction was encountered”.
The Nsight debugger points me to this recursion and says “CUDA detected data stack overflow”. After removing the recursive function, it works fine again.
So the question is: does CUDA support recursion properly? And in my case, why does only the “Release” build work?
As on any architecture (including x86), recursion in CUDA requires stack space. Recurse too deeply and the app will overflow the available stack space, causing abnormal termination. It is entirely possible that a debug build requires more stack space per function call than a release build, as pretty much all compiler optimizations are disabled for debug builds. Try increasing the per-thread stack space. See this thread on Stack Overflow:
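
Separately from that thread, here is a rough sketch of what raising the per-thread stack limit could look like with the runtime API calls cudaDeviceGetLimit and cudaDeviceSetLimit. The recursive function sum_to, the kernel name, the recursion depth of 100, and the 16 KB value are all made up for illustration; they are not taken from your code, and the size you actually need depends on your recursion depth and build settings.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical recursive function standing in for the one in the question.
// Marked __host__ __device__ so it can also be checked on the CPU.
__host__ __device__ unsigned long long sum_to(unsigned int n) {
    return n == 0 ? 0ULL : n + sum_to(n - 1);
}

// Hypothetical kernel: a single thread runs the recursion and stores the result.
__global__ void recurse_kernel(unsigned int depth, unsigned long long *out) {
    *out = sum_to(depth);
}

int main() {
    // Query the current per-thread stack size.
    size_t stack_bytes = 0;
    cudaDeviceGetLimit(&stack_bytes, cudaLimitStackSize);
    printf("per-thread stack before: %zu bytes\n", stack_bytes);

    // Raise the limit before launching the kernel.
    // 16 KB is an arbitrary assumption; tune it for your recursion depth.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 16 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned long long *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(*d_out));

    recurse_kernel<<<1, 1>>>(100, d_out);

    // Errors from the kernel itself (such as a stack overflow) surface here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    unsigned long long h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("sum_to(100) = %llu\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

Keep in mind that the stack is reserved per thread, so a larger limit multiplies across every thread that can be resident on the device. If the debug build still overflows, it may be worth comparing the stack frame sizes reported when compiling with verbose ptxas output (-Xptxas -v) for both configurations.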