Handling long-running kernels, out-of-stack and similar errors

Hello guys!

I would like to ask for some general advice on how to handle very long-running kernels and out-of-stack-memory situations (when using heavily recursive code).

To better describe my problem: I have been working on porting an OptiX renderer to CUDA, so that we have access to some advanced CUDA features (I'm using most of the CUDA 4.0 features extensively) and can get around some serious limitations of the OptiX SDK (linking shaders without recompiling or generating the code myself, long compile times, etc.). What I am really missing is the error handling from OptiX. It was great to have exceptions for handling out-of-stack situations, and it was much harder to write a program that ran into the Windows limit on kernel run time.

What is a good way to handle these errors in CUDA? I can always decrease the thread count (each thread calculates one ray tree, and I'm rendering a single bucket with each kernel launch), but that increases render times as the occupancy decreases (and in the end it would be faster to use well-tuned CPU tracing code). Is there any way I can cancel a running kernel to get around this (callbacks, etc.)?
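To make the cancellation question concrete, here is a minimal sketch of one cooperative approach: the kernel polls a flag in mapped pinned host memory, and the host checks for errors after the launch. It is only an illustration under stated assumptions, not what the renderer currently does. `traceBucket`, `renderBucket` and the bounce loop are hypothetical names and placeholders; the runtime calls (`cudaHostAlloc`, `cudaHostGetDevicePointer`, `cudaGetLastError`, `cudaDeviceSynchronize`) are the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for tracing one ray tree: the loop body is a
// placeholder, the point is the abort-flag poll between bounces.
__global__ void traceBucket(volatile int *abortFlag, int *bouncesDone, int maxBounces)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int done = 0;
    for (int bounce = 0; bounce < maxBounces; ++bounce) {
        if (*abortFlag) break;  // host requested cancellation
        ++done;                 // real code would trace and shade here
    }
    bouncesDone[tid] = done;
}

// Renders one bucket and returns the first CUDA error seen (cudaSuccess if none).
// `cancel` pre-sets the flag only to keep the sketch deterministic; a real
// renderer would set *hostFlag from a timer or UI thread while the kernel runs.
cudaError_t renderBucket(int blocks, int threads, int maxBounces,
                         bool cancel, int *firstThreadBounces)
{
    int *hostFlag = 0, *devFlag = 0, *devDone = 0;

    // Mapped pinned memory: the kernel reads the host's flag directly.
    // Assumption: on CUDA 4.0-era setups cudaSetDeviceFlags(cudaDeviceMapHost)
    // was called before the context was created.
    cudaError_t err = cudaHostAlloc((void **)&hostFlag, sizeof(int), cudaHostAllocMapped);
    if (err != cudaSuccess) return err;
    *hostFlag = cancel ? 1 : 0;

    err = cudaHostGetDevicePointer((void **)&devFlag, hostFlag, 0);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&devDone, blocks * threads * sizeof(int));
    if (err == cudaSuccess) {
        traceBucket<<<blocks, threads>>>(devFlag, devDone, maxBounces);
        err = cudaGetLastError();       // bad launch configuration etc.
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // errors raised during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(firstThreadBounces, devDone, sizeof(int),
                         cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "bucket failed: %s\n", cudaGetErrorString(err));

    cudaFree(devDone);       // freeing a null pointer is a no-op
    cudaFreeHost(hostFlag);
    return err;
}
```

Calling `renderBucket(2, 64, 8, false, &n)` runs the full loop, while passing `true` makes every thread exit before its first bounce. A flag like this only helps if the kernel polls it often enough to exit before the Windows watchdog fires; once the watchdog has killed a kernel, the runtime reports `cudaErrorLaunchTimeout` and the context generally has to be recreated.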

As for the stack: I don't want to use stack handling similar to OptiX's, since it would painfully complicate the code (and I don't have the resources to write something like that). It is also hard to guess how much stack each render requires, since we have to support shader trees. Artists sometimes create quite big ones, and writing one big uber-shader is not really an option because of the high number of possible variations. Is there any way to track stack usage? What do you think about this?
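For reference, a minimal sketch of the stack-related knobs the CUDA runtime does expose: the per-thread stack limit (`cudaDeviceSetLimit` / `cudaDeviceGetLimit` with `cudaLimitStackSize`) and the statically known per-kernel footprint (`cudaFuncGetAttributes`). `shadeRecursive`, `shadeKernel`, `setStackLimit` and `printKernelFootprint` are hypothetical names for illustration. Note the assumption built into this: for recursive code the compiler cannot know the depth, so these numbers give a limit and a static footprint, not a measurement of how much stack a given shader tree actually used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical recursive shader stand-in (device recursion needs sm_20 or newer).
__device__ int shadeRecursive(int depth)
{
    if (depth <= 0) return 1;
    return 1 + shadeRecursive(depth - 1);
}

__global__ void shadeKernel(int *out, int depth)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = shadeRecursive(depth);
}

// Requests a per-thread stack of `bytes` and returns the limit the runtime
// reports afterwards (0 if either call failed).
size_t setStackLimit(size_t bytes)
{
    size_t actual = 0;
    if (cudaDeviceSetLimit(cudaLimitStackSize, bytes) != cudaSuccess) return 0;
    if (cudaDeviceGetLimit(&actual, cudaLimitStackSize) != cudaSuccess) return 0;
    return actual;
}

// Prints what the compiler knows statically about the kernel. The local
// memory figure does not account for run-time recursion depth.
void printKernelFootprint()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, shadeKernel) == cudaSuccess)
        printf("local bytes per thread: %lu, registers: %d\n",
               (unsigned long)attr.localSizeBytes, attr.numRegs);
}
```

One possible use is to call `setStackLimit` with a generous estimate before launching a bucket, and to retry with a larger value if the launch fails. Compiling with `nvcc --ptxas-options=-v` also prints the per-kernel register and local memory usage at build time.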

Best Regards, Pal.