StackOverflow in OptiX 6

Hi, I’m working with OptiX 6 on a Quadro P6000, but no matter how large I set my stack size (I disabled RTX mode), I keep getting stack overflow exceptions. According to nvidia-smi, I’m only using around 320 MB of GPU memory. In my program I trace 512 rays, and on hit I retrace either refraction OR reflection ray within my closest hit program. I also have some static device functions that I call from my CH program, and my stack size is set to around 2 MB. Is there anything I’m missing or misunderstanding here?

Thanks!

Hi @tperezc,

Stack size on the GPU is extremely limited compared to the CPU, so it’s a problem to retrace from your closest hit program. You need to do one of two things: either limit your maximum recursion depth to something that will fit within your stack, or do your re-trace from your raygen program instead of your closest hit program. The latter (using the raygen program) is the preferred solution, since that way you don’t consume extra stack frames and you don’t necessarily have to set a maximum depth for your ray paths (though it may still be a good idea). Take a look at the optixPathTracer sample for an example of how to re-trace in the raygen program instead of closest hit.
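
To illustrate the second option, here is a minimal CPU-side C++ sketch of the control flow only, not OptiX code. The hit step just writes the next ray state into the payload, and the loop that plays the role of the raygen program issues every trace. `Payload`, `traceOnce`, `tracePath` and the toy scene of `kSlabCount` slabs are hypothetical names made up for this illustration; in a real kernel `traceOnce` would roughly correspond to one rtTrace() call plus the closest hit or miss program it invokes.

```cpp
#include <cassert>

// Hypothetical per-ray payload. The hit step stores the state of the next
// ray here instead of tracing that ray itself.
struct Payload {
    float position;    // 1D stand-in for the next ray origin
    float throughput;  // product of the surface weights collected so far
    bool  done;        // set when the ray misses everything
};

// Toy scene (assumption): slabs sit at positions 1..kSlabCount.
constexpr int kSlabCount = 4;

// Stand-in for "trace one ray and run closest hit or miss".
// It never calls itself, so its stack use does not grow with path length.
void traceOnce(Payload& p) {
    const float next = p.position + 1.0f;
    if (next > kSlabCount) {   // miss: nothing left in front of the ray
        p.done = true;
        return;
    }
    p.position = next;         // hit: remember where the next ray starts
    p.throughput *= 0.5f;      // each slab passes on half of the energy
}

// Stand-in for the raygen program: one loop issues all continuation rays.
// Returns the number of path segments that were traced.
int tracePath(Payload& p, int maxDepth) {
    assert(maxDepth > 0);
    int segments = 0;
    while (segments < maxDepth && !p.done) {
        traceOnce(p);
        ++segments;
    }
    return segments;
}
```

Starting from a payload of `{0.0f, 1.0f, false}` with `maxDepth = 16`, the loop traces five segments (four hits and one miss) and leaves a throughput of 0.0625, without any nested trace calls.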


David.

Which OptiX 6 version exactly? If that’s 6.0.0, have you tried 6.5.0?
What’s your display driver version?

“In my program I trace 512 rays, and on hit I retrace either refraction OR reflection ray within my closest hit program.”

Are you saying that your launch dimension is 512 (then it’s much too small to saturate the GPU) and you have at maximum the primary ray and one of two continuation rays? Or is that recursive and each new closest hit event on a transparent material keeps splitting?
If that is recursive, could it be that some code paths don’t have the proper end condition and call too deep and always exceed the stack space?

You could add printf() with the current ray depth to one of the failing launch indices to track the recursion depth or add an rtThrow() and trigger a user exception before every rtTrace() call in your code if the recursion depth is deeper than you expected.
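
As a rough illustration of that guard, here is a CPU-side C++ sketch rather than OptiX device code. A C++ exception stands in for the user exception raised with rtThrow(), and `traceRecursive`, `DepthExceeded`, `kExpectedMaxDepth` and the `hitsAhead` toy scene are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Assumption: the deepest recursion the algorithm is supposed to reach.
constexpr int kExpectedMaxDepth = 8;

// Stand-in for a user exception; it carries the offending depth so the
// failing path can be identified.
struct DepthExceeded : std::runtime_error {
    int depth;
    explicit DepthExceeded(int d)
        : std::runtime_error("ray depth " + std::to_string(d) +
                             " exceeds the expected maximum"),
          depth(d) {}
};

// Recursive continuation, as when closest hit traces the next ray itself.
// hitsAhead is the number of surfaces still in front of the ray (toy scene).
// Returns the depth at which the path ended.
int traceRecursive(int depth, int hitsAhead) {
    assert(depth >= 0 && hitsAhead >= 0);
    // The guard runs before every trace, so a missing end condition shows
    // up as a clear error instead of a stack overflow.
    if (depth > kExpectedMaxDepth) {
        throw DepthExceeded(depth);
    }
    if (hitsAhead == 0) {
        return depth;  // miss: the path ends here
    }
    return traceRecursive(depth + 1, hitsAhead - 1);
}
```

A path with three surfaces ahead ends at depth 3, while a path with 100 surfaces ahead trips the guard at depth 9, the first depth beyond the expected maximum.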

A 2 MB stack size is huge. This is per thread. If the recursion depth was the error, please determine the actual minimum stack size as described here:
“Code work well in Optix 3.9.1 , but fail in Optix 4.0.2” (OptiX, NVIDIA Developer Forums)

EDIT: More on what David described, including links to examples which handle glass in a path tracer: [url]https://devtalk.nvidia.com/default/topic/1063643/optix/samples-about-secondary-trace-call-in-rg/post/5386165[/url]

First of all, thanks David, I’ll check that example and move my retrace to the raygen program.

Second, I’m using the 6.0.0 version, not sure about the display driver version (not at that PC at the moment).

You’re correct, I have a launch dimension of 512 (it’s a 1D launch). In my CH program I check depth and importance against my thresholds for each, and trace the refraction or reflection (one or the other, I determine this with a random number). This behaviour is similar to Figure 2b in this paper: [url]http://www.vision.ee.ethz.ch/~ogoksel/pre/Mattausch_realistic_17pre.pdf[/url]. I’m just not doing Monte Carlo yet.

I’ll add those printf() calls and post again, and I’ll also check those two links.

Thank you both for your responses!

That’s effectively a Monte Carlo path tracer then. It’s rather simple to convert that to an iterative implementation and then the stack space shouldn’t be an issue anymore.

There is no need to shoot the continuation ray from the closest hit program, as would be natural for the deterministic full tree in Figure 2a.

“and trace the refraction or reflection (one or the other, I determine this with a random number). …I’m just not doing Monte Carlo yet.”

That stochastic selection of one or the other direction depending on its probability is the “Monte Carlo” in that algorithm already. I assume you’re just not accumulating all paths to the full result, yet.
All these things are shown step-by-step inside the OptiX Introduction examples.
It sounds like you don’t need direct lighting (next event estimation), just a brute-force path tracer.
That will be really fast in an iterative implementation, which these examples also provide.
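
To make the “Monte Carlo” part concrete, here is a small CPU-side C++ sketch of the stochastic selection, under the assumption that the reflection probability is a single number `reflectance` in [0, 1]. `chooseBranch`, `estimate` and the constant per-branch values are hypothetical and only serve the illustration; they are not taken from the OptiX Introduction examples.

```cpp
#include <cassert>
#include <random>

enum class Branch { Reflect, Refract };

// Pick one continuation direction. u is a uniform random number in [0, 1);
// reflection is chosen with probability `reflectance`, refraction otherwise.
Branch chooseBranch(float reflectance, float u) {
    assert(reflectance >= 0.0f && reflectance <= 1.0f);
    return (u < reflectance) ? Branch::Reflect : Branch::Refract;
}

// Follow one random branch per sample and average the results.
// Because each branch is picked with its own probability, the average
// converges to
//   reflectance * reflectValue + (1 - reflectance) * refractValue,
// which is what tracing both branches of the full tree would return.
double estimate(float reflectance, double reflectValue, double refractValue,
                int samples, unsigned seed) {
    assert(samples > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    double sum = 0.0;
    for (int i = 0; i < samples; ++i) {
        const Branch b = chooseBranch(reflectance, uniform(rng));
        sum += (b == Branch::Reflect) ? reflectValue : refractValue;
    }
    return sum / samples;
}
```

With `reflectance = 0.25`, a reflected value of 4 and a refracted value of 0, the average over many samples approaches 1. Accumulating such samples per launch index is the step that turns the random selection into a full result.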

Yes, I misspoke there and confused the stochastic selection with the accumulation of intensities, sorry!

I actually don’t have any light sources, so it effectively is a brute-force path tracer (I’ll do some ray marching afterwards too).

Thank you both for all the advice!

PS: Sorry if I have some spelling or grammar errors, English is not my main language and it’s been a while since I actually practiced.

You’re welcome. English is also not my native language. I’m mistyping all the time and, boy, do I edit my forum posts after hitting reply sometimes.

The Monte Carlo in quotation marks was not meant to correct anything. I was just trying to emphasize that it’s the random component in the algorithm.
It’s just awesome. Whenever I see an integral in a rendering equation I know that I just need to shoot more rays. ;-)