I am running OptiX 6.5 on Ubuntu 16.04 (driver 460, CUDA 10.1). I found that when I increase the complexity of my device functions beyond a certain level (actually not that complex, far from what we usually write in a C++ function, though a bit more complex than a typical RT_PROGRAM), I get the error below.
I went through most of the setStackSize-related forum threads, but none of them solved my problem.
Here is a simple test I tried:
Say I have two device functions A() and B() with the same level of complexity (the same number of local variables and the same function calls).
If I comment out either A() or B(), there is no problem (even when I manually call setStackSize(256), which is a small value).
However, when both A() and B() run in sequence, I get the error.
So I am fairly sure the problem is not that I am running out of stack size.
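To make the setup concrete, here is a minimal sketch of what I mean. The bodies of A() and B() are placeholders (my real functions are different), but the structure is the same: two __device__ functions of comparable complexity called from one ray generation program.

```cpp
#include <optix_world.h>

rtDeclareVariable(uint2, launch_index, rtLaunchIndex, );

// Two device functions of comparable complexity:
// same number of locals, same pattern of calls.
static __device__ float A(float x)
{
    float a = x * 1.3f, b = a + 2.0f, c = b * a;
    return sinf(a) + cosf(b) * c;
}

static __device__ float B(float x)
{
    float a = x * 0.7f, b = a - 1.0f, c = b * a;
    return cosf(a) + sinf(b) * c;
}

RT_PROGRAM void ray_gen()
{
    float x = static_cast<float>(launch_index.x);
    // Either call alone works, even with setStackSize(256).
    // Calling both in sequence triggers the error below.
    float r = A(x);
    r += B(x);
    // ... write r to the output buffer ...
}
```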
Could you enlighten me as to where this could possibly go wrong? Does it have something to do with the OptiX compiler?
I have been stuck on this for a long time. Also, would it help to debug this with Nsight (even though I don't have the OptiX source code)?
The full error is:

"rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)" caught exception: Encountered a rtcore error: m_api.pipelineSetStackSize(pipeline, directCallableStackSizeFromTraversal, directCallableStackSizeFromState, continuationStackSize, maxTraversableGraphDepth) returned (6): Invalid stack size
Segmentation fault (core dumped)
By the way, the same code runs on my Windows 10 machine without this problem (driver 471, the same OptiX 6.5 and CUDA 10.1), which is weird.