OptiX Stack Size

steven_mx · January 17, 2013, 9:15pm

Hi,

I was wondering whether it is possible to set the stack size per kernel instead of globally per context. Within a single context, some programs are much more complex than others, and the less complex ones would gain a significant performance advantage from a smaller stack size. I know it is possible to set the stack size before launching a program, but this seems to trigger a very slow recompile.

Thanks,

Steven

bwolfers · January 21, 2013, 8:04am

It is not possible to set the stack size per kernel, and, in general, it is not expected that this would produce a significant performance gain.

steven_mx · January 22, 2013, 10:39pm

Hi bwolfers,

Thanks for your input - that’s what I thought :(

Re performance: yes, it makes a difference. I have two OptiX entry points: one needs a 500-byte stack and the other runs comfortably with 300 bytes. Since I can only define one global stack size per context, I have to set it to 500 bytes. The smaller kernel takes 22 ms when the stack size is 500 bytes and just 4 ms when it is 300 bytes.

Unfortunately, I switch between the two kernels often and cannot call setStackSize between launches, since it triggers a recompile.

It would be extremely helpful if OptiX allowed setting a stack size per entry point. Conceptually, this shouldn't be a problem to implement, since each entry point runs completely independently.

I would appreciate any input you may have on the issue.

Thanks,

Steven

JBigler · February 8, 2013, 6:01pm

We could add a per-entry-point stack size, and that would solve the recompilations, but there's another blocker for this to work properly.

Currently we use a flag on the CUDA context (CU_CTX_LMEM_RESIZE_TO_MAX) that tells it not to shrink local memory (LMEM) after each launch. Thus, once you have launched the 500-byte kernel, LMEM stays sized for 500 bytes from then on. There are ways to avoid this, but they present additional performance issues that are still being addressed. We are currently working to remove the need for this flag, and at that point a per-entry-point stack size could be discussed.

That being said, I haven't found any situations where the stack size affected performance as much as it does in your kernel. If possible, we would like to obtain a trace of your code to investigate this further once we have resolved the LMEM resizing issues. Please contact us at optix-help@nvidia.com if you would like to share a trace with us.
