Register usage in OptiX 6.0 kernels

starkr · December 20, 2019, 5:44pm

Greetings! As the title suggests, I am using OptiX 6.0 on a GeForce RTX 2060 with driver 436.30.

I’ve been tasked with optimizing our OptiX kernels. Nsight says that our GPU utilization is fairly low, so I wanted to approach that from a few angles. In particular, I’ve noticed that our kernels declare a host of local variables, so I figured that the low GPU utilization may partially be due to a shortage of registers. Disclaimer: I wasn’t able to verify that with Nsight, because I couldn’t find our OptiX kernel in the list of all kernels. I remember that with OptiX 5.0 it was labeled “MegaKernelN”, but I am not sure about OptiX 6.0.

p.16 at http://on-demand.gputechconf.com/gtc/2013/presentations/S3475-Ray-Tracking-With-OptiX.pdf states that “when working set of registers is too large, registers are stored to local memory”. Does that mean OptiX performs this spilling automatically when it compiles the kernels? Or do you think that manually moving the aforementioned plethora of local variables to some local memory could alleviate the register shortage? (I assume not, if OptiX already moves them to local memory automatically, if I understood that correctly.)

I would greatly appreciate pointers in the right direction. Thank you for your time!

dhart · December 20, 2019, 10:21pm

Hi starkr,

What the documentation means is that code is compiled to spill registers into memory around function calls, or any time the register usage overflows the number of available registers. This is done at compile time. The same is true of a CUDA program or a host-side CPU program: registers are frequently stored to and retrieved from memory.
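For a rough feel of how register count and utilization are linked, here is a back-of-envelope sketch in plain C++. It assumes the Turing (compute capability 7.5) limits as listed in the CUDA occupancy tables: 65536 registers per SM, at most 32 resident warps per SM, and registers allocated per warp in units of 256. The helper name `resident_warps` is made up for illustration, and the sketch ignores every other limit (shared memory, block size, scheduler granularity), so treat the numbers as an upper bound only.

```cpp
#include <algorithm>

// Assumed Turing (compute capability 7.5) limits. Check the CUDA
// occupancy tables for your own GPU before relying on these.
constexpr int kRegsPerSM     = 65536;  // 32-bit registers per SM
constexpr int kMaxWarpsPerSM = 32;     // resident warp limit per SM
constexpr int kWarpSize      = 32;     // threads per warp
constexpr int kRegAllocUnit  = 256;    // per-warp allocation granularity

// Hypothetical helper: upper bound on resident warps per SM when each
// thread uses `regs_per_thread` registers (must be >= 1).
int resident_warps(int regs_per_thread) {
    const int raw = regs_per_thread * kWarpSize;
    // Round the per-warp register count up to the allocation unit.
    const int per_warp =
        (raw + kRegAllocUnit - 1) / kRegAllocUnit * kRegAllocUnit;
    return std::min(kMaxWarpsPerSM, kRegsPerSM / per_warp);
}
```

Under those assumptions, 64 registers per thread still allows all 32 warps, 128 registers allows 16, and 255 registers allows 8.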

It can be difficult to reduce register usage by moving local variables around, since the compiler is deciding the register usage for you. Here are a few strategies for reducing register usage:

Keep the scope of your variable declarations and references as tight/small as possible.
Remove variables from your code, if you can see ways to do so.
Reduce the size of data/variables, if you can. Using floats instead of doubles, or half floats instead of full floats, will free up a register for every 32-bit value you can save.
Reduce the size of your payload & attributes in your OptiX programs.
Look for places that compute a local variable before optixTrace() (rtTrace() in OptiX 6.0) or a callable program call, and then refer to that variable after the call. The variable will often need to be saved to memory before the call and restored afterwards. Trace calls and callable programs are not inlined, so each one needs a big pile of registers for itself. Sometimes it’s better to recompute a simple expression after a trace/callable than to hold onto the variable.

I do recommend trying to get Nsight Compute to work. It will really help you understand register usage, and it may also uncover other reasons for the low utilization that are not related to register usage. I’m not at all sure why it’s not working for you right now, but I can recommend two things to try: upgrade to the latest driver, and use the OptiX 6.5 SDK.

Also, I think it’s probably more common for memory usage to be a bottleneck than register usage, so even without Nsight Compute, I would suggest defaulting to looking for ways to reduce memory bandwidth as the first angle of attack. (And note especially that if this is your problem, then trying to move registers into memory could make your problem worse.) There are other common reasons for low utilization including low scene/BVH coherence and shader divergence, so think about whether those might be the reasons too.

–
David.
