Questions on launch_index, secondary rays, and threads

Hi folks,

I have an application where my light source fires a number of rays into the scene, and I want to log when a certain object in the scene was hit, where it was hit, and some information about the source. The material of this object responds not only to the primary ray but also to secondary reflection rays. I’d like to use an rtBuffer to log this information, and I was wondering what the standard approach is for managing the buffer access. Many of the examples in the SDK write to an rtBuffer using the launch_index associated with a ray generated at context launch, which ensures that nothing is clobbered. However, I assume that for secondary rays the launch_index is the same as that of the parent primary ray? In my situation this would lead to clobbering of the buffer values if a primary ray and one of its child reflection rays both intersected the object.

If I were to recursively generate secondary rays from a primary ray, would all that work be performed on the same OptiX thread? And if so, is it possible to maintain a “local” variable across the recursion that maintains state just within that thread? What I am essentially looking for is a way to index a 2D buffer containing my data, where perhaps the X dimension is indexed by thread via launch_index and the Y dimension is indexed by a thread-local pointer that is incremented every time the material is hit.


The launch_index is the same, but whether anything gets clobbered is up to your implementation. You could, for instance, increment the value stored in your rtBuffer instead of overwriting it, or you could use a 3D buffer with a different z-index for each secondary ray. Both options would avoid clobbering.
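To make the 3D-buffer option concrete, here is a minimal CPU-side sketch of the indexing in plain C++. It is not OptiX API code: HitLog3D, record, and the "negative means no hit" convention are hypothetical names and choices made for illustration. It assumes each ray spawns at most one secondary ray, so the recursion depth identifies the ray uniquely within a launch index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side model of a 3D log buffer. (x, y) stand in for the 2D launch index,
// and depth is the recursion depth of the ray that produced the hit.
// All names are hypothetical; this only illustrates the indexing scheme.
struct HitLog3D {
    std::size_t width, height, max_depth;
    std::vector<float> t_hit;  // one hit distance per (x, y, depth); negative = no hit

    HitLog3D(std::size_t w, std::size_t h, std::size_t d)
        : width(w), height(h), max_depth(d), t_hit(w * h * d, -1.0f) {}

    // Flatten (x, y, depth) into a linear offset, with x varying fastest.
    std::size_t offset(std::size_t x, std::size_t y, std::size_t depth) const {
        assert(x < width && y < height && depth < max_depth);
        return (depth * height + y) * width + x;
    }

    // The primary ray writes at depth 0 and its reflection at depth 1, and so on,
    // so rays sharing a launch index never write to the same element.
    void record(std::size_t x, std::size_t y, std::size_t depth, float t) {
        t_hit[offset(x, y, depth)] = t;
    }

    float at(std::size_t x, std::size_t y, std::size_t depth) const {
        return t_hit[offset(x, y, depth)];
    }
};
```

In a real device program the depth would have to be tracked by you, for example as a field in the ray payload that is incremented before each secondary trace.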

Yes, at least in the current version, they’re all on the same thread. However, to pass values between OptiX programs on that thread, you should use the ray payload. You can add whatever variables you want to that payload.

That’s doable, but you’ll have to specify the X and Y dimensions of the buffer in advance, which will limit the number of hits you can record.
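As a rough sketch of that 2D layout, the plain C++ model below carries a hit counter in a payload struct through a toy recursion and uses it as the Y index, with a guard for the fixed capacity. Everything here (Payload, HitLog2D, log_hit, trace) is a hypothetical stand-in rather than OptiX API code, and the toy trace simply assumes every bounce hits the logged object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-ray payload: state shared by a primary ray and its children.
struct Payload {
    unsigned hit_count = 0;  // hits on the logged object so far
    unsigned depth = 0;      // current recursion depth
};

struct HitRecord {
    float t;         // hit distance; negative = slot unused
    unsigned depth;  // recursion depth at which the hit occurred
};

// 2D log: X is the (flattened) launch index, Y is the hit slot.
struct HitLog2D {
    std::size_t num_launches, max_hits;
    std::vector<HitRecord> records;

    HitLog2D(std::size_t n, std::size_t m)
        : num_launches(n), max_hits(m), records(n * m, HitRecord{-1.0f, 0u}) {}
};

// Store the hit if a slot is free. The counter keeps running past capacity,
// so the host can tell afterwards that hits were dropped.
bool log_hit(HitLog2D& log, std::size_t launch, Payload& prd, float t) {
    bool stored = false;
    if (prd.hit_count < log.max_hits) {
        log.records[launch * log.max_hits + prd.hit_count] = HitRecord{t, prd.depth};
        stored = true;
    }
    ++prd.hit_count;
    return stored;
}

// Toy stand-in for recursive tracing: every bounce hits the logged object.
void trace(HitLog2D& log, std::size_t launch, Payload& prd, unsigned bounces_left) {
    log_hit(log, launch, prd, 1.0f + static_cast<float>(prd.depth));
    if (bounces_left == 0) return;
    ++prd.depth;
    trace(log, launch, prd, bounces_left - 1);
}
```

With max_hits set to 3 and a ray that bounces four times, the first three hits are stored and hit_count ends at 5, which is one way to detect that the Y dimension was too small.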

Hi nljones,

Thanks for the insight! Very much appreciated. What I have settled on is storing my counts in an RT_INPUT_OUTPUT buffer of type unsigned that is indexed by the 2D launch index of the thread.

That will work. If you are concerned about performance, though, this solution could be slower because the buffer is stored in global memory.

If you don’t ever need to see the contents of the buffer on the CPU, then you could make it a little bit faster by also making the buffer RT_BUFFER_GPU_LOCAL.

Unfortunately, I will need to access the data from the host once the device-side OptiX program terminates, so this may impact performance. From what I have read, I’m under the impression that this performance degradation only occurs in multi-GPU environments, where the data is always accessed from global host memory, I suppose for coherence purposes. Is my understanding correct?

Assuming the above is true, if I were to have a separate context mapped to each GPU (each set to trace a subset of the rays from my source), would I avoid the performance penalty? Each context would have its own buffer for data collection.

Yes, the performance improvement associated with RT_BUFFER_GPU_LOCAL is mainly for multi-GPU systems. To the best of my knowledge, OptiX doesn’t support multiple contexts. That’s not to say your idea won’t work, just that the developers don’t test it.

However, that’s why I initially suggested storing your count in the ray payload; that would hopefully avoid having to access global memory altogether, although the data you send back to the CPU would still have to go into a buffer in global memory.
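A minimal sketch of that pattern, again as a plain C++ model rather than OptiX API code: the count lives in the payload while the ray and its reflections are traced, and the buffer is written exactly once per launch index at the end. The names CountPayload, on_object_hit, and ray_generation are hypothetical, and simulated_hits stands in for whatever the real trace would produce.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical payload that carries the running count for one primary ray.
struct CountPayload {
    unsigned hits = 0;
};

// Stand-in for the object's hit program: only the payload is touched here,
// not the output buffer.
void on_object_hit(CountPayload& prd) {
    ++prd.hits;
}

// Stand-in for the ray generation program at launch index (x, y).
// simulated_hits replaces the real recursive trace for this sketch.
void ray_generation(std::vector<unsigned>& counts, std::size_t width,
                    std::size_t x, std::size_t y, unsigned simulated_hits) {
    CountPayload prd;
    for (unsigned i = 0; i < simulated_hits; ++i) {
        on_object_hit(prd);
    }
    // Single write to the output buffer once all tracing for this index is done.
    counts[y * width + x] = prd.hits;
}
```

The trade-off is that the buffer holds only the final count per launch index; anything else you want on the host has to be carried in the payload and written out in the same final step.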