OptiX scheduling / dynamic memory management?


I’m kinda new to OptiX, so first, sorry if this is covered elsewhere; I might not have the right keywords to find it.

I’m trying to recreate the general architecture of http://on-demand.gputechconf.com/gtc/2014/presentations/S4359-rt-em-wave-propagation-optix-sims-car-to-car-communication.pdf in OptiX.

The general idea is to shoot millions of rays into a scene and determine which rays reach a certain object (a sphere, representing an antenna) after a sequence of diffractions and reflections.

The authors use “dynamic memory management”:

Dynamic memory management:
• Allocate a global buffer for all threads;
• When a path needs to be stored, atomic operations ensure serialized buffer access;

Is there a general blueprint for implementing a kind of “priority queue”, so that threads start new rays when they are finished (for example after a miss) and can also schedule new rays themselves (for example after a diffraction)?

Welcome remy5t3km,

Generally speaking we don’t currently have a blueprint for a ray priority queue that we can offer. It depends pretty heavily on the details of your renderer, and for best performance you might want to avoid a queueing system anyway. The slides you reference are using a queueing system for specific reasons that might not apply unless your situation is very, very similar to the authors’.

To be clear, these slides are talking about using a global buffer and atomic operations to store valid path data once found, but they aren’t using this as a way to schedule new rays, as far as I can tell from the slides. The stored paths are passed on to their CUDA kernels for processing. The part that is scheduling new rays for traversal after a diffraction event is the iterative raygen loop discussed on page 11. They are generating full paths including diffractions & reflections using their OptiX programs, and then once a full valid path is found, it is put into the global buffer for de-duping & EM calculations. It sounds like once a path goes into that buffer, it doesn’t come back to OptiX.
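As a rough illustration of the "global buffer plus atomic operations" idea, here is a minimal CPU-side sketch in plain C++. It is not OptiX or CUDA code and not the authors' implementation: `PathRecord`, `PathBuffer`, and `runDemo` are hypothetical names, and the "ray reaches the antenna" test is a toy rule. The point is only that an atomic increment hands each thread a unique slot index, so writes never collide.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical record for one valid path; the fields are illustrative only.
struct PathRecord {
    int rayId;
    int bounces;
};

// Fixed-capacity buffer shared by all threads.
struct PathBuffer {
    std::vector<PathRecord> slots;
    std::atomic<std::size_t> count{0};

    explicit PathBuffer(std::size_t capacity) : slots(capacity) {}

    // Reserve one slot with an atomic increment, then write into it.
    // Returns false (and drops the record) if the buffer is already full.
    bool append(const PathRecord& rec) {
        std::size_t idx = count.fetch_add(1, std::memory_order_relaxed);
        if (idx >= slots.size()) return false;
        slots[idx] = rec;
        return true;
    }

    // Number of valid records; the raw counter may overshoot on overflow.
    std::size_t size() const {
        std::size_t n = count.load();
        return n < slots.size() ? n : slots.size();
    }
};

// Each worker "traces" raysPerThread rays and stores those whose id is
// divisible by 3 (a toy stand-in for "this path reached the antenna").
std::size_t runDemo(PathBuffer& buf, int numThreads, int raysPerThread) {
    assert(numThreads > 0 && raysPerThread > 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&buf, t, raysPerThread] {
            for (int i = 0; i < raysPerThread; ++i) {
                int rayId = t * raysPerThread + i;
                if (rayId % 3 == 0) buf.append({rayId, 1});
            }
        });
    }
    for (auto& w : workers) w.join();
    return buf.size();
}
```

On the GPU the same pattern would typically use an `atomicAdd` on a counter in device memory instead of `std::atomic`; the overflow check matters there too, since the buffer size is fixed before the launch.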

OptiX automatically starts new rays after a miss, and your shader programs can schedule new rays to launch after a diffraction event. The iterative vs recursive topic in these slides is referring to which of your OptiX programs you put your optixTrace() call into. In a recursive renderer, the optixTrace() call for diffracted rays is in your closest hit program, and for an iterative renderer, the optixTrace() call is in your raygen shader.
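The structural difference between the two styles can be sketched on the CPU in plain C++. This is only an analogy: `traceToy` is a hypothetical stand-in for `optixTrace()`, and the energy-halving "scene" is made up for illustration. In the recursive version the function that handles a hit traces the next ray itself; in the iterative version a loop (the role the raygen program would play) issues every trace and the hit handling only updates the ray.

```cpp
#include <cassert>

// Toy ray: only an energy value matters in this sketch.
struct Ray { float energy; };

// Toy hit result: whether something was hit and how much energy survives.
struct Hit { bool valid; float reflectance; };

// Hypothetical stand-in for optixTrace(): a ray "hits" while it still has
// more than 10% energy, and every surface halves the energy.
Hit traceToy(const Ray& ray) {
    return {ray.energy > 0.1f, 0.5f};
}

// Recursive style: handling a hit immediately traces the follow-up ray,
// so the call stack grows with the path length. Returns the bounce count.
int traceRecursive(Ray ray, int depth, int maxDepth) {
    assert(maxDepth >= 0);
    if (depth >= maxDepth) return depth;
    Hit h = traceToy(ray);
    if (!h.valid) return depth;  // miss: the path ends here
    Ray next{ray.energy * h.reflectance};
    return traceRecursive(next, depth + 1, maxDepth);
}

// Iterative style: a single loop issues every trace; the hit handling only
// updates the ray for the next iteration. Returns the bounce count.
int raygenIterative(Ray ray, int maxDepth) {
    assert(maxDepth >= 0);
    int depth = 0;
    while (depth < maxDepth) {
        Hit h = traceToy(ray);
        if (!h.valid) break;     // miss: the path ends here
        ray.energy *= h.reflectance;
        ++depth;
    }
    return depth;
}
```

Both functions follow the same path and return the same bounce count; the iterative form just keeps the stack depth constant, which is usually the reason to prefer it for long paths.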

So I’m not sure, but it sounds like it might be the iterative vs recursive approach that you’re most interested in? Also your question combined with these slides sounds like you might be interested in looking at what is called a “wavefront” approach. The idea there is to organize your rendering tasks into separate launches. You might do one OptiX launch to process primary rays, and store all the hit points into a buffer, then do another launch to cast all the reflection rays & store the result into a buffer, and finally a third launch can take the previous buffers and shade all the points, storing the final result into an image buffer. That’s only a limited example; sometimes wavefront renderers can have dozens of phases. You typically do use global buffers to pass memory between wavefront passes, but you don’t necessarily need to use any atomic operations unless you need some kind of reduction (like when your launch size is too large to store one result for every thread.)
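The three-launch example can be pictured with a small CPU-side sketch in plain C++. Each function below plays the role of one launch, and the vectors play the role of the global buffers passed between launches. All names (`HitPoint`, `launchPrimary`, `launchReflection`, `launchShade`) and the toy hit rules are assumptions for illustration, not OptiX API. Because every ray owns exactly one slot in each buffer, no atomic operations are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One buffer entry per ray: whether it hit something and a toy value.
struct HitPoint {
    bool valid;
    float value;
};

// Launch 1: trace primary rays and store one hit point per ray.
// Toy rule: even-numbered rays hit, odd-numbered rays miss.
std::vector<HitPoint> launchPrimary(std::size_t numRays) {
    std::vector<HitPoint> hits(numRays);
    for (std::size_t i = 0; i < numRays; ++i)
        hits[i] = {i % 2 == 0, static_cast<float>(i)};
    return hits;
}

// Launch 2: cast one reflection ray from every valid primary hit.
// Toy rule: the reflected contribution is half the primary value.
std::vector<HitPoint> launchReflection(const std::vector<HitPoint>& primary) {
    std::vector<HitPoint> out(primary.size());
    for (std::size_t i = 0; i < primary.size(); ++i)
        out[i] = primary[i].valid ? HitPoint{true, primary[i].value * 0.5f}
                                  : HitPoint{false, 0.0f};
    return out;
}

// Launch 3: combine both buffers into the final image buffer.
std::vector<float> launchShade(const std::vector<HitPoint>& primary,
                               const std::vector<HitPoint>& reflected) {
    assert(primary.size() == reflected.size());
    std::vector<float> image(primary.size(), 0.0f);
    for (std::size_t i = 0; i < primary.size(); ++i) {
        if (primary[i].valid) image[i] += primary[i].value;
        if (reflected[i].valid) image[i] += reflected[i].value;
    }
    return image;
}
```

A typical call sequence would be `launchPrimary`, then `launchReflection` on its result, then `launchShade` on both buffers. In a real wavefront renderer each of these would be a separate GPU launch reading and writing device buffers.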

I hope that helps. Let me know if I misunderstood your question and went the wrong direction.



Many thanks for the detailed answer.

My situation is very similar to the authors’: I want to use ray tracing for radio wave propagation simulations. If some code for this already exists, I’d probably use it.

I’m not sure I got this part: “for an iterative renderer, the optixTrace() call is in your raygen shader.”
That might be my fault, as I don’t really get what a “raygen shader” is.

The wavefront approach sounds interesting from a high-level optimisation point of view (and it sounds similar to the authors’ “ordering” of the ray tracing to avoid “splits” in the execution flow across the cores).
So, to transmit data across launches, I need a buffer. My understanding is that this buffer is limited by the memory of the GPU, from page 10 of the presentation: “4 GB GPU memory: Only 2500 rays, but we rather need millions!”.

So, if I want a lot of rays, I’ll still need some way to move the data back to the host, “schedule” new rays, and then, once I have finished my first batch, move on to the next one? Or am I wrong somewhere?
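The batching arithmetic behind this question can be sketched in a few lines of plain C++. This is only an illustration of splitting a large ray count into memory-sized batches; `raysPerBatch` and `planBatches` are hypothetical helpers, and the per-ray memory cost is an input you would have to measure for your own buffers, not a figure from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// How many rays fit in the memory budget, given the bytes each ray needs
// for its buffer entries. Both numbers are assumptions supplied by the caller.
std::size_t raysPerBatch(std::size_t budgetBytes, std::size_t bytesPerRay) {
    assert(bytesPerRay > 0);
    return budgetBytes / bytesPerRay;
}

// Split totalRays into consecutive batches of at most batchSize rays.
// Each entry is the ray count of one launch; results would be copied back
// to the host (or reduced on the device) before the next batch starts.
std::vector<std::size_t> planBatches(std::size_t totalRays,
                                     std::size_t batchSize) {
    assert(batchSize > 0);
    std::vector<std::size_t> batches;
    for (std::size_t done = 0; done < totalRays; done += batchSize)
        batches.push_back(std::min(batchSize, totalRays - done));
    return batches;
}
```

For example, with a budget of 1000 bytes and 100 bytes per ray, `raysPerBatch` gives 10 rays per launch, and `planBatches(25, 10)` yields three launches of 10, 10, and 5 rays.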

Thanks again,


Hi Rémy,
It seems that you want to do something similar to what I am already doing. You can try our code, which is public and released under the MIT license. It supports reflections and depolarization, and simple diffraction is in progress:

If you find it useful, please cite our papers.
Kind regards,

Hi Esteban,

That’s perfect! The paper is quite recent, and that’s probably why I missed it. Thanks for this work, I’ll be sure to cite your papers if I use it!

Just to be sure: are reflections still the only kind of rays implemented? (no transmission? no diffraction?).


Just reflections and penetration (transmission). Full diffraction, combining reflections and diffractions, needs careful thought to be manageable, with multiple phases as dhart said. You can try your own implementation. I am working on a simple diffraction solution (direct diffraction only) at the moment. Any ideas are welcome.