Non-blocking rendering and overlapped launching

I am using OptiX alongside a third-party library to render scenes that the library simulates. I want the ray-tracing computation to overlap with the simulation computation. In addition, I have multiple cameras in OptiX, but from everything I have found, the launch call is blocking, which means my cameras are rendered sequentially.

I looked into the progressive launch call, but I need to access the render output as a device pointer to a float4 buffer. From what I have read (and tried), this combination of functionality is not allowed.

My render loop looks something like this:


Any suggestions as to how to go about running and launching this in an efficient manner? Thank you!

Hi @amelmquist,

That’s right, currently the OptiX 6 launch is a synchronous call from the calling thread. We actually have a couple of different solutions in the works to allow asynchronous OptiX launches, but they aren’t quite ready yet, so I’ll try to make some suggestions for today.

You can probably do what you want today using OptiX-CUDA interop.

What I would suggest is to make simulateNewFrame() launch a CUDA kernel asynchronously in a second, separate CUDA stream, and to call it before renderPreviousFrame(). That way both will do work until renderPreviousFrame() returns. You will then need to wait for simulateNewFrame() to finish, and at that point do the CUDA-interop part, which means wiring up the results of your simulation to the inputs of the next loop iteration’s renderPreviousFrame() call. I’m guessing you will probably need to alternate between two simulation buffers, so you can write to one while you’re reading from the other.
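The loop order suggested above can be sketched on the host alone. This is only an illustration of the scheduling and the two-buffer swap, not OptiX or CUDA code: `std::async` stands in for the asynchronous kernel launch in its own stream, `sim.get()` stands in for waiting on that stream, and the bodies of `simulateNewFrame` and `renderPreviousFrame`, as well as `SimBuffer` and `runLoop`, are hypothetical placeholders. In a real application the swap step is where the interop wiring would go.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a simulation buffer (not an OptiX or CUDA type).
using SimBuffer = std::vector<float>;

// Placeholder simulation step: fills `out` with the state for `frame`.
void simulateNewFrame(SimBuffer& out, int frame) {
    for (float& v : out) v = static_cast<float>(frame);
}

// Placeholder for the blocking render: reads `in` and returns a checksum
// in place of an image.
float renderPreviousFrame(const SimBuffer& in) {
    float sum = 0.0f;
    for (float v : in) sum += v;
    return sum;
}

// Runs `numFrames` iterations over buffers of `n` elements and returns one
// rendered checksum per iteration.
std::vector<float> runLoop(int numFrames, std::size_t n) {
    assert(numFrames >= 0);
    std::array<SimBuffer, 2> buffers{SimBuffer(n, 0.0f), SimBuffer(n, 0.0f)};
    std::vector<float> rendered;

    int readIdx = 0;
    simulateNewFrame(buffers[readIdx], 0);  // prime the first buffer

    for (int frame = 1; frame <= numFrames; ++frame) {
        const int writeIdx = 1 - readIdx;

        // 1. Start the simulation of the new frame without blocking.
        std::future<void> sim = std::async(std::launch::async,
            [&buffers, writeIdx, frame] {
                simulateNewFrame(buffers[writeIdx], frame);
            });

        // 2. Blocking render of the previous frame, overlapping with step 1.
        rendered.push_back(renderPreviousFrame(buffers[readIdx]));

        // 3. Wait for the simulation to finish.
        sim.get();

        // 4. Swap roles: the buffer just written is rendered next iteration.
        readIdx = writeIdx;
    }
    return rendered;
}
```

Because the render only ever reads the buffer that the simulation is not writing, the two steps never touch the same memory within one iteration.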

Some minor caveats that may or may not come up… you might have to use OptiX headers to get access to the CUDA vector types, via the OptiX namespace. You will probably have to put your OptiX shaders and your CUDA simulation kernel in different .cu files as they’ll be compiled differently. I think you can do this in one thread, but I’m not absolutely certain, so there’s a chance you might end up needing one thread for CUDA work, and one thread for OptiX work, and in that case you’d have to do some manual host-side synchronization between your threads each time through the loop.
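If the two-thread fallback turns out to be necessary, the manual host-side synchronization could look roughly like the sketch below. It is host-only C++ under stated assumptions: `SimWorker` and `renderWithWorker` are hypothetical names, the worker's job body is a placeholder for the CUDA simulation work, and reading `buffers[readIdx][0]` is a placeholder for the blocking OptiX launch. One long-lived thread owns all simulation work, and the render thread hands it one job per loop iteration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of OptiX or CUDA): a long-lived thread that
// performs one simulation job per frame on behalf of the render thread.
class SimWorker {
public:
    SimWorker() : thread_([this] { run(); }) {}
    ~SimWorker() {
        { std::lock_guard<std::mutex> lock(m_); quit_ = true; }
        cv_.notify_all();
        thread_.join();
    }
    // Hand the worker a job and return immediately.
    void start(std::vector<float>* target, int frame) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!busy_);  // only one job in flight at a time
            target_ = target; frame_ = frame; busy_ = true;
        }
        cv_.notify_all();
    }
    // Block until the job passed to start() has finished.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !busy_; });
    }
private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return busy_ || quit_; });
            if (quit_) return;
            std::vector<float>* target = target_;
            const int frame = frame_;
            lock.unlock();
            // Placeholder for the real simulation work.
            for (float& v : *target) v = static_cast<float>(frame);
            lock.lock();
            busy_ = false;
            cv_.notify_all();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<float>* target_ = nullptr;
    int frame_ = 0;
    bool busy_ = false;
    bool quit_ = false;
    std::thread thread_;  // declared last so it starts after the other members
};

// Render thread: overlaps a blocking "render" with the worker's simulation.
std::vector<float> renderWithWorker(int numFrames) {
    std::vector<float> buffers[2] = {std::vector<float>(4, 0.0f),
                                     std::vector<float>(4, 0.0f)};
    std::vector<float> rendered;
    SimWorker worker;
    int readIdx = 0;
    for (int frame = 1; frame <= numFrames; ++frame) {
        worker.start(&buffers[1 - readIdx], frame);  // simulate into the idle buffer
        rendered.push_back(buffers[readIdx][0]);      // placeholder blocking render
        worker.wait();                                // host-side sync point
        readIdx = 1 - readIdx;                        // swap buffers
    }
    return rendered;
}
```

Keeping the worker thread alive across frames, rather than spawning a task per frame, matters mainly if the simulation library expects all of its calls to come from the same thread; whether that applies here is an assumption.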