Is there a way to launch OptiX 7 in CUDA device code?

I wish to implement a CUDA rasterizer and perform ray tracing for each rasterized fragment. There are some details:

  • The inputs for rasterization and for the ray-tracing launch (acceleration structure, pipeline, SBT, params, etc.) are prepared beforehand.
  • I need to perform ray tracing for each fragment without a depth test. (So a two-pass method using G-buffers seems unsuitable.)

So, is there a proper way to launch OptiX ray tracing directly from the fragment stage of a CUDA-implemented rasterizer?

For example, maybe I could call optixLaunch() with a launch width/height/depth of 1 in host code and implement the rasterizer in the ray-gen program, at the cost of very poor thread utilization.

Hi @yetsun,

There isn’t a way to launch OptiX from a fragment shader or from CUDA device code. The only way to start ray tracing is the host-side kernel launch triggered with optixLaunch().

For example, maybe I could call optixLaunch() with a launch width/height/depth of 1 in host code and implement the rasterizer in the ray-gen program, at the cost of very poor thread utilization.

That is possible, yes, but it would be extremely slow and not worth doing. It would be much faster to use a two-pass method, or to use ray tracing for your primary rays. Is using ray tracing instead of the rasterizer an option?

I need to perform ray tracing for each fragment without a depth test. (So a two-pass method using G-buffers seems unsuitable.)

Maybe I don’t understand your question completely: what exactly do you mean by “without a depth test”? I’m curious why you think a two-pass render with a G-buffer might not work well, since based on your question so far that is exactly what I was going to suggest.


David.

Hi @David,

What I originally wanted was to perform ray tracing for each rasterized fragment, no matter what Z-depth it has. That is, a pixel can have many fragments, each doing ray tracing, so the size of the output buffer is unknown beforehand.

A two-pass rendering with a dynamic output buffer, or with a pre-allocated large output buffer, is also an option for me, even though a more flexible use of ray tracing would be interesting.

Thanks for your helpful reply.

That makes sense; it’s tougher to allocate a buffer if you don’t know the size in advance. It is possible though, depending on your pain tolerance. :)

The easiest approach would be to pre-allocate for a maximum number of fragments per pixel and, if you need to, do a multi-pass rendering that catches the overflow fragments in a second or third launch. Some people also get trickier: they fill a dynamic buffer with fragment info in arbitrary order, run a 1D OptiX launch over it, and either keep a buffer mapping launch index to pixel, or carry the pixel information (or whatever is needed to construct a ray) in the buffer itself.

Good luck! Tracing directly from CUDA kernels isn’t possible right now, but it might be in the future; we are absolutely thinking about how to make ray tracing more flexible. In the meantime, we are certainly interested to hear how your project goes and what ends up being the most difficult parts, so we can continue to improve OptiX.


David.