New to OptiX, and for my application I might be looking at casting, say, 100M rays with up to 500 reflections (this is not a visualization application). Any best practices for this kind of use? For example, finding the optimum "chunk size" of rays.
Were you thinking of using OptiX only for doing rounds of intersection (500 launches), or for full shading where you generate reflection rays (1 launch with 500 bounces)?
For the latter case, where you’re doing everything in one kernel launch, preserving coherence might be a challenge for 500 bounces. Can you set up your scene as a single shader and a single mesh? Also, is it a closed environment, or can some rays escape and die out before they reach bounce 500?
Hi, sorry for the delayed reply.
Essentially the algorithm is:

for i = 1 to nReflections
    for j = 1 to nRays
        cast (launch) the ray
        if the ray does not escape then
            update the ray origin to the hit point and the direction to the reflected direction
There are two ways (at least) to use OptiX for this:
1. Launch all the rays (or at least as many as will fit in memory) and do the "for i = 1 to nReflections" loop entirely inside of an OptiX ray-gen program, all in one launch. OptiX does the "for j = 1 to nRays" for you, and exposes "j" as the launch index. You would implement the reflection part either directly in ray-gen if it's simple enough, or in a closest hit program. For reference, see the optixPathTracer SDK sample, which probably does more shading than you need.
After 500 reflections, I think the main concern is rays becoming very incoherent in origin and direction, or so many rays escaping that only a handful remain active. If so, you are likely to see a "tail effect" where the GPU is not very busy.
As a practical consideration, if this runs for more than a few seconds, you may hit driver timeouts on Windows, and it is better if the GPU doing the ray tracing is not also driving the display. Search the forum for tips on this from Detlef. On Linux, putting the GPU in the non-display slot is enough.
2. Launch a "wavefront" (1 bounce at a time) of rays with OptiX, calculate new origins and directions, but then run an external CUDA kernel for workload adjustment (sorting rays, culling rays that escape, …). Form a new batch and pass that back to OptiX. CUDA interop via buffers is relatively easy to set up. For reference, see the optixRaycasting SDK sample, which shows primary rays only. You would need to add bounces to this.
I would implement (1) as a reference solution first in any case because it’s easier.
As far as choosing a chunk size for (2): if all the rays fit in memory, then why not use a single chunk? Is there an advantage to smaller chunks other than reduced memory use?
OK, thanks. Lots to think about. 500 reflections is a slightly perverse edge-case. It is used in an enclosed space to simulate a diffuse field.
Enclosed space meaning no rays escape? In that case 500 bounces might still be OK. The case to avoid, if you can, is a handful of rays left alive while the rest of the GPU does nothing.
Yes, no rays escape in the closed box scenario.
Typically having 500 reflections in any kind of open space is pointless as nearly all rays will escape well before 500 reflections.