Lowering the path tracing depth penalty

Hi.

I’m currently working on a path tracing project using OptiX 4. The path tracer uses Russian roulette to decide when a path should be terminated, so the depth of each path can vary widely. This can cause one thread to wait an unnecessarily long time for another to finish.
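To make the depth variation concrete, here is a minimal CPU-side sketch of Russian roulette termination. It assumes a fixed continuation probability `survival` (real tracers usually derive it from the path throughput), and `rouletteDepth`, `warpCost` and `hardCap` are hypothetical names used for illustration only, not part of OptiX.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>

// Count the bounces traced before Russian roulette ends the path.
// ASSUMPTION: a fixed continuation probability instead of a throughput-based one.
inline std::uint32_t rouletteDepth(std::mt19937& rng, float survival,
                                   std::uint32_t hardCap) {
    std::uniform_real_distribution<float> u(0.0f, 1.0f);
    std::uint32_t depth = 1;  // the first segment is always traced
    while (depth < hardCap && u(rng) < survival) {
        ++depth;  // the path survived the roulette test and continues
    }
    return depth;
}

// Threads that run in lockstep finish together, so a group of paths
// costs as much as its deepest member.
inline std::uint32_t warpCost(const std::uint32_t* depths, std::size_t n) {
    std::uint32_t worst = 0;
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, depths[i]);
    }
    return worst;
}
```

With a continuation probability close to 1, most paths still end early, but a few run very deep, and every thread in the same group waits for those.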

I have done some small manual tests, in which I can see the path depth range from 1 to over 100.

My current hack to get around this problem is to take more samples on each context launch, which lowers the average waiting time per thread and gives me a speedup of a factor of two or three, depending on the number of samples per launch.

This, however, also significantly increases the time it takes to finish a single launch, which in turn more or less freezes my computer’s GUI (apparently a CUDA launch can’t run at the same time as the graphics tasks in Windows, but that’s not the point).

What I am thinking of doing instead is:

  1. trace up to a specified maximum depth (could be 10) and save the tracing information in a buffer.
  2. sort the buffer into running and terminated paths.
  3. repeat for all paths that are still alive.
    (4.) optionally insert new paths into the buffer, to keep it going.
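Steps 1 to 3 can be sketched on the CPU as follows. This is only an illustration of the control flow, not OptiX code: `PathState`, `traceSegment`, `compactAlive` and `runCompactedTrace` are hypothetical names, and the termination rule inside `traceSegment` is a made-up stand-in for real shading and Russian roulette. On the GPU, the partition step would be a device-side stream compaction or sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-path state saved in the buffer between launches.
struct PathState {
    std::uint32_t pixel;  // pixel the path contributes to (needed after reordering)
    std::uint32_t depth;  // bounces traced so far
    bool alive;           // false once the path has been terminated
};

// Step 1: stand-in for one launch. Advances the first `count` paths by at
// most `maxSegment` bounces each.
inline void traceSegment(std::vector<PathState>& paths, std::size_t count,
                         std::uint32_t maxSegment) {
    for (std::size_t i = 0; i < count; ++i) {
        PathState& p = paths[i];
        // ASSUMPTION: fake deterministic termination depth in place of
        // real shading and Russian roulette.
        const std::uint32_t terminateDepth = 1u + (p.pixel * 7u) % 40u;
        for (std::uint32_t b = 0; b < maxSegment && p.alive; ++b) {
            ++p.depth;
            if (p.depth >= terminateDepth) p.alive = false;
        }
    }
}

// Step 2: move the live paths to the front of the buffer and return
// how many are still alive.
inline std::size_t compactAlive(std::vector<PathState>& paths, std::size_t count) {
    auto end = paths.begin() + static_cast<std::ptrdiff_t>(count);
    auto mid = std::partition(paths.begin(), end,
                              [](const PathState& p) { return p.alive; });
    return static_cast<std::size_t>(mid - paths.begin());
}

// Step 3: relaunch on the survivors until none are left.
// Returns the number of launches that were needed.
inline int runCompactedTrace(std::vector<PathState>& paths,
                             std::uint32_t maxSegment) {
    std::size_t live = paths.size();
    int launches = 0;
    while (live > 0) {
        traceSegment(paths, live, maxSegment);
        live = compactAlive(paths, live);
        ++launches;
    }
    return launches;
}
```

Because the partition reorders the buffer, each entry has to carry its own pixel index so that its contribution can still be accumulated in the right place.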

Does anyone have any experience with or insight into this approach, whether it’s possible, or whether it’s even worth the hassle?

Russian roulette can indeed be unfriendly to many GPU path-tracing algorithms. This is particularly true if you get depths of up to 100 (very unusual for most types of scenes).

The work compaction method you mention could possibly work; it is like a simpler version of wavefront path tracing. However, overcoming the re-launch and sorting overhead is going to be tough. You also need to watch out for the total number of live rays on re-launch falling so low that the GPU is starved for work. Assuming your typical scenes are like most users’, the vast majority of rays will die off by 3-5 bounces or so. This work starvation can happen, for instance, if you have a single high-depth object (e.g., a crystal statue) that takes up a small portion of the screen but requires many bounces.
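One way to guard against starvation is to top up the compacted queue with fresh camera paths whenever the live count drops below a threshold, along the lines of the optional step 4 in the question. The sketch below is a CPU-side illustration under that assumption; `QueuedPath`, `refillQueue`, `minLive` and `capacity` are hypothetical names, and a sensible threshold would have to be measured on the target GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical queue entry; not an OptiX type.
struct QueuedPath {
    std::uint32_t sample;  // index of the camera sample that started the path
    std::uint32_t depth;   // bounces traced so far
};

// Refill the compacted queue when it falls below `minLive` entries.
// `live` holds the surviving paths, `nextSample` is the next camera sample
// that has not been started yet. Returns the number of paths added.
inline std::size_t refillQueue(std::vector<QueuedPath>& live,
                               std::size_t minLive, std::size_t capacity,
                               std::uint32_t& nextSample,
                               std::uint32_t totalSamples) {
    if (live.size() >= minLive) {
        return 0;  // enough work for the next launch, leave the queue alone
    }
    std::size_t added = 0;
    while (live.size() < capacity && nextSample < totalSamples) {
        live.push_back({nextSample, 0u});  // fresh path starting at depth 0
        ++nextSample;
        ++added;
    }
    return added;
}
```

Once `nextSample` reaches `totalSamples` there is nothing left to refill with, so the last few launches will still run with a mostly empty queue; the refill only delays the starvation to the end of the frame.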