Progressive photon mapping sample with multiple GPUs

One more question:

Is it safe to change “gather_buffer” to RT_BUFFER_GPU_LOCAL?

Changing it to RT_BUFFER_GPU_LOCAL speeds up the gather pass from 0.018 s to 0.008 s on my two-GPU setup. It seems to work, but I am not sure whether that is just luck. Since “gather_buffer” is used by several passes, this should only work if OptiX assigns the same region of “gather_buffer” to each GPU on every launch.

In fact, any accumulation-style buffer could benefit in this kind of use case. The documentation says “writes from multiple devices are not coherent, as a separate copy of the buffer resides on each device”. Even so, each GPU would keep working on its own copy of the same region if OptiX guaranteed a consistent (non-random) assignment of buffer regions to devices across launches. In that case we could avoid copying the buffer back to the host.
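To make that condition concrete, here is a toy CPU model (plain Python, not OptiX code) of an accumulation buffer with one private copy per device. The function `simulate` and the two mapping lambdas are hypothetical. They only illustrate why a fixed element-to-device assignment keeps per-device copies correct. They say nothing about how OptiX actually partitions a launch across GPUs.

```python
from typing import Callable, List


def simulate(num_elements: int, num_devices: int, num_launches: int,
             assign: Callable[[int, int], int]) -> List[float]:
    """Toy model of a per-device ("GPU local") accumulation buffer.

    assign(i, launch) is a hypothetical function returning which device
    processes element i during a given launch.
    """
    # Each device holds its own private copy; nothing is synchronized.
    copies = [[0.0] * num_elements for _ in range(num_devices)]

    # Accumulation passes: a read-modify-write on the handling device's copy only.
    for launch in range(num_launches):
        for i in range(num_elements):
            dev = assign(i, launch)
            copies[dev][i] += 1.0

    # A later consumer pass (one more launch) reads element i on whichever
    # device handles it in that launch, seeing only that device's copy.
    consumer = num_launches
    return [copies[assign(i, consumer)][i] for i in range(num_elements)]


# Fixed mapping: element i always lands on the same device.
static = simulate(8, 2, 4, lambda i, launch: i % 2)

# Mapping that shifts every launch: contributions are split across copies.
moving = simulate(8, 2, 4, lambda i, launch: (i + launch) % 2)
```

With the fixed mapping every element ends up with all 4 contributions (`[4.0] * 8`). With the shifting mapping each element sees only the 2 contributions that happened to land on the device reading it (`[2.0] * 8`). That difference is exactly the "works only if the distribution is the same on every launch" concern.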

This could be quite a useful feature.