Running the PPM (progressive photon mapping) sample from the OptiX Advanced Samples on two GTX 1080 Ti cards is much slower than running it on one. I would like to know how to benefit from multiple GPUs in this use case and, more generally, in use cases that have multiple passes and are bandwidth hungry.
Starting photon pass … finished. 0.00374957
Starting kd_tree build … finished. 0.007299
Starting gather pass … finished. 0.00669374
Starting photon pass … finished. 0.00798878
Starting kd_tree build … finished. 0.00505487
Starting gather pass … finished. 0.0188994
It seems a major bottleneck is the output buffers that multiple GPUs write into. What if the photon maps could stay entirely local to each GPU? In the gather pass, each GPU would read only its own photon map. If the kd-tree construction also ran on each GPU independently, the whole photon map construction could be duplicated on each GPU to avoid writing over PCIe. I am not sure whether OptiX 5.0 can do this right now; it could be achieved with two features:
1. An OptiX launch lets each GPU write to its own local buffer, instead of only working in cooperation mode (same output buffer).
2. An OptiX launch lets each GPU read from its corresponding local buffer, for example through a variable declared as “rtBufferLocal<> photon_map”. When writing the final output, the GPUs could still run in cooperation mode, but each would read its own photon_map from local memory. Of course, rtBufferLocal buffers would not be synced automatically between GPUs.
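To make the proposed data flow concrete, here is a minimal CPU-only C++ sketch. It does not use the OptiX API and says nothing about what OptiX 5.0 supports. `Device`, `trace_photons`, `gather_pixel`, and `render` are hypothetical names, each `Device` is just a stand-in for one GPU's local memory, and a linear scan replaces the kd-tree lookup to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical photon record (not the sample's actual layout).
struct Photon { float x, y, z, power; };

// Hypothetical stand-in for one GPU: it owns a private photon map
// that no other device reads or writes.
struct Device {
    std::vector<Photon> local_photon_map;
};

// Deterministic photon pass: the same seed yields the same photons,
// so every device can rebuild an identical map without any transfer.
std::vector<Photon> trace_photons(std::uint32_t seed, std::size_t count) {
    std::vector<Photon> photons;
    photons.reserve(count);
    std::uint32_t state = seed;
    auto next = [&state]() {
        state = state * 1664525u + 1013904223u;  // simple LCG step
        return static_cast<float>(state >> 8) / 16777216.0f;  // in [0, 1)
    };
    for (std::size_t i = 0; i < count; ++i) {
        float x = next();
        float y = next();
        float z = next();
        photons.push_back(Photon{x, y, z, 1.0f});
    }
    return photons;
}

// Gather for one pixel: sum the power of photons near the pixel's hit point.
// A linear scan stands in for the kd-tree query.
float gather_pixel(const std::vector<Photon>& map, std::size_t pixel,
                   std::size_t num_pixels) {
    const float hx = (static_cast<float>(pixel) + 0.5f) /
                     static_cast<float>(num_pixels);
    const float hy = 0.5f, hz = 0.5f, radius2 = 0.05f * 0.05f;
    float sum = 0.0f;
    for (const Photon& p : map) {
        const float dx = p.x - hx, dy = p.y - hy, dz = p.z - hz;
        if (dx * dx + dy * dy + dz * dz < radius2) sum += p.power;
    }
    return sum;
}

// Photon pass and map build are duplicated per device; only the gather
// pass is split, and only the final output buffer is shared.
std::vector<float> render(std::size_t num_devices, std::size_t num_pixels,
                          std::size_t num_photons, std::uint32_t seed) {
    assert(num_devices > 0);
    std::vector<Device> devices(num_devices);
    for (Device& d : devices)
        d.local_photon_map = trace_photons(seed, num_photons);

    std::vector<float> output(num_pixels, 0.0f);
    for (std::size_t d = 0; d < num_devices; ++d) {
        const std::size_t begin = d * num_pixels / num_devices;
        const std::size_t end = (d + 1) * num_pixels / num_devices;
        for (std::size_t p = begin; p < end; ++p)
            output[p] = gather_pixel(devices[d].local_photon_map, p, num_pixels);
    }
    return output;
}
```

Calling `render(1, 64, 1000, 42u)` and `render(2, 64, 1000, 42u)` should give the same image, since each simulated device reads an identical local map. The trade-off is that the photon pass and kd-tree build are repeated on every GPU, so only the gather pass would scale with the number of devices.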
Any suggestions are welcome.