OptiX denoiser compute device options

OptiX 7 has no knowledge of multiple devices!

That’s finally completely under the developer’s control, and it all happens inside the CUDA host code of your application.

That means you normally create a CUDA context per device and an OptiX context per CUDA context, and these are completely independent from OptiX’s point of view. Everything that should happen between the boards is pure CUDA code.
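
For illustration, here is a minimal sketch of that per-device setup with the CUDA driver API and the OptiX 7 host API. This is not the code from my example; error checking is omitted and the struct/function names are made up:

```cpp
// One CUDA context per device, one OptiX device context on top of each.
#include <cuda.h>
#include <optix.h>
#include <optix_stubs.h>
#include <optix_function_table_definition.h> // exactly one translation unit
#include <vector>

struct PerDevice
{
  CUdevice           device   = 0;
  CUcontext          cuCtx    = nullptr;
  OptixDeviceContext optixCtx = nullptr;
};

std::vector<PerDevice> createContexts()
{
  cuInit(0);
  optixInit();

  int count = 0;
  cuDeviceGetCount(&count);

  std::vector<PerDevice> devices(count);
  for (int i = 0; i < count; ++i)
  {
    cuDeviceGet(&devices[i].device, i);
    cuCtxCreate(&devices[i].cuCtx, 0, devices[i].device);   // one CUDA context per board

    OptixDeviceContextOptions options = {};
    optixDeviceContextCreate(devices[i].cuCtx, &options,    // one OptiX context per CUDA context
                             &devices[i].optixCtx);
  }
  return devices;
}
```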

If you do exactly the same OptiX API calls on both contexts, there will be different kernels for the heterogeneous devices, the acceleration structures will be different (incompatible, i.e. they cannot be relocated from one device to the other in this case), and probably some more things.

One of my OptiX 7 examples does that, though it has not been tested with a heterogeneous GPU setup.
The preferred setup for multi-GPU is identical board types, ideally with an NVLINK connection.

I would expect that the only rendering distribution strategies which work with that are obviously the single-GPU one and possibly the multi-GPU zero-copy (pinned memory) strategy, sketched below.
All other rendering distribution strategies implemented there so far require copies between the two devices for the final display, and I do not know if that works with a heterogeneous GPU setup the way I implemented it.
Link here: https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410/4
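
To make the zero-copy idea concrete, here is a hedged sketch (the example does more than this; the function name and the fixed device ordinals are just placeholders): a single pinned host buffer that both boards can write their part of the final image into.

```cpp
// Zero-copy (pinned memory) output buffer shared by two devices.
#include <cuda_runtime.h>
#include <cstddef>

void* allocZeroCopyOutput(size_t numBytes, void** devPtr0, void** devPtr1)
{
  void* hostPtr = nullptr;
  // Portable + mapped: the allocation is pinned, visible to all CUDA contexts,
  // and directly addressable from device code.
  cudaHostAlloc(&hostPtr, numBytes, cudaHostAllocPortable | cudaHostAllocMapped);

  cudaSetDevice(0);
  cudaHostGetDevicePointer(devPtr0, hostPtr, 0); // device-side address on board 0

  cudaSetDevice(1);
  cudaHostGetDevicePointer(devPtr1, hostPtr, 0); // device-side address on board 1

  return hostPtr;
}
```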

That example also contains two methods (one for Windows, one that works on Windows and Linux) to figure out which device is the primary OpenGL device, to make CUDA-OpenGL interop work.
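
One common way to implement the Windows-and-Linux variant is to ask CUDA which device drives the current OpenGL context. This is only a rough sketch, not necessarily how the example does it, and it assumes an OpenGL context is already current on the calling thread:

```cpp
// Query which CUDA device is associated with the current OpenGL context.
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

int findOpenGLInteropDevice()
{
  unsigned int numGLDevices = 0;
  int cudaDevices[8] = {};
  cudaGLGetDevices(&numGLDevices, cudaDevices, 8, cudaGLDeviceListAll);
  // The first returned ordinal is the CUDA device backing the OpenGL context;
  // use that one for the CUDA-OpenGL interop path.
  return (numGLDevices > 0) ? cudaDevices[0] : -1;
}
```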

Anyway, when handling these two devices separately you can distribute the work as you like, which especially in a heterogeneous setup would require some load balancing to make sure the slower board doesn’t bottleneck the rendering speed.
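
As a purely illustrative sketch (nothing from the example), that load balancing can be as simple as re-splitting the image rows each frame based on the measured per-board render times of the previous frame:

```cpp
// Re-balance a two-board row split from the previous frame's GPU times.
#include <algorithm>

// renderTime0/1: measured times (ms) each board needed for its share last frame.
// previousSplit: number of rows board 0 rendered last frame.
int computeSplitRow(int imageHeight, float renderTime0, float renderTime1, int previousSplit)
{
  // Throughput = rows rendered per millisecond on each board.
  const float t0 = static_cast<float>(previousSplit)               / renderTime0;
  const float t1 = static_cast<float>(imageHeight - previousSplit) / renderTime1;
  // Give each board a share proportional to its throughput.
  const int split = static_cast<int>(imageHeight * t0 / (t0 + t1));
  return std::max(1, std::min(imageHeight - 1, split));
}
```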

Also the denoiser will run differently: as said, the RTX board will use the Tensor cores, the Pascal board obviously will not.

Here I would try to run the denoiser on the full image only on the faster board.
That would be simpler than denoising two tiles of the image, one on either board, which requires an overlap area between the tiles (query it with optixDenoiserComputeMemoryResources), and then you still need to get the results into the final full image anyway.
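
The overlap query itself is a single call. A small sketch (denoiser creation and the tiled invocation are omitted; the denoiser handle is assumed to exist already):

```cpp
// Query how many border pixels each denoiser tile needs to overlap its neighbor.
#include <optix.h>

unsigned int queryTileOverlap(OptixDenoiser denoiser, unsigned int tileWidth, unsigned int tileHeight)
{
  OptixDenoiserSizes sizes = {};
  optixDenoiserComputeMemoryResources(denoiser, tileWidth, tileHeight, &sizes);
  // Each tile must be rendered with this many extra pixels of border so that
  // neighboring tiles denoise consistently.
  return sizes.overlapWindowSizeInPixels;
}
```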

If you’re not actually rendering with OptiX but only want to apply the denoiser, then I would recommend not using two devices at all; just pick the faster one.
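
A simple heuristic sketch for “just pick the faster one” (my assumption, not from the programming guide): choose the device with the highest compute capability, which in an RTX plus Pascal setup selects the Turing board with the Tensor cores.

```cpp
// Pick the CUDA device with the highest compute capability.
#include <cuda_runtime.h>

int pickDenoiserDevice()
{
  int count = 0;
  cudaGetDeviceCount(&count);

  int best   = 0;
  int bestSM = -1;
  for (int i = 0; i < count; ++i)
  {
    cudaDeviceProp prop = {};
    cudaGetDeviceProperties(&prop, i);
    const int sm = prop.major * 10 + prop.minor; // e.g. 75 for Turing, 61 for Pascal
    if (sm > bestSM)
    {
      bestSM = sm;
      best   = i;
    }
  }
  return best;
}
```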

Read this chapter: https://raytracing-docs.nvidia.com/optix7/guide/index.html#ai_denoiser#nvidia-ai-denoiser

(OptiX 6 would not allow multi-GPU on your configuration. It will pick all GPUs with the highest compatible SM versions, which would be the RTX board in your setup. It’s also either all boards with RT cores or none.)
