I’m trying to run the “optixDenoiser” sample from the latest OptiX SDK on an older dual-GPU GTX 690 card and I’m getting the following error:
Unknown error (Details: Function “_rtBufferCreateFromGLBO” caught exception: Only single CUDA device for GL context supported)
So far I have tried using the rtContextSetDevices function to restrict OptiX to a single GPU, and also running the application via right click → Render OpenGL on → GeForce GTX 690 (1 of 2), but without success.
I believe there must be a way to simply create a context that uses only a single GPU, or a way to make _rtBufferCreateFromGLBO work in a multi-GPU environment… can anyone help me out with this?
OpenGL interoperability on the output buffer displayed with an OpenGL texture blit is not supported in multi-GPU device configurations of the OptiX context because the output buffer doesn’t reside in GPU VRAM in that case.
The optixDenoiser example, like most other OptiX SDK examples, allows you to disable OpenGL interop on that buffer with the command line option:
" -n | --nopbo Disable GL interop for display buffer.\n"
which you’ll find in optixDenoiser.cpp, or by starting the application with
" -h | --help Print this usage message and exit.\n"
Please try that and see if it works.
Another method is to limit which devices CUDA itself can see with the environment variable CUDA_VISIBLE_DEVICES.
In your case, set it to 0 or 1 to select only one of the two GPUs on the GTX 690 for CUDA.
I’m not exactly sure what happens in the case where you select only a single device. That board has two Kepler GK104 chips, which are supported by OptiX 5.1.0. I would have expected that to work, at least when picking the primary device.
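For reference, restricting the context to a single device with the OptiX 5 C API looks roughly like this; treat it as a sketch with error checking omitted, and note that the index 0 is an assumption for whichever of the two GK104 chips is enumerated first:

```cpp
#include <optix.h>

RTcontext context = 0;
rtContextCreate(&context);

// Restrict the context to one device before creating any buffers.
// The indices are OptiX device ordinals, as enumerated by
// rtDeviceGetDeviceCount()/rtDeviceGetAttribute().
int devices[] = { 0 };
rtContextSetDevices(context, 1, devices);
```

With only one device active in the context, the GL interop path of _rtBufferCreateFromGLBO should no longer hit the multi-GPU restriction.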
What is your OS version and display driver version?
Thank you for the quick response!
Using --nopbo option solved the issue.
OT: I’m interested in where the output buffer resides in multi-GPUs? Is there a dedicated buffer shared by all GPUs for this? If you could point me to some article or documentation it would be great.
Yes, output and input_output buffers reside in pinned memory on the host for multi-GPU OptiX contexts, and all GPUs access that memory directly via PCI-E, so there is some congestion when doing that with many GPUs at once. Two GPUs scale nicely, though.
To avoid most of that congestion, there is an RT_BUFFER_GPU_LOCAL flag. It can only be used with input_output buffers, which are then allocated per GPU but cannot be read by the host. That makes them perfect for local accumulation on multi-GPU setups, with the final result written into a pinned-memory output buffer.
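A minimal sketch of that pattern with the OptiX 5 C API (error checking omitted; the buffer formats and the width/height variables are illustrative assumptions):

```cpp
// Per-GPU accumulation buffer: with RT_BUFFER_GPU_LOCAL each device
// gets its own copy in VRAM, which the host cannot read back.
RTbuffer accum = 0;
rtBufferCreate(context, RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, &accum);
rtBufferSetFormat(accum, RT_FORMAT_FLOAT4);
rtBufferSetSize2D(accum, width, height);

// Final output buffer: resides in pinned host memory on multi-GPU
// contexts; the ray generation program writes the accumulated
// result here once per launch.
RTbuffer output = 0;
rtBufferCreate(context, RT_BUFFER_OUTPUT, &output);
rtBufferSetFormat(output, RT_FORMAT_UNSIGNED_BYTE4);
rtBufferSetSize2D(output, width, height);
```

The accumulation reads and writes stay device-local this way, and only the final write per pixel goes across PCI-E to the pinned output buffer.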
OptiX Programming Guide about that:
Performance guidelines touching on multi-GPU: [url]http://raytracing-docs.nvidia.com/optix/guide/index.html#performance#13001[/url]
Setting multiple devices: [url]http://raytracing-docs.nvidia.com/optix/guide/index.html#host#3002[/url]
Zero-copy (pinned) memory when using multi-GPU: [url]http://raytracing-docs.nvidia.com/optix/guide/index.html#cuda#9024[/url]
Forum posts explaining that in more detail (including corrections to my own statements because I thought the multi-GPU load balancer was not static):
[url]https://devtalk.nvidia.com/default/topic/1030457/?comment=5242078[/url] and follow the links in there as well.