OpenCL/OpenGL interoperability and the NV-GLX X11 extension

I develop VirtualGL, which is described in more detail in this thread: https://devtalk.nvidia.com/default/topic/1056385/opengl/sharing-render-buffers-or-render-textures-among-multiple-opengl-contexts/post/5359805. In a nutshell, it’s a GLX implementation that splits 2D and 3D rendering onto different X displays, so you can run Linux OpenGL applications with GPU acceleration even if you are using an X proxy or other remote display solution that doesn’t inherently support GPU-accelerated OpenGL.

A customer (both my customer and nVidia’s) reported a problem involving the OpenCL/OpenGL interoperability functions. When those functions are used in a VirtualGL environment, an Xlib error is printed (Xlib: extension “NV-GLX” missing on display “:0.0”), followed by a segfault. The segfault appears to be within nVidia’s closed-source API/driver stack. I can reproduce the problem using the OpenCL Marching Cubes Isosurfaces example from https://developer.nvidia.com/opencl. In that case, the segfault occurs within the body of clEnqueueReleaseGLObjects(), which appears to call glCreateSyncFromCLeventARB() and glWaitSync() in succession. glWaitSync() appears to be the point at which nVidia’s OpenGL implementation tries to access the NV-GLX X11 extension and crashes.
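For reference, the general shape of the interop code path that hits the crash is sketched below. This is only illustrative (the buffer, kernel, and work-size names are placeholders, not the sample’s actual code, and error checking is omitted), but it shows where clEnqueueReleaseGLObjects() sits in the acquire/compute/release sequence:

```c
/* Illustrative sketch only -- not the actual Marching Cubes sample code.
   The crash occurs inside clEnqueueReleaseGLObjects(), which the driver
   appears to implement by calling glCreateSyncFromCLeventARB() and
   glWaitSync(). */
#include <CL/cl.h>
#include <CL/cl_gl.h>

void computeIntoGLBuffer(cl_command_queue queue, cl_kernel kernel,
                         cl_mem clVBO)  /* created with clCreateFromGLBuffer() */
{
    size_t globalSize = 65536;  /* placeholder work size */

    /* Hand the GL buffer object over to OpenCL. */
    clEnqueueAcquireGLObjects(queue, 1, &clVBO, 0, NULL, NULL);

    /* Run the kernel that writes vertex data into the buffer. */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, NULL,
                           0, NULL, NULL);

    /* Hand the buffer back to OpenGL.  Under VirtualGL, this is where the
       NV-GLX error and segfault occur. */
    clEnqueueReleaseGLObjects(queue, 1, &clVBO, 0, NULL, NULL);
    clFinish(queue);
}
```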

It might be tempting to say that VirtualGL’s approach is a hack, but it’s worth noting that this form of split rendering is common within the scientific visualization community, and (for instance) ParaView also does it. So I suspect that this issue would not be confined to VirtualGL. It would probably also occur in any situation in which an application uses an nVidia-attached X display for off-screen rendering and OpenCL/OpenGL interop and then tries to display the rendered OpenGL frames to a non-nVidia-attached X display.

NV-GLX is proprietary and thus doesn’t exist on any display that isn’t controlled by the nVidia API/driver stack. However, VirtualGL ensures that all OpenGL contexts are created on the “3D X server” (the GPU-attached X server to which VirtualGL redirects OpenGL rendering). I have verified that, at the time glWaitSync() is called, there is a current OpenGL context and that it was successfully established on the 3D X server, so I’m not sure why something within that function is trying to open a 2D X server connection. I tried swizzling the DISPLAY environment variable prior to the call to clEnqueueReleaseGLObjects(), but that had no effect.
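For clarity, the DISPLAY-swizzling experiment looked roughly like this (a sketch, with the 3D X server’s display name passed in as a parameter rather than hard-coded; it is not verbatim code from VirtualGL or the sample):

```c
#include <stdlib.h>
#include <string.h>
#include <CL/cl.h>
#include <CL/cl_gl.h>

/* Temporarily point DISPLAY at the nVidia-attached 3D X server around the
   failing call, then restore the original value (the 2D X server / X proxy).
   This had no effect on the crash. */
static cl_int releaseWithSwizzledDisplay(cl_command_queue queue, cl_mem clVBO,
                                         const char *display3D)
{
    char *oldDisplay = getenv("DISPLAY") ? strdup(getenv("DISPLAY")) : NULL;

    setenv("DISPLAY", display3D, 1);
    cl_int err = clEnqueueReleaseGLObjects(queue, 1, &clVBO, 0, NULL, NULL);

    if (oldDisplay) { setenv("DISPLAY", oldDisplay, 1); free(oldDisplay); }
    else unsetenv("DISPLAY");
    return err;
}
```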

I just need a sense of what glWaitSync() is doing behind the scenes that causes it to attempt an X server connection, and of course I can’t get that sense on my own, because nVidia’s implementation is closed-source. Any advice is appreciated.

Update: the problem turned out to be due to the application passing a 2D X server handle to clCreateContext() via the CL_GLX_DISPLAY_KHR property. VirtualGL now interposes clCreateContext() to address that problem.
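For anyone hitting the same issue, here is a rough sketch of that kind of interposer fix. It is not VirtualGL’s actual implementation; vglGet3DDisplay() is a hypothetical stand-in for however the interposer obtains its connection to the 3D X server, and error checking is omitted. The idea is to copy the application’s property list and substitute the 3D X server’s Display handle wherever CL_GLX_DISPLAY_KHR appears, before passing the list along to the real clCreateContext():

```c
/* Sketch of an LD_PRELOAD-style interposer for clCreateContext()
   (not the actual VirtualGL code). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <X11/Xlib.h>
#include <CL/cl.h>
#include <CL/cl_gl.h>

/* Hypothetical helper: returns the interposer's connection to the
   GPU-attached 3D X server. */
extern Display *vglGet3DDisplay(void);

typedef cl_context (*clCreateContext_t)(const cl_context_properties *, cl_uint,
    const cl_device_id *,
    void (CL_CALLBACK *)(const char *, const void *, size_t, void *),
    void *, cl_int *);

cl_context clCreateContext(const cl_context_properties *properties,
                           cl_uint num_devices, const cl_device_id *devices,
                           void (CL_CALLBACK *pfn_notify)(const char *,
                               const void *, size_t, void *),
                           void *user_data, cl_int *errcode_ret)
{
    static clCreateContext_t realCreateContext = NULL;
    if (!realCreateContext)
        realCreateContext =
            (clCreateContext_t)dlsym(RTLD_NEXT, "clCreateContext");

    cl_context_properties *newProps = NULL;
    if (properties) {
        /* The property list is zero-terminated key/value pairs. */
        size_t n = 0;
        while (properties[n]) n += 2;

        newProps = malloc((n + 1) * sizeof(cl_context_properties));
        for (size_t i = 0; i < n; i += 2) {
            newProps[i] = properties[i];
            /* Substitute the 3D X server handle for the application's
               2D X server handle. */
            newProps[i + 1] = (properties[i] == CL_GLX_DISPLAY_KHR) ?
                (cl_context_properties)vglGet3DDisplay() : properties[i + 1];
        }
        newProps[n] = 0;
    }

    cl_context ctx = realCreateContext(newProps ? newProps : properties,
                                       num_devices, devices, pfn_notify,
                                       user_data, errcode_ret);
    free(newProps);
    return ctx;
}
```

With the OpenCL context bound to the 3D X server’s Display, the driver’s sync machinery talks to a display that actually has NV-GLX, and the crash goes away.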