Understanding how OpenGL libraries are linked to an application in a container when using the GPU Operator

Hello,

I use the GPU Operator to deploy OpenGL applications in a Kubernetes cluster. It works fine: my pod (a basic glxgears) starts, and when I run nvidia-smi from the nvidia-driver pod I can see that glxgears is using the GPU (the metrics are consistent with hardware acceleration being used).

However, when I look inside the container, some things are unclear.
When I inspect the container on my PC, i.e. without the GPU Operator, I can see that installing the glx-utils package brings in some default OpenGL libraries (Mesa implementation):

[root@9417bac3962a lib64]# ls -al /usr/lib64/ |grep libGL
lrwxrwxrwx  1 root root      14 Nov 11  2022 libGL.so.1 -> libGL.so.1.7.0
-rwxr-xr-x  1 root root  558944 Nov 11  2022 libGL.so.1.7.0
lrwxrwxrwx  1 root root      15 Nov 11  2022 libGLX.so.0 -> libGLX.so.0.0.0
-rwxr-xr-x  1 root root  141256 Nov 11  2022 libGLX.so.0.0.0
lrwxrwxrwx  1 root root      20 Nov 11  2022 libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rwxr-xr-x  1 root root  502032 Nov 11  2022 libGLX_mesa.so.0.0.0
lrwxrwxrwx  1 root root      27 Nov 11  2022 libGLX_system.so.0 -> /usr/lib64/libGLX_mesa.so.0
lrwxrwxrwx  1 root root      22 Nov 11  2022 libGLdispatch.so.0 -> libGLdispatch.so.0.0.0
-rwxr-xr-x  1 root root  769048 Nov 11  2022 libGLdispatch.so.0.0.0

When I run the same command in the container deployed in my cluster with the GPU Operator, I get the following result:

[root@glxgears-glxgears-deployment-694bc49445-87kzr lib64]# ls -al /usr/lib64/ | grep libGL
lrwxrwxrwx.  1 root root       14 Nov 11  2022 libGL.so.1 -> libGL.so.1.7.0
-rwxr-xr-x.  1 root root   558944 Nov 11  2022 libGL.so.1.7.0
lrwxrwxrwx.  1 root root       33 Sep  9 15:40 libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.550.107.02
-rwxr-xr-x.  1 root root    68000 Sep  6 13:36 libGLESv1_CM_nvidia.so.550.107.02
lrwxrwxrwx.  1 root root       30 Sep  9 15:40 libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.550.107.02
-rwxr-xr-x.  1 root root   117144 Sep  6 13:36 libGLESv2_nvidia.so.550.107.02
lrwxrwxrwx.  1 root root       15 Nov 11  2022 libGLX.so.0 -> libGLX.so.0.0.0
-rwxr-xr-x.  1 root root   141256 Nov 11  2022 libGLX.so.0.0.0
lrwxrwxrwx.  1 root root       27 Sep  9 15:40 libGLX_indirect.so.0 -> libGLX_nvidia.so.550.107.02
lrwxrwxrwx.  1 root root       20 Nov 11  2022 libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rwxr-xr-x.  1 root root   502032 Nov 11  2022 libGLX_mesa.so.0.0.0
lrwxrwxrwx.  1 root root       27 Sep  9 15:40 libGLX_nvidia.so.0 -> libGLX_nvidia.so.550.107.02
-rwxr-xr-x.  1 root root  1203776 Sep  6 13:36 libGLX_nvidia.so.550.107.02
lrwxrwxrwx.  1 root root       27 Nov 11  2022 libGLX_system.so.0 -> /usr/lib64/libGLX_mesa.so.0
lrwxrwxrwx.  1 root root       22 Nov 11  2022 libGLdispatch.so.0 -> libGLdispatch.so.0.0.0
-rwxr-xr-x.  1 root root   769048 Nov 11  2022 libGLdispatch.so.0.0.0

First observations:

  • some new libraries are present (I guess they are mounted by the nvidia-container-toolkit)
  • the libraries that were already present are unchanged (their sizes are identical)
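
To check my guess about mounting, one option is to look at the mount table from inside the container. Below is a minimal sketch in Python. The function name is my own, and it assumes the toolkit injects the driver files as individual bind mounts, which I have not confirmed. It reads the mount point column of /proc/self/mountinfo and keeps the entries that sit directly in /usr/lib64 and contain "libGL".

```python
def bind_mounted_paths(mountinfo_text, directory="/usr/lib64", needle="libGL"):
    """Return mount points located directly in `directory` whose name contains `needle`."""
    wanted_dir = directory.rstrip("/")
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        # The fifth field of /proc/<pid>/mountinfo is the mount point (see proc(5));
        # spaces inside paths are escaped as \040.
        mount_point = fields[4].replace("\\040", " ")
        parent, _, name = mount_point.rpartition("/")
        if parent == wanted_dir and needle in name:
            found.append(mount_point)
    return sorted(found)
```

Inside the container it could be called as bind_mounted_paths(open("/proc/self/mountinfo").read()). Symlinks such as libGLX_nvidia.so.0 would not be expected in the result, since a symlink is not a mount point.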

If I now look, in the same container, at the volume shared with the host where the driver libraries are installed:

[root@glxgears-glxgears-deployment-694bc49445-87kzr lib64]# ls -al /run/driver/lib/x86_64-linux-gnu/  | grep libGL
lrwxrwxrwx. 1 root root       10 Sep  6 13:36 libGL.so -> libGL.so.1
lrwxrwxrwx. 1 root root       14 Sep  6 13:36 libGL.so.1 -> libGL.so.1.7.0
-rwxr-xr-x. 1 root root   649416 Sep  6 13:36 libGL.so.1.7.0
lrwxrwxrwx. 1 root root       17 Sep  6 13:36 libGLESv1_CM.so -> libGLESv1_CM.so.1
lrwxrwxrwx. 1 root root       21 Sep  6 13:36 libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.2.0
-rwxr-xr-x. 1 root root    43208 Sep  6 13:36 libGLESv1_CM.so.1.2.0
lrwxrwxrwx. 1 root root       33 Sep  6 13:36 libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.550.107.02
-rwxr-xr-x. 1 root root    68000 Sep  6 13:36 libGLESv1_CM_nvidia.so.550.107.02
lrwxrwxrwx. 1 root root       14 Sep  6 13:36 libGLESv2.so -> libGLESv2.so.2
lrwxrwxrwx. 1 root root       18 Sep  6 13:36 libGLESv2.so.2 -> libGLESv2.so.2.1.0
-rwxr-xr-x. 1 root root    80064 Sep  6 13:36 libGLESv2.so.2.1.0
lrwxrwxrwx. 1 root root       30 Sep  6 13:36 libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.550.107.02
-rwxr-xr-x. 1 root root   117144 Sep  6 13:36 libGLESv2_nvidia.so.550.107.02
lrwxrwxrwx. 1 root root       11 Sep  6 13:36 libGLX.so -> libGLX.so.0
-rwxr-xr-x. 1 root root   137616 Sep  6 13:36 libGLX.so.0
lrwxrwxrwx. 1 root root       27 Sep  6 13:36 libGLX_nvidia.so.0 -> libGLX_nvidia.so.550.107.02
-rwxr-xr-x. 1 root root  1203776 Sep  6 13:36 libGLX_nvidia.so.550.107.02
-rwxr-xr-x. 1 root root   952576 Sep  6 13:36 libGLdispatch.so.0

Note that for the following libraries:

  • libGL
  • libGLX
  • libGLdispatch

the version present in /usr/lib64 is the one originally installed in the container image, not the one provided by the NVIDIA stack.
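
Comparing sizes by eye is error-prone, so here is a small sketch in Python that compares file contents instead. The helper names are mine. It hashes every file whose name starts with "libGL" and exists in both directories (symlinks are followed when the file is opened), then reports whether the contents match.

```python
import hashlib
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of a file (open() follows symlinks)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_libs(dir_a, dir_b, prefix="libGL"):
    """Map each file name present in both directories to True if the contents are identical."""
    names_a = {n for n in os.listdir(dir_a) if n.startswith(prefix)}
    names_b = {n for n in os.listdir(dir_b) if n.startswith(prefix)}
    result = {}
    for name in sorted(names_a & names_b):
        same = sha256_of(os.path.join(dir_a, name)) == sha256_of(os.path.join(dir_b, name))
        result[name] = same
    return result
```

In my case the call would be compare_libs("/usr/lib64", "/run/driver/lib/x86_64-linux-gnu"). A False entry means the copy in /usr/lib64 differs from the one shipped with the driver.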

Looking at an extract of the application's dynamic dependencies:

[root@glxgears-glxgears-deployment-694bc49445-87kzr lib64]# ldd /usr/bin/glxgears
        libGL.so.1 => /usr/lib64/libGL.so.1 (0x00007fa1142e7000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007fa113c22000)
        libGLX.so.0 => /usr/lib64/libGLX.so.0 (0x00007fa11362b000)
        libGLdispatch.so.0 => /usr/lib64/libGLdispatch.so.0 (0x00007fa113162000)

This also shows that the application is not linked against the libraries shipped with my driver version.
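
ldd only reports dependencies recorded at link time, so anything the process loads later with dlopen() would not appear in the output above. Here is a minimal sketch in Python (the function name is my own) to list what a running glxgears has actually mapped, based on the contents of /proc/<pid>/maps. It assumes the PID is known and that the file is readable from wherever the script runs.

```python
import os


def mapped_libraries(maps_text, needle="libGL"):
    """Return the distinct mapped file paths whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if needle in os.path.basename(path):
            paths.add(path)
    return sorted(paths)
```

For a given PID this would be called as mapped_libraries(open("/proc/<pid>/maps").read()), with <pid> replaced by the actual process ID. If a libGLX_nvidia entry shows up in the result, the vendor library is being loaded at run time even though ldd does not list it.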

The setup of my X server is much the same: it is started in another pod, and I noticed exactly the same thing there.

So my question is pretty basic: how can this work? It seems that my application loads the Mesa versions of libGL, libGLX and libGLdispatch, yet the display is correctly rendered by the GPU. Am I missing something? I would appreciate pointers to in-depth documentation of these mechanisms.

In case it helps, I’m using the following versions:

GPU Operator: 23.6.1
Container toolkit: 1.13.4-ubuntu20.04

Thanks!

Regards,