Race condition with glxgears running in docker container

I’m trying to run OpenGL and CUDA applications in a Docker container. Sometimes glxgears works, but sometimes it shows only a black window, so there seems to be some kind of race condition.

Host system:
Ubuntu 22.04, kernel version 6.8.0-58-generic
Nvidia driver version 570.133.20
CUDA version 12.8
GPU: Nvidia RTX 2000 Ada Generation
Nvidia-container-toolkit version: 1.17.6
Docker engine version 28.1.1
nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   38C    P3            590W /   35W |      15MiB /   8188MiB |     17%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2301      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+

The Docker container is based on nvidia/cuda:12.8.1-runtime-ubuntu20.04.
Here are the relevant sections of the dockerfile:

ENV LD_LIBRARY_PATH="/usr/lib/nvidia:/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}"

# Install OpenGL/Vulkan libraries
RUN apt-get update && \
  DEBIAN_FRONTEND=noninteractive apt-get install -y \
  libglvnd0 \
  libgl1 \
  libglx0 \
  libegl1 \
  libgles2 \
  libxext6 \
  libx11-6 \
  vulkan-tools \
  libvulkan1 \
  mesa-vulkan-drivers \
  && rm -rf /var/lib/apt/lists/*

# NVIDIA OpenGL libraries matching host driver v570
RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  libnvidia-gl-570 \
  && rm -rf /var/lib/apt/lists/*

# Upgrade OS
RUN apt-get update -q && \
  DEBIAN_FRONTEND=noninteractive apt-get upgrade -y && \
  apt-get autoclean && \
  apt-get autoremove -y && \
  rm -rf /var/lib/apt/lists/*

# Install Ubuntu Mate desktop
RUN apt-get update -q && \
  DEBIAN_FRONTEND=noninteractive apt-get install -y \
      ubuntu-mate-desktop && \
  apt-get autoclean && \
  apt-get autoremove -y && \
  rm -rf /var/lib/apt/lists/*

and the following compose options:

    privileged: true
    build: .
    runtime: nvidia
    environment:
      - DISPLAY
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
      - QT_X11_NO_MITSHM=1
    devices:
      - "/dev/dri:/dev/dri"
      - "/dev/nvidia0:/dev/nvidia0"
      - "/dev/nvidiactl:/dev/nvidiactl"
    volumes:
      - .:/dynamic-nav
      - /tmp/.X11-unix:/tmp/.X11-unix:rw
      - $HOME/.Xauthority:/root/.Xauthority
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

I’m also prefixing the glxgears command with the following environment variables:

__NV_PRIME_RENDER_OFFLOAD=1
__GLX_VENDOR_LIBRARY_NAME=nvidia
__VK_LAYER_NV_optimus=NVIDIA_only
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json
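For anyone trying to reproduce this, here is a minimal sketch of how those variables can be applied to a single command. The helper name nv_run is hypothetical and not part of the original setup; it only wraps the standard env utility, so the variables affect the launched process and nothing else.

```shell
# Hypothetical helper (not from the original post): run any command
# with the NVIDIA offload variables set for that process only.
nv_run() {
  env \
    __NV_PRIME_RENDER_OFFLOAD=1 \
    __GLX_VENDOR_LIBRARY_NAME=nvidia \
    __VK_LAYER_NV_optimus=NVIDIA_only \
    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
    "$@"
}

# Example usage inside the container (requires an X display):
#   nv_run glxgears -info
```

Using env rather than exporting the variables keeps the rest of the shell session unchanged, which makes it easier to compare runs with and without the overrides.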

I have no idea what could be going wrong; I’ve been trying to debug this for days.

One interesting thing to note is that when glxgears runs successfully, the FPS is ~500, but when there’s just a black window, the FPS is ~37000. Both cases report the following properties:

GL_RENDERER   = NVIDIA RTX 2000 Ada Generation Laptop GPU/PCIe/SSE2
GL_VERSION    = 4.6.0 NVIDIA 570.133.20
GL_VENDOR     = NVIDIA Corporation
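Since the renderer strings are identical in both cases, the frame rate is the only signal here that separates a good run from a black window. A rough sketch of how that could be checked automatically is below. It assumes the usual glxgears status line ("N frames in 5.0 seconds = X FPS"); the function name check_gears_fps and the threshold of 5000 are hypothetical choices based only on the two figures above, not something the driver or glxgears defines.

```shell
# Hypothetical threshold: runs above this are flagged as probably not
# presenting frames (the black-window case reported ~37000 FPS).
FPS_LIMIT=5000

# Reads glxgears status lines on stdin and labels each FPS report.
check_gears_fps() {
  awk -v limit="$FPS_LIMIT" '
    / FPS$/ {
      fps = $(NF - 1)              # second-to-last field is the FPS value
      if (fps + 0 > limit) print "suspect", fps
      else print "ok", fps
    }'
}
```

Piping glxgears output into check_gears_fps would then print "ok" or "suspect" for each report, which may help when launching the container repeatedly to see how often the bad state occurs.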

I discovered that the cause of my problem was some conflict with the Ubuntu Mate desktop environment I was running in my docker container. After switching to a headless container with X11 forwarding, things are working perfectly.
