Unstable performance in docker environment (X Error of failed request: BadWindow)

Hi,
We are running a Gazebo simulation in a Docker environment. When we run it in our integration test environment (headless mode), it is unreliable. When it fails, one of these two errors appears in the gzserver output:

X Error of failed request:  BadAlloc (insufficient resources for operation)
2023-04-23T23:17:02.915986045Z   Major opcode of failed request:  150 (GLX)
2023-04-23T23:17:02.915987750Z   Minor opcode of failed request:  5 (X_GLXMakeCurrent)
2023-04-23T23:17:02.915988844Z   Serial number of failed request:  0
2023-04-23T23:17:02.915989907Z   Current serial number in output stream:  27

or

X Error of failed request:  BadWindow (invalid Window parameter)
2023-04-23T23:24:39.314267598Z   Major opcode of failed request:  150 (GLX)
2023-04-23T23:24:39.314269154Z   Minor opcode of failed request:  16 (X_GLXVendorPrivate)
2023-04-23T23:24:39.314270396Z   Resource id in failed request:  0x400002
2023-04-23T23:24:39.314271665Z   Serial number of failed request:  0
2023-04-23T23:24:39.314272827Z   Current serial number in output stream:  35

Usually 2-3 tries are enough to reproduce the problem. In headless mode we use gzserver, with Xvfb providing a fake display for it.
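
Since the failure is intermittent, a small loop can help put a number on the failure rate before and after each change. This is only a sketch: `count_x_errors` is a hypothetical helper, not part of Gazebo, and it assumes every failure prints the text "X Error of failed request" as in the errors quoted above. The run count and the 30-second limit in the usage comment are arbitrary values.

```shell
# Run a command N times and print how many runs emitted an X error.
# count_x_errors is a hypothetical helper; the cx_ prefix avoids clobbering
# variables in the calling script.
count_x_errors() {
    cx_runs=$1
    shift
    cx_failed=0
    cx_i=0
    while [ "$cx_i" -lt "$cx_runs" ]; do
        # Merge stderr into stdout so the Xlib error text reaches grep.
        if "$@" 2>&1 | grep -q "X Error of failed request"; then
            cx_failed=$((cx_failed + 1))
        fi
        cx_i=$((cx_i + 1))
    done
    echo "$cx_failed"
}

# Possible usage (assumed values): a successful gzserver does not exit on its
# own, so it is wrapped in coreutils `timeout` to bound each run.
#   count_x_errors 10 timeout 30 gzserver "${gazebo_world}" --verbose
```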

Our host is set to use NVIDIA as the primary GPU.
Our docker image is based on nvidia/opengl:1.2-glvnd-runtime-ubuntu20.04

Launching X server

    export DISPLAY=:99

    echo "[Dbg] starting Xvfb..."
    Xvfb :99 -screen 0 1024x768x24 &
    echo "[Dbg] starting Xvfb...DONE"

    echo "[Dbg] waiting for xdpyinfo..."
    until xdpyinfo &>/dev/null; do :; done
    echo "[Dbg] waiting for xdpyinfo...DONE"
...
...
    echo "environment in gz_start"
    env | sort
    echo "[Dbg] GLX Info..."
    glxinfo | grep OpenGL
    echo "[Dbg] GLX Info...DONE"
  
    echo "[Dbg] starting gzserver..."
    gzserver ${gazebo_world} --verbose
    echo "[Dbg] starting gzserver...DONE"
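
The `until xdpyinfo ...; do :; done` loop above spins without pausing and never gives up if Xvfb fails to start. A bounded wait might look like the sketch below. `wait_for` is a hypothetical helper, and the 10-second timeout and 0.2-second poll interval are assumed values. This only makes the startup check more robust; it is not claimed to fix the BadWindow/BadAlloc errors.

```shell
# Poll a probe command until it succeeds or a deadline passes.
# $1 = timeout in seconds, remaining arguments = probe command.
# wait_for is a hypothetical helper; the wf_ prefix avoids clobbering
# variables in the calling script.
wait_for() {
    wf_timeout=$1
    shift
    wf_deadline=$(( $(date +%s) + wf_timeout ))
    until "$@" >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$wf_deadline" ]; then
            return 1
        fi
        # Fractional sleep assumes GNU coreutils (as on Ubuntu 20.04).
        sleep 0.2
    done
    return 0
}

# Possible usage in the launch script, after Xvfb has been started:
#   wait_for 10 xdpyinfo || { echo "[Dbg] Xvfb did not come up" >&2; exit 1; }
```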

Here is an example log:

[Dbg] starting Xvfb...
2023-04-24T03:13:16.783162545Z [Dbg] starting Xvfb...DONE
2023-04-24T03:13:16.783164627Z [Dbg] waiting for xdpyinfo...
2023-04-24T03:13:16.808443030Z [Dbg] waiting for xdpyinfo...DONE
2023-04-24T03:13:16.870715648Z environment in gz_start
2023-04-24T03:13:16.871118660Z DISPLAY=:99
2023-04-24T03:13:16.871119792Z DRI_PRIME=1
2023-04-24T03:13:16.871129148Z HOME=/tmp
2023-04-24T03:13:16.871139449Z IGN_IP=127.0.0.1
2023-04-24T03:13:16.871140523Z IGN_PARTITION=sim
2023-04-24T03:13:16.871155975Z NVIDIA_DRIVER_CAPABILITIES=graphics,compat32,utility
2023-04-24T03:13:16.871157145Z NVIDIA_VISIBLE_DEVICES=all
2023-04-24T03:13:16.871158276Z OCL_ICD_FILENAMES=libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64/libintelocl.so
2023-04-24T03:13:16.871159495Z ONEAPI_ROOT=/opt/intel/oneapi
2023-04-24T03:13:16.871163132Z PWD=/
2023-04-24T03:13:16.871192433Z TBBROOT=/opt/intel/oneapi/tbb/2021.8.0/env/..
2023-04-24T03:13:16.871193556Z TERM=xterm
2023-04-24T03:13:16.871198024Z __GLX_VENDOR_LIBRARY_NAME=nvidia
2023-04-24T03:13:16.871199142Z __NV_PRIME_RENDER_OFFLOAD=1
2023-04-24T03:13:16.871200249Z __VK_LAYER_NV_optimus=NVIDIA_only

2023-04-24T03:13:16.999281656Z [Dbg] GLX Info...
2023-04-24T03:13:17.032287627Z OpenGL vendor string: NVIDIA Corporation
2023-04-24T03:13:17.032301288Z OpenGL renderer string: NVIDIA RTX A3000 12GB Laptop GPU/PCIe/SSE2
2023-04-24T03:13:17.032303061Z OpenGL core profile version string: 4.6.0 NVIDIA 525.105.17
2023-04-24T03:13:17.032304460Z OpenGL core profile shading language version string: 4.60 NVIDIA
2023-04-24T03:13:17.032305687Z OpenGL core profile context flags: (none)
2023-04-24T03:13:17.032306859Z OpenGL core profile profile mask: core profile
2023-04-24T03:13:17.032308047Z OpenGL core profile extensions:
2023-04-24T03:13:17.033793989Z OpenGL version string: 4.6.0 NVIDIA 525.105.17
2023-04-24T03:13:17.033798654Z OpenGL shading language version string: 4.60 NVIDIA
2023-04-24T03:13:17.033799902Z OpenGL context flags: (none)
2023-04-24T03:13:17.033801057Z OpenGL profile mask: (none)
2023-04-24T03:13:17.033802136Z OpenGL extensions:
2023-04-24T03:13:17.034878706Z OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 525.105.17
2023-04-24T03:13:17.034907253Z OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
2023-04-24T03:13:17.034908752Z OpenGL ES profile extensions:
2023-04-24T03:13:17.067998542Z [Dbg] GLX Info...DONE

2023-04-24T03:13:17.068016355Z [Dbg] starting gzserver...
2023-04-24T03:13:17.327840901Z Gazebo multi-robot simulator, version 11.12.0
2023-04-24T03:13:17.327869476Z Copyright (C) 2012 Open Source Robotics Foundation.
2023-04-24T03:13:17.327871018Z Released under the Apache 2 License.
2023-04-24T03:13:17.327872232Z http://gazebosim.org
2023-04-24T03:13:17.327873430Z 
2023-04-24T03:13:17.328634825Z [Msg] Waiting for master.
2023-04-24T03:13:17.338930831Z [Msg] Connected to gazebo master @ http://127.0.0.1:11345
2023-04-24T03:13:17.338942575Z [Msg] Publicized address: 192.168.128.44
2023-04-24T03:13:17.372171938Z X Error of failed request:  BadWindow (invalid Window parameter)
2023-04-24T03:13:17.372182691Z   Major opcode of failed request:  150 (GLX)
2023-04-24T03:13:17.372184188Z   Minor opcode of failed request:  16 (X_GLXVendorPrivate)
2023-04-24T03:13:17.372185361Z   Resource id in failed request:  0x400002
2023-04-24T03:13:17.372186392Z   Serial number of failed request:  0
2023-04-24T03:13:17.372187424Z   Current serial number in output stream:  35

The last entry in ~/.gazebo/ogre.log indicates that it is trying to create a rendering window:

13:38:40: GLRenderSystem::_createRenderWindow "OgreWindow(0)", 1x1 windowed  miscParams: FSAA=4 border=none contentScalingFactor=1.000000 macAPI=cocoa macAPICocoaUseNSView=true parentWindowHandle=2097153 stereoMode=Frame Sequential 

Occasionally I will get X Error of failed request: BadWindow from

So far I have tried:

  • Running on Ubuntu 20.04 with gazebo instead of gzserver and DISPLAY pointing to the host X server: works every time.
  • Switching to the Intel GPU/libraries: works without problems, which makes me suspect the NVIDIA driver/OpenGL implementation.
  • Switching to nvidia-driver-530: same results.
  • Our container is re-created on every launch, but even when we just restart the container, it sometimes works and sometimes fails.
  • Tried glTrace, but no output shows up before the failure; on a successful run I can see glTrace output.

Any suggestions would be welcome.

Xvfb is a simple software X server, and you're trying to use PRIME with it. This won't work, because Xvfb doesn't support it. You might be able to use VirtualGL instead.
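
As a rough, untested sketch of what trying VirtualGL could look like in the launch script: the helper below picks `vglrun` as a launcher prefix when it is installed and otherwise falls back to running the command directly. `gl_launcher` is a hypothetical name, and the `-d egl` argument assumes VirtualGL 3.0 or later with its EGL back end available inside the container; whether this resolves the errors on this setup is not confirmed.

```shell
# Print a launcher prefix: VirtualGL if present, otherwise nothing.
# gl_launcher is a hypothetical helper for illustration.
gl_launcher() {
    if command -v vglrun >/dev/null 2>&1; then
        # Assumption: VirtualGL >= 3.0, where "-d egl" selects the EGL back end
        # so that rendering does not go through the GLX of the Xvfb display.
        echo "vglrun -d egl"
    else
        echo ""
    fi
}

# Possible usage (the prefix is intentionally left unquoted so it splits
# into separate words):
#   $(gl_launcher) gzserver "${gazebo_world}" --verbose
```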

Can you elaborate on “this won’t work”? It is an existing system and it is working. At least sometimes :)
Usually we have PRIME switched to nvidia only…

Unless you mean the instability.
When it works, I can see tasks being executed on the NVIDIA card using nvidia-smi.