Hello, I need some help and hope this is the right place for my question.
Context
We need to run some performance-critical tests in our CI pipeline. We therefore use a dedicated GitLab CI runner with an attached NVIDIA GPU (Tesla T4). The CI job runs in Kubernetes on Azure, and the NVIDIA Container Toolkit is already set up.
Current situation
We use a Cypress docker image as base and set the required environment variables for the container runtime:
FROM cypress/included:8.3.1
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,display,graphics
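To rule out the variables being overridden somewhere between the image and the running CI job, a small check like the following could be run inside the container. This is only a sketch: the function name `missing_capabilities` and the `REQUIRED` tuple are made up for illustration, and the list of required capabilities is simply copied from the Dockerfile above.

```python
import os

# Capabilities requested in the Dockerfile above.
REQUIRED = ("compute", "utility", "display", "graphics")


def missing_capabilities(env, required=REQUIRED):
    """Return the required capabilities absent from NVIDIA_DRIVER_CAPABILITIES."""
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    granted = {item.strip() for item in raw.split(",") if item.strip()}
    # "all" requests every capability, so nothing can be missing.
    if "all" in granted:
        return []
    return [cap for cap in required if cap not in granted]


if __name__ == "__main__":
    print("NVIDIA_VISIBLE_DEVICES =", os.environ.get("NVIDIA_VISIBLE_DEVICES"))
    print("missing capabilities:", missing_capabilities(os.environ))
```

An empty list means the job container sees the same capability set that the image declares.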
The nvidia-smi command shows the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                  Off |
| N/A   25C    P8     8W /  70W |      0MiB / 16127MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Problem
Cypress needs an X server when running on Linux. By default it uses Xvfb, but that does not use the GPU. That's why I am trying to set up Xorg, which fails with the following error:
Fatal server error: (EE) no screens found(EE)
I created a configuration at /etc/X11/xorg.conf and tried some options, but had no success so far:
- Option "AllowEmptyInitialConfiguration"
- Option "IgnoreEDID"
- Option "UseDisplayDevice" "none"
The nvidia-xconfig tool seems to be deprecated and can no longer be installed. It states: "This tool is deprecated. The NVIDIA drivers now automatically integrate with the Xorg Xserver configuration. Creating an xorg.conf is no longer needed for normal setups." But without this config, Xorg shows the above error message as well.
This is the xorg.conf generated by Xorg :0 -configure:
Section "ServerLayout"
    Identifier "X.org Configured"
    Screen 0 "Screen0" 0 0
    InputDevice "Mouse0" "CorePointer"
    InputDevice "Keyboard0" "CoreKeyboard"
EndSection
Section "Files"
    ModulePath "/usr/lib/xorg/modules"
    FontPath "/usr/share/fonts/X11/misc"
    FontPath "/usr/share/fonts/X11/cyrillic"
    FontPath "/usr/share/fonts/X11/100dpi/:unscaled"
    FontPath "/usr/share/fonts/X11/75dpi/:unscaled"
    FontPath "/usr/share/fonts/X11/Type1"
    FontPath "/usr/share/fonts/X11/100dpi"
    FontPath "/usr/share/fonts/X11/75dpi"
    FontPath "built-ins"
EndSection
Section "Module"
    Load "glx"
EndSection
Section "InputDevice"
    Identifier "Keyboard0"
    Driver "kbd"
EndSection
Section "InputDevice"
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/input/mice"
    Option "ZAxisMapping" "4 5 6 7"
EndSection
Section "Monitor"
    Identifier "Monitor0"
    VendorName "Monitor Vendor"
    ModelName "Monitor Model"
EndSection
Section "Device"
    ### Available Driver options are:-
    ### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
    ### <string>: "String", <freq>: "<f> Hz/kHz/MHz",
    ### <percent>: "<f>%"
    ### [arg]: arg optional
    #Option "SWcursor" # [<bool>]
    #Option "kmsdev" # <str>
    #Option "ShadowFB" # [<bool>]
    #Option "AccelMethod" # <str>
    #Option "PageFlip" # [<bool>]
    #Option "ZaphodHeads" # <str>
    #Option "DoubleShadow" # [<bool>]
    Identifier "Card0"
    Driver "modesetting"
    BusID "PCI:0:0:0"
EndSection
Section "Screen"
    Identifier "Screen0"
    Device "Card0"
    Monitor "Monitor0"
    SubSection "Display"
        Viewport 0 0
        Depth 1
    EndSubSection
    SubSection "Display"
        Viewport 0 0
        Depth 4
    EndSubSection
    SubSection "Display"
        Viewport 0 0
        Depth 8
    EndSubSection
    SubSection "Display"
        Viewport 0 0
        Depth 15
    EndSubSection
    SubSection "Display"
        Viewport 0 0
        Depth 16
    EndSubSection
    SubSection "Display"
        Viewport 0 0
        Depth 24
    EndSubSection
EndSection
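Note that nvidia-smi reports the card at 00000001:00:00.0, while the generated config uses BusID "PCI:0:0:0". nvidia-smi prints domain:bus:device.function in hexadecimal, whereas xorg.conf expects decimal values in the form PCI:bus@domain:device:function. The conversion can be sketched as follows; the function name `nvidia_smi_bus_id_to_xorg` is made up for illustration, and whether a corrected BusID makes any difference to the error here is not confirmed.

```python
import re


def nvidia_smi_bus_id_to_xorg(bus_id: str) -> str:
    """Convert a hex 'domain:bus:device.function' id into Xorg's decimal BusID form."""
    match = re.fullmatch(
        r"([0-9A-Fa-f]{4,8}):([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2})\.([0-7])",
        bus_id.strip(),
    )
    if match is None:
        raise ValueError(f"unrecognized PCI bus id: {bus_id!r}")
    domain, bus, device, function = (int(part, 16) for part in match.groups())
    # The '@domain' part is optional in xorg.conf; it is left out here for domain 0.
    bus_part = f"{bus}@{domain}" if domain else str(bus)
    return f"PCI:{bus_part}:{device}:{function}"


if __name__ == "__main__":
    # Bus-Id value taken from the nvidia-smi output above.
    print(nvidia_smi_bus_id_to_xorg("00000001:00:00.0"))  # prints PCI:0@1:0:0
```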
I am not sure why the driver is set to modesetting, but changing it to nvidia results in Failed to load module "nvidia" (module does not exist, 0).
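The "module does not exist" message suggests that Xorg could not find a driver file named nvidia_drv.so below its module path. A minimal sketch to check this inside the container is shown here; the helper name `find_xorg_driver` is made up for illustration, and the only search root is the ModulePath from the generated config, so other locations would have to be added by hand.

```python
from pathlib import Path


def find_xorg_driver(module_paths, driver="nvidia"):
    """Return every '<driver>_drv.so' found below the given Xorg module paths."""
    filename = f"{driver}_drv.so"
    hits = []
    for root in module_paths:
        root_path = Path(root)
        # Skip roots that do not exist in this container.
        if root_path.is_dir():
            hits.extend(sorted(root_path.rglob(filename)))
    return hits


if __name__ == "__main__":
    # ModulePath as listed in the "Files" section of the generated xorg.conf.
    found = find_xorg_driver(["/usr/lib/xorg/modules"])
    if found:
        for path in found:
            print("found:", path)
    else:
        print("nvidia_drv.so not found below the module path")
```

If nothing is found, the X driver module is most likely not present in the image or not mounted into the container, which would match the error above.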
Question
Do you have any hints on how I can get Xorg (or any other X server) running within a container while using the NVIDIA Tesla T4 GPU, but without a display?