On a workstation running CentOS 8 (kernel 4.18.0-193.19.1.el8_2.x86_64, NVIDIA driver 455.23.05), we are experiencing memory-related exceptions on compute jobs that completed successfully on the same hardware before we migrated from Ubuntu. We noticed that the jobs spawn twice the number of threads requested, and we wonder whether the display-related duplication of processes across the two GPUs (see the nvidia-smi snippet below) is a symptom of the same problem.
Creating a new xorg.conf (after deleting the old one) with nvidia-xconfig and its --busid=PCI:... option does not prevent the duplicate Xorg entries. Could this duplication be due to some hardware or BIOS configuration?
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     17689      G   /usr/libexec/Xorg                  58MiB |
|    0   N/A  N/A     17834      G   /usr/bin/gnome-shell               16MiB |
|    1   N/A  N/A     17689      G   /usr/libexec/Xorg                  58MiB |
|    1   N/A  N/A     17834      G   /usr/bin/gnome-shell               16MiB |
+-----------------------------------------------------------------------------+
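For anyone who wants to check for this kind of duplication programmatically, here is a minimal sketch in Python that scans the text of the "Processes" table and reports every PID listed on more than one GPU. It assumes the row layout shown above (GPU, GI, CI, PID, type, process name, memory in MiB) and process names without spaces. The function name `pids_on_multiple_gpus` and the regular expression are illustrative only and are not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# One process row of the table, e.g.
# |    0   N/A  N/A     17689      G   /usr/libexec/Xorg   58MiB |
# Assumption: the process name contains no spaces and memory is given in MiB.
ROW = re.compile(
    r"^\|\s*(\d+)\s+\S+\s+\S+\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s*\|$"
)


def pids_on_multiple_gpus(table_text):
    """Map each PID listed on more than one GPU to (name, sorted GPU indices)."""
    gpus_by_pid = defaultdict(set)
    name_by_pid = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # border, header, or other non-process line
        gpu, pid, _type, name, _mem = match.groups()
        gpus_by_pid[int(pid)].add(int(gpu))
        name_by_pid[int(pid)] = name
    return {
        pid: (name_by_pid[pid], sorted(gpus))
        for pid, gpus in gpus_by_pid.items()
        if len(gpus) > 1
    }


# The four process rows from the snippet above.
SAMPLE = """
| 0 N/A N/A 17689 G /usr/libexec/Xorg 58MiB |
| 0 N/A N/A 17834 G /usr/bin/gnome-shell 16MiB |
| 1 N/A N/A 17689 G /usr/libexec/Xorg 58MiB |
| 1 N/A N/A 17834 G /usr/bin/gnome-shell 16MiB |
"""

if __name__ == "__main__":
    for pid, (name, gpus) in pids_on_multiple_gpus(SAMPLE).items():
        print(pid, name, gpus)
```

On the rows above this reports PIDs 17689 and 17834 on GPUs 0 and 1. In other words, the same Xorg and gnome-shell processes hold a graphics context on both devices; they are not two separate copies of each process.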