Unwanted duplicate threads/processes on dual P6000

firstonehere · September 24, 2020, 2:18am

On a workstation running CentOS 8 (kernel 4.18.0-193.19.1.el8_2.x86_64, driver 455.23.05), we are experiencing memory-related exceptions on compute jobs that completed successfully on the same hardware before we migrated from Ubuntu. We noticed that jobs spawn twice the number of threads requested, and we wonder whether the display-related duplication of processes across the two GPUs (see the nvidia-smi snippet below) is a symptom of the same problem.
Creating a new xorg.conf (after deleting the old one) with the nvidia-xconfig --busid=PCI:... option fails to prevent the duplicate Xorg threads. Could this duplication be due to some hardware or BIOS configuration?

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     17689      G   /usr/libexec/Xorg                  58MiB |
|    0   N/A  N/A     17834      G   /usr/bin/gnome-shell               16MiB |
|    1   N/A  N/A     17689      G   /usr/libexec/Xorg                  58MiB |
|    1   N/A  N/A     17834      G   /usr/bin/gnome-shell               16MiB |
+-----------------------------------------------------------------------------+
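For anyone who wants to check for this duplication without reading the table by eye, here is a minimal Python sketch that scans the process rows of nvidia-smi output and reports every PID listed on more than one GPU. The function name `pids_on_multiple_gpus` and the row regex are illustrative assumptions, not part of any NVIDIA tool; the regex is written only against the row layout shown above and will not match process names that contain spaces.

```python
import re
from collections import defaultdict

# Process rows copied from the nvidia-smi output above.
SAMPLE = """\
|    0   N/A  N/A     17689      G   /usr/libexec/Xorg                  58MiB |
|    0   N/A  N/A     17834      G   /usr/bin/gnome-shell               16MiB |
|    1   N/A  N/A     17689      G   /usr/libexec/Xorg                  58MiB |
|    1   N/A  N/A     17834      G   /usr/bin/gnome-shell               16MiB |
"""

# Columns: GPU, GI ID, CI ID, PID, Type, Process name, GPU memory usage.
# Assumption: the process name has no spaces (true for the rows above).
ROW = re.compile(
    r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s+\|$"
)


def pids_on_multiple_gpus(table_text):
    """Return {(pid, name): [gpu, ...]} for processes seen on more than one GPU."""
    gpus_by_proc = defaultdict(set)
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and border lines do not match and are skipped
            gpu, pid, _type, name, _mem = match.groups()
            gpus_by_proc[(int(pid), name)].add(int(gpu))
    return {key: sorted(gpus) for key, gpus in gpus_by_proc.items() if len(gpus) > 1}


if __name__ == "__main__":
    for (pid, name), gpus in pids_on_multiple_gpus(SAMPLE).items():
        print(f"PID {pid} ({name}) appears on GPUs {gpus}")
```

On the sample rows, both Xorg (PID 17689) and gnome-shell (PID 17834) are reported on GPUs 0 and 1, which matches the duplication described in the post.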

Guillawme · October 2, 2020, 4:40pm

I had the same problem with the same CentOS version and nvidia driver version as you describe, on two quad-GPU workstations at work (one quad-GTX 1070 Ti and one quad-RTX 2080 Ti). Downgrading to driver version 440.33.01 (the default one with CUDA 10.2) fixed it on both workstations. I don’t know why the latest driver did that, but I will keep this slightly older driver version because I need a readable output of nvidia-smi (I use it a lot to monitor the initial iterations of long-running GPU compute jobs; not convenient if it is cluttered like that).

manoskav · January 3, 2022, 12:19pm

I have the same problem with AlmaLinux 8.5 (RHEL-based) and two A6000 GPUs. I have installed several NVIDIA driver versions (460+), and I always get duplicate processes on both GPUs.
I don’t know if this is expected behaviour, but I suspect that I have access to only half of the GPU memory (i.e. only one card's worth). Both GPUs always have the same amount of memory allocated.

In the Python code, only the second GPU is being utilized, but as you can see, memory is also being allocated on the first one. As a consequence, if I want to run two processes, each on a different GPU, memory is consumed quickly because each process's duplicate allocates memory on the other GPU.

generix · January 3, 2022, 1:37pm

Please check if this helps:
https://forums.developer.nvidia.com/t/memory-is-allocated-on-all-gpus/183110/2?u=generix

manoskav · January 3, 2022, 2:25pm

You saved me one more day of googling and experimenting with different drivers, before proceeding to install Ubuntu.
I disabled BaseMosaic in “/etc/X11/xorg.conf.d/10-nvidia.conf” and everything works as expected.

Thank you so much!! :)
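The fix above amounts to making sure the BaseMosaic option is not enabled in the X configuration. Below is a minimal Python sketch that checks a config file's text for an enabled BaseMosaic option. The thread does not show the contents of 10-nvidia.conf, so the `SAMPLE_CONF` section is a hypothetical example, and the helper name `base_mosaic_enabled` is an assumption for illustration.

```python
import re

# Hypothetical file contents: the thread does not show the real 10-nvidia.conf.
SAMPLE_CONF = """\
Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "BaseMosaic" "on"
EndSection
"""

# Matches: Option "BaseMosaic" "<value>"  (the value is optional).
OPTION = re.compile(r'^\s*Option\s+"BaseMosaic"(?:\s+"([^"]*)")?', re.IGNORECASE)

# Values treated here as "enabled"; a missing or empty value also counts as enabled.
TRUE_VALUES = {"1", "on", "true", "yes"}


def base_mosaic_enabled(conf_text):
    """Return True if any non-commented line enables the BaseMosaic option."""
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop trailing comments
        match = OPTION.match(line)
        if match:
            value = match.group(1)
            if not value or value.lower() in TRUE_VALUES:
                return True
    return False


if __name__ == "__main__":
    # To check a real file, read it instead, e.g.:
    #   text = open("/etc/X11/xorg.conf.d/10-nvidia.conf").read()
    print("BaseMosaic enabled:", base_mosaic_enabled(SAMPLE_CONF))
```

If the check reports the option as enabled, setting its value to "off" or removing the line and then restarting the X server is the change described above.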
