MPI Multi-GPU process list in nvidia-smi

Hi,

I noticed a strange change in behavior when running my multi-GPU MPI+OpenACC code. I think the change occurred within the last few driver updates (or maybe a CUDA update?). I am using Ubuntu 20.04.

Basically, when I run my code on 4 GPUs with “mpiexec -np 4”, nvidia-smi used to list only 4 processes.
Now it lists 16 processes, and for each GPU, three of the entries show 0 memory/activity and correspond to the processes that are active on the other GPUs.
Are these communication processes?
Why was this not shown before?

When I run htop, I only see 4 CPU processes as before.
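
For context, the code uses the usual one-rank-per-GPU setup, where each MPI rank selects its own device through OpenACC. Roughly this pattern (a simplified C sketch, not my actual code):

#include <mpi.h>
#include <openacc.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One rank per GPU: rank i drives device i (single-node case,
       device numbering assumed 0-based as in NVHPC). */
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    acc_set_device_num(rank % ngpus, acc_device_nvidia);

    /* ... OpenACC compute regions then run on the selected device ... */
    printf("rank %d -> GPU %d of %d\n", rank, rank % ngpus, ngpus);

    MPI_Finalize();
    return 0;
}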

nvidia-smi output:

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3731      C   ./mas                             649MiB |
|    0   N/A  N/A      3732      C   ./mas                               0MiB |
|    0   N/A  N/A      3733      C   ./mas                               0MiB |
|    0   N/A  N/A      3734      C   ./mas                               0MiB |
|    1   N/A  N/A      3731      C   ./mas                               0MiB |
|    1   N/A  N/A      3732      C   ./mas                             645MiB |
|    1   N/A  N/A      3733      C   ./mas                               0MiB |
|    1   N/A  N/A      3734      C   ./mas                               0MiB |
|    2   N/A  N/A      3731      C   ./mas                               0MiB |
|    2   N/A  N/A      3732      C   ./mas                               0MiB |
|    2   N/A  N/A      3733      C   ./mas                             645MiB |
|    2   N/A  N/A      3734      C   ./mas                               0MiB |
|    3   N/A  N/A      3731      C   ./mas                               0MiB |
|    3   N/A  N/A      3732      C   ./mas                               0MiB |
|    3   N/A  N/A      3733      C   ./mas                               0MiB |
|    3   N/A  N/A      3734      C   ./mas                             645MiB |
+-----------------------------------------------------------------------------+

  • Ron

Hi Ron,

What driver version are you using? Unfortunately I’ve not seen this before, so I don’t know what’s going on. The fact that the extra processes have no memory is odd, but it rules out that they are extra contexts being created.

Maybe running the code through Nsight Systems would show where these are coming from? Does performance seem to be affected?
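
If you do try Nsight Systems, something along these lines should give one report per rank (assuming OpenMPI, which exports OMPI_COMM_WORLD_RANK; adjust the variable for your MPI):

mpiexec -np 4 nsys profile -o mas_rank%q{OMPI_COMM_WORLD_RANK} ./mas

The %q{...} in the output name is replaced with that environment variable, so the ranks don’t overwrite each other’s reports.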

-Mat

Hi,

The performance does not seem to be affected (whew!).

I am using:
Driver Version: 455.32.00 CUDA Version: 11.1

On:
Ubuntu 20.04 kernel 5.4.0-52-generic
using
NVHPC 20.9 with its bundled OpenMPI 3 + OpenACC

My topology is:
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      NV2     SYS     SYS     0-127           N/A
GPU1    NV2      X      PHB     SYS     0-127           N/A
GPU2    SYS     PHB      X      NV2     0-127           N/A
GPU3    SYS     SYS     NV2      X      0-127           N/A

  • Ron

Hi Ron,

I was able to reproduce this on a system with a CUDA 11.1 driver. As far as we can tell, it appears to be an extra reporting of the processes running on the other GPUs and is most likely benign.

-Mat

Hi,

OK good to know.

So this is an issue with nvidia-smi itself?

  • Ron

So this is an issue with nvidia-smi itself?

Sorry, I’m not sure if it’s from nvidia-smi or the driver.

Hi,

Just FYI, it looks like a recent Ubuntu CUDA driver update has fixed this issue, and nvidia-smi now shows the correct number of processes.

  • Ron

I realise that this seems to be in the nuisance-only category, but will it be addressed? At the SETI Institute, almost all of our back-end systems use multi-GPU subsystems, and this is a distraction while monitoring kernel code running on the selected device.
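
As a stopgap for our monitoring scripts, I am considering querying NVML directly and skipping the zero-memory entries, so that only the process actually active on each device is listed. A rough sketch of the idea (plain C against the NVML API; error handling trimmed, not tested against every driver):

/* Build with something like: gcc filter_procs.c -lnvidia-ml
   (add -I for the nvml.h location of your CUDA install). */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int ndev = 0;
    nvmlDeviceGetCount(&ndev);

    for (unsigned int d = 0; d < ndev; d++) {
        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex(d, &dev) != NVML_SUCCESS) continue;

        nvmlProcessInfo_t procs[64];
        unsigned int nprocs = 64;
        if (nvmlDeviceGetComputeRunningProcesses(dev, &nprocs, procs) != NVML_SUCCESS)
            continue;

        for (unsigned int i = 0; i < nprocs; i++) {
            /* Skip the phantom zero-memory entries reported for other GPUs. */
            if (procs[i].usedGpuMemory == 0)
                continue;
            printf("GPU %u  pid %u  %llu MiB\n", d, procs[i].pid,
                   procs[i].usedGpuMemory / (1024ULL * 1024ULL));
        }
    }

    nvmlShutdown();
    return 0;
}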

Thank you and I will inform people about this issue thread.

Sample configuration:

OS                           : Linux-4.15.0-72-generic-x86_64-with-debian-stretch-sid
CuPy Version                 : 8.6.0
NumPy Version                : 1.21.1
SciPy Version                : 1.7.0
Cython Build Version         : 0.29.22
CUDA Root                    : /opt/conda
CUDA Build Version           : 11000
CUDA Driver Version          : 11010
CUDA Runtime Version         : 11000
cuBLAS Version               : 11200
cuFFT Version                : 10201
cuRAND Version               : 10201
cuSOLVER Version             : (10, 6, 0)
cuSPARSE Version             : 11101
NVRTC Version                : (11, 0)
Thrust Version               : 100909
CUB Build Version            : 100909
cuDNN Build Version          : 8100
cuDNN Version                : 8100
NCCL Build Version           : 2804
NCCL Runtime Version         : 20906
cuTENSOR Version             : 10202
Device 0 Name                : TITAN Xp
Device 0 Compute Capability  : 61
Device 1 Name                : TITAN Xp
Device 1 Compute Capability  : 61
Device 2 Name                : GeForce GTX 1080
Device 2 Compute Capability  : 61
Device 3 Name                : GeForce GTX TITAN X
Device 3 Compute Capability  : 52

I believe it has already been addressed in more recent CUDA driver versions. At least I no longer see the issue, and as Ron noted, it went away after he updated his driver.

@MatColgrove Sorry, just saw what I overlooked before. Will work with our admins to update the CUDA driver.