Hi,
I noticed a strange change in behavior when running my multi-GPU MPI+OpenACC code. I think the change came with one of the last few driver updates (or possibly a CUDA update). I am using Ubuntu 20.04.
Basically, when I ran my code on 4 GPUs with “mpiexec -np 4”, I used to see only 4 processes in the process list shown by nvidia-smi.
Now I see 16 processes listed: every GPU shows all four PIDs, and on each GPU three of them have 0 memory/activity. Those three are the processes that are active on the other GPUs.
Are these communication processes?
Why was this not shown before?
When I run htop, I still see only 4 CPU processes, as before.
nvidia-smi output:
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3731      C   ./mas                             649MiB |
|    0   N/A  N/A      3732      C   ./mas                               0MiB |
|    0   N/A  N/A      3733      C   ./mas                               0MiB |
|    0   N/A  N/A      3734      C   ./mas                               0MiB |
|    1   N/A  N/A      3731      C   ./mas                               0MiB |
|    1   N/A  N/A      3732      C   ./mas                             645MiB |
|    1   N/A  N/A      3733      C   ./mas                               0MiB |
|    1   N/A  N/A      3734      C   ./mas                               0MiB |
|    2   N/A  N/A      3731      C   ./mas                               0MiB |
|    2   N/A  N/A      3732      C   ./mas                               0MiB |
|    2   N/A  N/A      3733      C   ./mas                             645MiB |
|    2   N/A  N/A      3734      C   ./mas                               0MiB |
|    3   N/A  N/A      3731      C   ./mas                               0MiB |
|    3   N/A  N/A      3732      C   ./mas                               0MiB |
|    3   N/A  N/A      3733      C   ./mas                               0MiB |
|    3   N/A  N/A      3734      C   ./mas                             645MiB |
+-----------------------------------------------------------------------------+
- Ron