problem with multi gpu using mpi

LSCH · December 1, 2015, 6:49am

Hi

I am trying to implement multi-GPU support for my program using MPI.
However, on my host with two GPU cards I see the following output of nvidia-smi that seems to indicate that one process accessing both cards instead of each process accessing only one card.

Mon Nov 30 14:05:55 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.29     Driver Version: 340.29         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20Xm         Off  | 0000:0A:00.0     Off |                    0 |
| N/A   37C    P0    73W / 235W |     98MiB /  5759MiB |     32%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20Xm         Off  | 0000:0D:00.0     Off |                    0 |
| N/A   35C    P0    75W / 235W |    169MiB /  5759MiB |     52%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0      1431  ../../../../bin/mytest                                  81MiB |
|    1      1431  ../../../../bin/mytest                                  69MiB |
|    1      1430  ../../../../bin/mytest                                  81MiB |
+-----------------------------------------------------------------------------+

However, when I look at the output of the following code, the assigning of the different cards seems to be correct, i.e one process gets device_num 0 and the other gets device_num 1.

    int acc_dev_id = -1;
    acc_device_t dev_type = acc_get_device_type();
    int num_devs = acc_get_num_devices(dev_type);
    MPI::COMM_WORLD.Gather(&num_devs,1,MPI::INT,recv_num_devs,1,MPI::INT,0);
/*
some code to read out recv_num_devs and fill array proc_accelerator_id with the proper device num for each rank taking into account that processes could be on the same host or on different nodes each with one or more gpus attached
*/

MPI::COMM_WORLD.Scatter(&proc_accelerator_id[0],1,MPI::INT,&acc_dev_id,1,MPI::INT,0);
acc_set_device_num(acc_dev_id,dev_type);
std::cout << "OpenACC device type for process " << pid << ": " 
    << dev_type << "\n";
acc_init(dev_type);
 std::cout << "OpenACC device number that will be used for process "
        << pid << ": " << acc_get_device_num(dev_type) << "\n";

So I am wondering what I am doing wrong here. Any suggestions where to look at?

Thanks,
LS

MatColgrove · December 1, 2015, 10:56pm

Hi LS,

In order to maintain interoperability with CUDA, we need to check if a CUDA context has already been created. If so, then we attach to this context. The problem being that starting in CUDA 7.0, there’s a default context. This causes the side-effect of the OpenACC runtime attaching to it, thus showing up as this extra context on the default device.

It’s relatively benign and I doubt it’s causing you any issues. However, I went ahead and opened a problem report (TPR#22133) since it shouldn’t be occurring and does take up a bit of space.

Best regards,
Mat

LSCH · December 2, 2015, 9:28am

Hi Mat

thanks for the background. The reason I wanted to clarify this issue was that I got wrong results for my multi-gpu program. One thought was that something could go wrong if a single process is computing things on two cards while it was only intended to interact with one. I wanted to exclude this error source before digging further into the code to find out what else could have gone wrong.
In this particular case I subdivide my domain into two subdomains and assign a different device to be used for each of them. Looking at the results it seems that the computation has only worked out for one subdomain, i.e. one process, while the other contains only zeros.

Thanks,
LS

Topic		Replies	Views
MPI Multi-GPU process list in nvidia-smi nvc, nvc++ and nvfortran	9	1965	September 10, 2021
Multi-GPU MPI launch failing when UVM enabled Legacy PGI Compilers	5	3777	January 2, 2019
Using multiple GPUs Legacy PGI Compilers	7	22081	August 11, 2009
OpenACC program takes two GPUs (instead of one) Legacy PGI Compilers	3	4937	September 10, 2014
Problem while running parallel cuda process in AMBER CUDA Setup and Installation	5	2732	January 31, 2016
CUDA+MPI = Unexplained Issues... Random Crashes, Errenous Output?!? CUDA Programming and Performance	5	3256	July 7, 2008
about multi GPU control CUDA Programming and Performance	3	711	December 23, 2019
MPI mixing host and gpu devices with PGI accelerator Legacy PGI Compilers	5	3935	December 7, 2011
MPS Server is working with a single node multi-GPU but not working with two nodes multi-GPU CUDA Programming and Performance	0	625	March 28, 2024
MPI running issue using NVIDIA MPS Service on Multi-GPU nodes CUDA Programming and Performance	4	2189	September 16, 2016

problem with multi gpu using mpi

Related topics