Hi
I am trying to implement multi-GPU support for my program using MPI.
However, on my host with two GPU cards, the following nvidia-smi output seems to indicate that one process is accessing both cards, instead of each process accessing only one card:
Mon Nov 30 14:05:55 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.29     Driver Version: 340.29         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20Xm         Off  | 0000:0A:00.0     Off |                    0 |
| N/A   37C    P0    73W / 235W |     98MiB /  5759MiB |     32%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20Xm         Off  | 0000:0D:00.0     Off |                    0 |
| N/A   35C    P0    75W / 235W |    169MiB /  5759MiB |     52%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0      1431  ../../../../bin/mytest                                81MiB |
|    1      1431  ../../../../bin/mytest                                69MiB |
|    1      1430  ../../../../bin/mytest                                81MiB |
+-----------------------------------------------------------------------------+
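To make the symptom concrete, here is a minimal sketch of the check I am doing by eye on the "Compute processes" table: group the (GPU, PID) rows by PID and report any PID that shows up on more than one GPU. The function name pids_on_multiple_gpus is made up for illustration and is not part of my program.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: takes (gpu_index, pid) pairs as listed under
// "Compute processes" and returns the PIDs seen on more than one GPU.
std::vector<int> pids_on_multiple_gpus(
    const std::vector<std::pair<int, int>>& rows) {
    std::map<int, std::set<int>> gpus_by_pid;
    for (const auto& row : rows) {
        assert(row.first >= 0);  // GPU indices are non-negative
        gpus_by_pid[row.second].insert(row.first);
    }
    std::vector<int> result;
    for (const auto& entry : gpus_by_pid) {
        if (entry.second.size() > 1) {
            result.push_back(entry.first);
        }
    }
    return result;  // ascending, because std::map iterates in key order
}
```

Fed with the three rows above, (0, 1431), (1, 1431) and (1, 1430), this flags PID 1431 only, which is exactly the process I would not expect to see on GPU 1.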
However, when I look at the output of the following code, the device assignment seems to be correct, i.e. one process gets device_num 0 and the other gets device_num 1:
int acc_dev_id = -1;
acc_device_t dev_type = acc_get_device_type();
int num_devs = acc_get_num_devices(dev_type);

// Collect every rank's device count on rank 0.
MPI::COMM_WORLD.Gather(&num_devs, 1, MPI::INT, recv_num_devs, 1, MPI::INT, 0);

/*
   Some code to read out recv_num_devs and fill the array proc_accelerator_id with the proper device number for each rank, taking into account that processes could be on the same host or on different nodes, each with one or more GPUs attached.
*/

// Send each rank its device number and select that device.
MPI::COMM_WORLD.Scatter(&proc_accelerator_id[0], 1, MPI::INT, &acc_dev_id, 1, MPI::INT, 0);
acc_set_device_num(acc_dev_id, dev_type);
std::cout << "OpenACC device type for process " << pid << ": "
          << dev_type << "\n";

// Initialize the runtime, then report which device is actually selected.
acc_init(dev_type);
std::cout << "OpenACC device number that will be used for process "
          << pid << ": " << acc_get_device_num(dev_type) << "\n";
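For context, the elided mapping step could look roughly like the sketch below. This is an illustration under assumptions, not my actual code: it assumes rank 0 has also gathered each rank's hostname (for example from MPI_Get_processor_name, which is not shown in the snippet above), and the names assign_devices and hostnames are hypothetical. Ranks on the same host are handed device numbers round-robin.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, meant to run on rank 0 after the Gather.
// hostnames[r] is the host of rank r, num_devs[r] its visible device count.
// Returns the device number for each rank, or -1 if it sees no device.
std::vector<int> assign_devices(const std::vector<std::string>& hostnames,
                                const std::vector<int>& num_devs) {
    assert(hostnames.size() == num_devs.size());
    std::map<std::string, int> next_local_rank;  // ranks placed so far per host
    std::vector<int> device_id(hostnames.size(), -1);
    for (std::size_t rank = 0; rank < hostnames.size(); ++rank) {
        if (num_devs[rank] <= 0) {
            continue;  // no accelerator visible on this rank: leave -1
        }
        int local_rank = next_local_rank[hostnames[rank]]++;
        device_id[rank] = local_rank % num_devs[rank];  // round-robin per host
    }
    return device_id;
}
```

With two ranks on one host that has two GPUs, this gives device 0 to the first rank and device 1 to the second, which matches the device numbers my program prints.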
So I am wondering what I am doing wrong here. Any suggestions on where to look?
Thanks,
LS