Hello,
Since the 13.x compilers (we currently use 13.6), we have had severe problems executing OpenACC programs on our nodes with two GPUs (each node has two NVIDIA Quadro 6000 (Fermi) GPUs). The problem is that any arbitrary OpenACC program occupies BOTH GPUs instead of only one. If we start, e.g., a Jacobi solver on one GPU (without setting any device number), it runs on device 0 AND device 1. The program neither reports this nor prints its output twice, but both executions are visible with “nvidia-smi” (see below).
>$ nvidia-smi
Fri Jul 26 10:42:01 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.310.40 Driver Version: 310.40 |
|-------------------------------+----------------------+----------------------+
| GPU Name | Bus-Id Disp. | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro 6000 | 0000:02:00.0 Off | 0 |
| 30% 78C P0 N/A / N/A | 6% 324MB / 5375MB | 79% E. Process |
+-------------------------------+----------------------+----------------------+
| 1 Quadro 6000 | 0000:85:00.0 On | 0 |
| 30% 74C P0 N/A / N/A | 2% 98MB / 5375MB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 29538 ./laplace_openacc 362MB |
| 1 29538 ./laplace_openacc 362MB |
+-----------------------------------------------------------------------------+
This is a problem because we have set our GPUs to the compute mode “exclusive process”. While one OpenACC program (running on both devices) is executing, it prohibits any other user from starting a GPU program, which is really bad for us.
We have the same problem with MPI programs from a single user. If an MPI program with two processes runs on one node and each process should actually talk to one GPU (selected by its rank number), it does not work: while initializing the first device (acc_init), the first process takes both GPUs, so the second process gets an error and terminates with a context error.
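One workaround we could imagine (sketched below, not yet verified against 13.6) is to hide all but one GPU from each rank via the standard CUDA runtime variable CUDA_VISIBLE_DEVICES before acc_init runs, using a small launcher script. The script name and the local-rank variable are assumptions: OMPI_COMM_WORLD_LOCAL_RANK is what OpenMPI exports; other MPI implementations use different variables (MVAPICH2, for instance, sets MV2_COMM_WORLD_LOCAL_RANK).

```shell
#!/bin/sh
# gpu_bind.sh -- hypothetical launcher: bind each MPI rank on a node to
# one of the two GPUs by exposing only that device to the CUDA runtime,
# so acc_init cannot create a context on the second GPU.
# OMPI_COMM_WORLD_LOCAL_RANK is OpenMPI-specific; adjust for your MPI.
rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
export CUDA_VISIBLE_DEVICES=$((rank % 2))   # two Quadro 6000s per node
echo "rank $rank -> GPU $CUDA_VISIBLE_DEVICES"
exec "$@"
```

It would then be launched as, e.g., `mpirun -np 2 ./gpu_bind.sh ./laplace_openacc` (program name as in the nvidia-smi listing above); inside each process, device 0 is then the only GPU the runtime can see.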
Do you have any ideas for a workaround? Will this be fixed in the next compiler releases? (It was not an issue with 12.9, for example.)
Thanks, Sandra