I’m trying to test CUDA managed memory (i.e., unified virtual memory – UVM) with OpenACC in a multi-GPU environment. The code is Fortran90 with MPI. I’m using MPI to launch 1 process per GPU and assigning each MPI rank to a unique device [0-3]. One GPU/node works fine. However, when I go to 2 GPUs/node, I occasionally get the following error:
call to cuCtxCreate returned error 101: Invalid device
from one of the MPI ranks and the job terminates. I can repeat the launch and the job will often run after a few tries. That is, the error is intermittent.
When I increase to 4 GPUs/node, the failure rate increases significantly and I can rarely get this to run successfully. When the jobs do run, the solutions are correct.
The GPU device ids requested when I call
acc_set_device_num( dev_id, acc_device_nvidia )
are within the range of device ids returned by
acc_get_num_devices( acc_device_nvidia )
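For reference, a simplified sketch of what I'm doing (variable names trimmed down; the code is compiled with -ta=tesla:managed so the allocation below is CUDA managed memory):

program uvm_multi_gpu
  use mpi
  use openacc
  implicit none
  integer :: ierr, rank, nranks, ngpus, devnum, i
  integer, parameter :: n = 1024*1024
  real(8), allocatable :: a(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! One MPI rank per GPU: map the rank onto a device id in [0, ngpus-1]
  ! (simplified; assumes ranks fill each node consecutively)
  ngpus  = acc_get_num_devices(acc_device_nvidia)
  devnum = mod(rank, ngpus)
  call acc_set_device_num(devnum, acc_device_nvidia)
  call acc_init(acc_device_nvidia)

  ! With -ta=tesla:managed this allocation is managed memory,
  ! so no explicit data region is needed around the loop
  allocate(a(n))
  !$acc parallel loop
  do i = 1, n
     a(i) = real(i, 8)
  end do

  deallocate(a)
  call MPI_Finalize(ierr)
end program uvm_multi_gpu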
pgaccelinfo reports 4 GPUs in ‘exclusive-process’ compute mode.
When I disable managed memory and explicitly manage the OpenACC device data regions, I do not have this problem and can run with 4 GPUs/node without issue.
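For comparison, the working non-managed version just wraps the same kind of loop from the sketch above in an explicit data region, roughly:

  ! Explicit device data management instead of -ta=tesla:managed:
  ! copy a(:) to the device on entry and back to the host on exit
  !$acc data copy(a)
  !$acc parallel loop
  do i = 1, n
     a(i) = 2.0d0 * a(i)
  end do
  !$acc end data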
I’m using PGI 18.10, but I see the same behavior with 18.7, and I’m using the OpenMPI distribution that ships with the PGI release. This is a Power8 system with 4 P100s per node running RHEL.
I’ve seen this error reported when launching multiple MPI processes per device, where the solution is to enable MPS. I’m not launching multiple MPI processes per device in this scenario, so I’m not sure that applies. Still, I did try starting the MPS daemon with
nvidia-cuda-mps-control -d
as a normal user, but then the MPI job failed when cuInit was called. (I have no root access and no access to the system logs.) All MPI processes gave the same error:
call to cuInit returned error 999: Unknown
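In case the launch sequence matters: my understanding from the MPS documentation is that the usual per-user setup looks something like the following (the /tmp paths are just my guess at per-user locations, since I have no root access). I only ran the daemon-start line:

export CUDA_MPS_PIPE_DIRECTORY=/tmp/$USER/mps-pipe
export CUDA_MPS_LOG_DIRECTORY=/tmp/$USER/mps-log
mkdir -p $CUDA_MPS_PIPE_DIRECTORY $CUDA_MPS_LOG_DIRECTORY
nvidia-cuda-mps-control -d            # start the control daemon
# ... run the MPI job ...
echo quit | nvidia-cuda-mps-control   # shut the daemon down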
Am I missing something with the job launch configuration? Any help would be greatly appreciated.
Thanks in advance.