I am running my code on an 8-GPU node with MPS enabled. I am trying to oversubscribe the GPUs by launching 21 processes through MPI like this:
mpirun -np 21 ./a.out
This run results in the following error:
call to cuDevicePrimaryCtxRetain returned error 101: Invalid device
When I run this on a machine with only a single GPU, no error occurs and the code runs correctly (if inefficiently) through MPS.
I am certain that it has to do with how I select the device and call ACC_INIT:
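ACC_NUM = ACC_GET_NUM_DEVICES(ACC_DEVICE_NVIDIA)   ! number of NVIDIA devices visible to this process
GPUNUM = MOD(MYID,ACC_NUM)                         ! round-robin device index computed from MYID
ACC_DEV = ACC_GET_DEVICE_NUM(ACC_DEVICE_NVIDIA)    ! queries the currently selected device number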
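As a sanity check on the index arithmetic alone, here is a minimal standalone sketch of the MOD(MYID,ACC_NUM) mapping for this launch. It assumes that ACC_NUM is 8 on this node and that MYID is the zero-based MPI rank; neither is confirmed above. The names gpu_map, device_for_rank and show_mapping are hypothetical and introduced only for illustration, and the sketch makes no OpenACC or MPI calls.

```fortran
module gpu_map
  implicit none
contains
  ! Hypothetical helper mirroring GPUNUM = MOD(MYID,ACC_NUM) from the snippet above.
  pure integer function device_for_rank(myid, acc_num) result(gpunum)
    integer, intent(in) :: myid     ! assumed: zero-based MPI rank
    integer, intent(in) :: acc_num  ! assumed: device count, must be > 0
    gpunum = mod(myid, acc_num)
  end function device_for_rank
end module gpu_map

program show_mapping
  use gpu_map
  implicit none
  integer, parameter :: nranks = 21  ! from "mpirun -np 21"
  integer, parameter :: ndev   = 8   ! assumed value of ACC_NUM on the 8-GPU node
  integer :: myid, dev
  integer :: counts(0:ndev-1)

  ! Count how many ranks would request each device index.
  counts = 0
  do myid = 0, nranks - 1
    dev = device_for_rank(myid, ndev)
    counts(dev) = counts(dev) + 1
  end do

  do dev = 0, ndev - 1
    print '(a,i0,a,i0,a)', 'device ', dev, ': ', counts(dev), ' ranks'
  end do
end program show_mapping
```

Under those assumptions every rank requests an index between 0 and 7 (devices 0 to 4 get three ranks each, devices 5 to 7 get two each), so the MOD by itself should not produce an out-of-range device unless ACC_NUM differs from 8 on some ranks.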
Any help would be appreciated.