Hi all –
I am experimenting with MIG on a quad-A100 system, and have worked through most of the material in the user guide. As root, I can turn on MIG mode, create a GPU instance, and create a Compute Instance, and see them all with the appropriate arguments to nvidia-smi.
I am interested in the “bare metal” use-case for non-privileged users.
Next, I want non-privileged users to be able to use the CIs to run things under CUDA. I had imagined it would work like this: users would see the available CIs in the output of nvidia-smi -L, I’d set up some kind of bookkeeping mechanism to ensure CIs are not oversubscribed, and users would set CUDA_VISIBLE_DEVICES appropriately and run their tasks.
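To make the intended workflow concrete, here is a rough sketch of the user-side step I have in mind. It assumes that MIG devices appear in the nvidia-smi -L listing as entries containing "(UUID: MIG-...)". The helper names (list_mig_uuids, env_for) and the sample listing are made up for illustration and were not captured from my system.

```python
import re

# Assumption: each MIG device is listed as "... (UUID: MIG-<something>)".
MIG_UUID = re.compile(r"\(UUID:\s*(MIG-[^)\s]+)\)")


def list_mig_uuids(listing):
    """Return the MIG device UUIDs found in the text of `nvidia-smi -L`."""
    return MIG_UUID.findall(listing)


def env_for(uuid):
    """Environment override a user would apply before launching a CUDA task."""
    return {"CUDA_VISIBLE_DEVICES": uuid}


# Illustrative listing only (hypothetical UUIDs), not real output.
SAMPLE = """\
GPU 0: A100-SXM4-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 1g.5gb Device 0: (UUID: MIG-11111111-1111-1111-1111-111111111111)
GPU 1: A100-SXM4-40GB (UUID: GPU-22222222-2222-2222-2222-222222222222)
"""

if __name__ == "__main__":
    for uuid in list_mig_uuids(SAMPLE):
        print(env_for(uuid))
```

In real use the listing would come from running nvidia-smi -L as the regular user, which is exactly where I am currently getting no MIG entries back.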
But it seems that after I set up the CI, it’s not visible to non-privileged users? Running nvidia-smi -L as a regular user shows just the cards and no MIG units. Running nvidia-smi with no arguments shows MIG enabled on the first card (the only one I enabled it on) and includes the table for MIG devices, but the table is empty.
The docs say that the relevant permissions are those on /proc/driver/nvidia/capabilities/mig/config and /proc/driver/nvidia/capabilities/mig/monitor. These are both readable by the regular user.
What am I missing? Is the bare-metal use-case only for the root user? Is there somewhere else where I need to set the permissions?
Thanks in advance.