Allow non-root users to change application clocks?

Hello, I’m trying to allow non-root users to set application clocks on our A100 cards, but the -acp option for nvidia-smi appears to be deprecated. Is there an alternative? I see --auto-boost-permission, but that’s not supported on this card (and I specifically want users to be able to use -ac, not --auto-boost-default). The driver version is 525.85.12, and the card is an A100-40GB.

~$ sudo nvidia-smi -acp UNRESTRICTED
Warning: This option is deprecated and will be removed in future releases
Treating as warning and moving on.
All done.

~$ sudo nvidia-smi --auto-boost-permission UNRESTRICTED
Changing auto boosted clocks permissions is not supported for GPU: 00000000:41:00.0.
Treating as warning and moving on.
All done.

Thank you,
-Collin

I have no experience with it, but perhaps you can achieve what you want through DCGM.

Thanks, I tried it out, but in a Slurm environment it lets a user change the clocks on all GPUs on the current node. It doesn’t restrict users to the GPUs allocated to their job, as nvidia-smi does. I think this opens the door to other issues, such as people changing clocks on GPUs they’re not running on. Compare the output below: dcgmi discovery lists all four GPUs on the node, while nvidia-smi shows only one A100.

~/$ dcgmi discovery -l
4 GPUs found.
+--------+----------------------------------------------------------------------+
| GPU ID | Device Information                                                   |
+--------+----------------------------------------------------------------------+
| 0      | Name: Quadro GV100                                                   |
|        | PCI Bus ID: 00000000:01:00.0                                         |
|        | Device UUID: GPU-0ff8396c-51a0-9682-41ee-4cc278423f88                |
+--------+----------------------------------------------------------------------+
| 1      | Name: Quadro GV100                                                   |
|        | PCI Bus ID: 00000000:25:00.0                                         |
|        | Device UUID: GPU-b6097f37-0603-0a1d-064f-f0a54d98acf1                |
+--------+----------------------------------------------------------------------+
| 2      | Name: NVIDIA A100-PCIE-40GB                                          |
|        | PCI Bus ID: 00000000:41:00.0                                         |
|        | Device UUID: GPU-b042a986-47a1-f34f-c63f-aae3c3b89c1c                |
+--------+----------------------------------------------------------------------+
| 3      | Name: NVIDIA A100-PCIE-40GB                                          |
|        | PCI Bus ID: 00000000:61:00.0                                         |
|        | Device UUID: GPU-dc31e36f-a2d4-5937-0181-4e843a04ad3b                |
+--------+----------------------------------------------------------------------+
0 NvSwitches found.
+-----------+
| Switch ID |
+-----------+
+-----------+

~/$ nvidia-smi
Fri Mar  3 09:55:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:61:00.0 Off |                  Off |
| N/A   31C    P0    35W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

My solution was to create a new group called “cuda”; users who belong to this group can run sudo nvidia-smi without a password. I did that by adding a file to /etc/sudoers.d/ with the following:

%cuda ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi

I’m not sure what the downsides of this approach are, or rather, how it differs from what the -acp option would have allowed if it were still available. But this works for our environment, for now.
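
One difference worth noting is that the sudoers rule above lets group members run every nvidia-smi subcommand as root, not just -ac. A possible way to narrow it, as a rough sketch only: point the sudoers entry at a small wrapper that accepts nothing but a "<memMHz>,<graphicsMHz>" pair and passes it to nvidia-smi -ac. The wrapper path /usr/local/sbin/set-app-clocks and the function names are hypothetical, and I have not tested this on a Slurm node.

```shell
#!/bin/sh
# Hypothetical wrapper, assumed to be installed root-owned as
# /usr/local/sbin/set-app-clocks, with a sudoers entry such as:
#   %cuda ALL=(root) NOPASSWD: /usr/local/sbin/set-app-clocks
# It only forwards a "<memMHz>,<graphicsMHz>" pair to "nvidia-smi -ac".

# Succeed only if $1 is exactly two comma-separated groups of digits.
valid_clock_pair() {
    case "$1" in
        *[!0-9,]*|,*|*,|*,*,*|"") return 1 ;;  # stray chars, empty field, extra comma
        *,*) return 0 ;;                        # digits,digits
        *) return 1 ;;                          # no comma at all
    esac
}

# Validate the single argument, then set application clocks.
set_app_clocks() {
    if [ "$#" -ne 1 ] || ! valid_clock_pair "$1"; then
        echo "usage: set-app-clocks <memMHz>,<graphicsMHz>" >&2
        return 2
    fi
    # No -i is passed, so this applies to every GPU nvidia-smi can see.
    /usr/bin/nvidia-smi -ac "$1"
}

# Run only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    set_app_clocks "$@"
fi
```

A user in the group would then run something like sudo /usr/local/sbin/set-app-clocks 1215,1410, picking a pair that nvidia-smi -q -d SUPPORTED_CLOCKS lists for the card. Whether the wrapper stays confined to the job's allocated GPUs depends on how Slurm constrains devices on the node, so that is worth checking before relying on it.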
