Nvidia-smi drain "Failed to parse device specified at the command-line"

I’m trying to disable a GPU as suggested in this thread here on the forum. Specifically, I’m running:

nvidia-smi -i 0000:xx:00.0 -pm 0
nvidia-smi drain -p 0000:xx:00.0 -m 1

for some value of xx. The first command succeeds (it reports that the device is already not in persistence mode), but the second command gives me:

Failed to parse device specified at the command-line

I don’t understand what this means. Is my syntax wrong? Is nvidia-smi having trouble with the device?

(Note: Also posted here, as I wasn’t sure which venue is more appropriate.)

I don’t have any trouble with it. These commands generally require root privilege, and I am using driver 465.19.01 on a system with 2 GeForce GPUs. Here is what I see:

[root@cluster1 bob]# nvidia-smi
Thu Jun 10 11:19:48 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 35%   33C    P0    25W / 130W |      0MiB /  2001MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 N/A |                  N/A |
| N/A   32C    P0    N/A /  N/A |      0MiB /   981MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@cluster1 bob]# nvidia-smi -i 0000:02:00.0 -pm 0
Persistence mode is already Disabled for GPU 00000000:02:00.0.
All done.
[root@cluster1 bob]# nvidia-smi drain -q -p 0000:02:00.0
The current drain state of GPU 00000000:02:00.0 is: not draining.
[root@cluster1 bob]# nvidia-smi drain -p 0000:02:00.0 -m 1
Successfully set GPU 00000000:02:00.0 drain state to: draining.
[root@cluster1 bob]# nvidia-smi
Thu Jun 10 11:20:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 N/A |                  N/A |
| N/A   33C    P0    N/A /  N/A |      0MiB /   981MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@cluster1 bob]# nvidia-smi drain -p 0000:02:00.0 -m 0
Successfully set GPU 00000000:02:00.0 drain state to: not draining.
[root@cluster1 bob]# nvidia-smi
Thu Jun 10 11:20:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 35%   33C    P0    25W / 130W |      0MiB /  2001MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 N/A |                  N/A |
| N/A   32C    P0    N/A /  N/A |      0MiB /   981MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@cluster1 bob]#

It seems your cross-post received a similar “works for me” comment. I’m not able to explain what the error message means in your case. I would suggest providing a more complete example as I have done, making sure you are root, and/or updating to the latest driver for your GPU (bugs are always possible).
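
If it helps to rule out a typo in the bus ID, nvidia-smi can also print the ID string exactly as the driver reports it (a minimal check; the exact output formatting may vary slightly between driver versions):

nvidia-smi --query-gpu=index,pci.bus_id --format=csv
# index, pci.bus_id
# 0, 00000000:02:00.0
# 1, 00000000:03:00.0

You can then compare that against what you are passing to drain -p.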

I encountered the same problem and solved it by using the four-digit PCI domain prefix (0000:) before the first “:”, rather than the eight-digit form that nvidia-smi prints.

For example,

nvidia-smi drain -p 00000000:xx:00.0 -m 1
# fails with "Failed to parse device specified at the command-line"

nvidia-smi drain -p 0000:xx:00.0 -m 1
# Successfully set GPU 00000000:xx:00.0 drain state to: draining.

I’m not sure why the first command,

nvidia-smi -i 0000:xx:00.0 -pm 0

is not affected by using the longer zero prefix.
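
If you want to script this rather than type the IDs by hand, one possible approach (a sketch assuming bash; the variable names are just illustrative) is to query the bus ID from nvidia-smi and trim the domain down to four digits before passing it to the drain sub-command:

# Query the bus ID as the driver reports it (eight-digit domain),
# then shorten the domain so that drain -p can parse it.
BUSID=$(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader -i 0)
SHORT=${BUSID/#00000000:/0000:}   # e.g. 00000000:02:00.0 -> 0000:02:00.0
nvidia-smi drain -p "$SHORT" -m 1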
