`nvidia-smi` command not found in Docker Container

I am using Windows Build 21390.co_release.21521-1658.
I followed the instructions to set up CUDA on WSL and Docker.
I can run the k8s example, but when I try to set up my own environment,
I cannot even find nvidia-smi.
How can I deal with this?

Thank you for trying on WSL!

nvidia-smi is now supported, but to use it you have to copy it manually to /usr/bin and set the appropriate permissions with the commands below:

cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi
chmod ogu+x /usr/bin/nvidia-smi
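
For anyone who would rather script this, here is a minimal sketch of the same two steps that first checks whether the binary is actually there. The function name install_nvidia_smi and its optional source/destination arguments are hypothetical, added only so the paths can be overridden; the defaults are the paths given above. Writing to /usr/bin normally requires root.

```shell
#!/bin/sh
# Sketch: copy nvidia-smi out of the WSL driver directory and make it executable.
# install_nvidia_smi is a hypothetical helper, not part of any NVIDIA tooling.
install_nvidia_smi() {
    src="${1:-/usr/lib/wsl/lib/nvidia-smi}"   # default source, from the post above
    dst="${2:-/usr/bin/nvidia-smi}"           # default destination, from the post above

    # Stop early if the driver did not provide the binary.
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi

    cp "$src" "$dst" || return 1
    chmod ugo+x "$dst"
}
```

Sourced into a root shell, calling install_nvidia_smi with no arguments performs the copy; it returns a non-zero status and copies nothing if the source file is missing.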

Hope this helps

I have a few questions after copying nvidia-smi to /usr/bin and testing it.
I followed the instructions here: CUDA on WSL :: CUDA Toolkit Documentation. I have Windows build 22000.51.

  1. Just to confirm, copying nvidia-smi to /usr/bin should be done in wsl, not in a docker container, right?

  2. Here is an example output of nvidia-smi when I run it in wsl (not in docker container).
    The third row, third column of the table shows “ERR!”. What’s wrong here?

Sat Jul 3 01:08:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.28       Driver Version: 470.76       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    Off  | 00000000:3F:00.0  On |                  N/A |
| 49%   33C    P8     7W / 120W |   1552MiB /  6144MiB |     ERR!     Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

  3. I can run nvidia-smi in WSL and in an nvidia-docker container running in WSL with the CUDA toolkit and drivers installed. However, I get “nvidia-smi: command not found” in a container without CUDA or drivers installed. Also, in the container with the CUDA toolkit and drivers installed, nvidia-smi reports the same CUDA and driver versions as the output above, even though I installed CUDA 11.0 in the container. Is the nvidia-smi binary in the containers the same one as on the host? Could you explain how this works?

  4. I tested nvidia-smi with a Python PyTorch demo program. nvidia-smi running on the WSL host does not show python as a running process. When I run the same demo program in Windows, nvidia-smi.exe does show python as a running process. Does this indicate that my WSL is not correctly set up to use the NVIDIA GPU, or is this a bug of nvidia-smi in WSL?

@xiedesaigg 1- There is no need to copy the nvidia-smi binary since driver 470.76. If you want to access nvidia-smi inside the container, you need to pass a Docker volume parameter like -v /usr/lib/wsl/lib/:/usr/local/bin

2- The ERR! in nvidia-smi is normal; some features, like GPU fan detection, are not implemented yet.

3- nvidia-smi is accessing the Windows CUDA version. After all, the /dev/dxg device inside WSL2 is an abstraction of the Windows GPU.

4- Not sure, but as NVIDIA notes in the release notes, some nvidia-smi functions won’t work yet.
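
To make point 1 more concrete, here is a rough sketch of a wrapper that starts a container with the WSL driver directory mounted as described. The function name run_with_wsl_smi, the DRY_RUN variable, and the ubuntu image in the usage line are illustrative assumptions, not something from the docs. The --gpus all flag is Docker's usual way of exposing GPUs to a container and assumes the NVIDIA container runtime is already set up. Keep in mind that mounting over /usr/local/bin hides whatever the image already ships in that directory.

```shell
#!/bin/sh
# Sketch: run a command in a container with the WSL driver directory mounted
# at /usr/local/bin, so that nvidia-smi ends up on the container's PATH.
# run_with_wsl_smi and DRY_RUN are hypothetical names used for illustration.
run_with_wsl_smi() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_with_wsl_smi IMAGE [COMMAND...]" >&2
        return 2
    fi
    image="$1"
    shift
    # Fall back to nvidia-smi when no command is given.
    [ "$#" -gt 0 ] || set -- nvidia-smi

    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be run.
        echo docker run --rm --gpus all -v /usr/lib/wsl/lib/:/usr/local/bin "$image" "$@"
        return 0
    fi

    # /dev/dxg is the GPU device WSL2 exposes (point 3); without it there is no GPU to query.
    if [ ! -e /dev/dxg ]; then
        echo "/dev/dxg not found; is this a WSL2 distribution with GPU support?" >&2
        return 1
    fi

    docker run --rm --gpus all -v /usr/lib/wsl/lib/:/usr/local/bin "$image" "$@"
}
```

With DRY_RUN=1 run_with_wsl_smi ubuntu the function only prints the docker command it would run, which is handy for checking the mount before launching anything.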
