Wenzy
September 5, 2023, 8:58pm
1
Hello,
I am trying to deploy models using the Triton Inference Server.
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:22.09-py3 tritonserver --model-repository=/models
When I run this command from the Triton Server GitHub repository to launch the Triton container, I get the following error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
If I run it without the GPU option, it works.
What could be causing this problem?
Thanks!
Environment:
• Hardware Platform (Jetson / GPU)
GPU
• Triton Server Image
22.09-py3
• CUDA Version
12.0
• Docker Version
24.0.5
Morganh
September 6, 2023, 2:44am
3
Please run:
$ sudo apt-get install -y nvidia-docker2
$ sudo apt install nvidia-driver-525
Wenzy
September 6, 2023, 8:05am
4
Hi,
thank you for your answer.
I had already installed both of them. The nvidia-docker2 version is 2.13.0-1 and the driver version is 525.125.06.
Morganh
September 6, 2023, 9:32am
5
Can you find the lib?
$ sudo find / -name libnvidia-ml.so.1
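Since the error says the library could not be opened, a complementary check is whether the dynamic loader cache lists it, which is quicker than searching the whole filesystem. This is only a minimal sketch: `lib_in_cache` is a hypothetical helper name, and it assumes `ldconfig` is available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA tool): succeeds if the given
# library name appears in the text of the loader cache listing.
lib_in_cache() {
    local name="$1" cache="$2"
    # -F treats the name as a fixed string, so the dots are not regex wildcards.
    printf '%s\n' "$cache" | grep -qF -- "$name"
}

# Only run the live check if ldconfig is on PATH.
if command -v ldconfig >/dev/null 2>&1; then
    cache="$(ldconfig -p 2>/dev/null || true)"
    if lib_in_cache "libnvidia-ml.so.1" "$cache"; then
        echo "libnvidia-ml.so.1 is listed in the loader cache"
    else
        echo "libnvidia-ml.so.1 is NOT listed in the loader cache"
    fi
fi
```

If the library is listed on the host but the container hook still cannot load it, the problem is more likely in how Docker or the container toolkit sees the host than in the driver install itself.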
Morganh
September 6, 2023, 9:35am
6
Please add --runtime=nvidia to the docker run command.
Wenzy
September 6, 2023, 10:11am
7
Yes, it can be found. Here is the output:
/usr/lib/i386-linux-gnu/libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
find: ‘/run/user/1000/doc’: Permission denied
find: ‘/run/user/1000/gvfs’: Permission denied
/var/snap/docker/common/var-lib-docker/overlay2/20cdacd0d96be0cd178f108fd419b3e05a943f4956de496986539a41e57d2cf3/diff/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
Wenzy
September 6, 2023, 10:13am
8
When I add --runtime=nvidia, there is also an error:
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
Morganh
September 6, 2023, 4:01pm
9
Could you please double-check that nvidia-docker is installed on the machine?
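The "Unknown runtime specified nvidia" message means the Docker daemon that received the request has no runtime registered under that name, which can happen even when the package is installed. A rough sketch of how to check both sides follows. `has_nvidia_runtime` is a hypothetical helper, and the script assumes a Debian/Ubuntu host with `dpkg`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: takes the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# and succeeds if a runtime keyed "nvidia" is present.
has_nvidia_runtime() {
    printf '%s' "$1" | grep -q '"nvidia"[[:space:]]*:'
}

# Package side: is nvidia-docker2 known to dpkg? (assumes a Debian/Ubuntu host)
if command -v dpkg >/dev/null 2>&1; then
    if dpkg -s nvidia-docker2 >/dev/null 2>&1; then
        echo "nvidia-docker2 package: installed"
    else
        echo "nvidia-docker2 package: not installed"
    fi
fi

# Daemon side: which runtimes does the running daemon actually know about?
if command -v docker >/dev/null 2>&1; then
    echo "docker binary in use: $(command -v docker)"
    runtimes="$(docker info --format '{{json .Runtimes}}' 2>/dev/null || true)"
    if has_nvidia_runtime "$runtimes"; then
        echo "nvidia runtime: registered with the daemon"
    else
        echo "nvidia runtime: NOT registered with the daemon"
    fi
fi
```

Printing the path of the docker binary is deliberate: if more than one Docker installation exists on the machine, the package may have configured a different daemon from the one answering `docker run`.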
Morganh
September 7, 2023, 3:04am
11
Wenzy
September 10, 2023, 4:47pm
12
Hi Morganh,
I’ve rerun these commands. They had actually been run before, and the issue still exists.
Morganh
September 11, 2023, 5:56am
13
Can you try the commands below and share the results?
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
and
$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Wenzy
September 12, 2023, 6:38pm
14
The results are the same as before.
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
Morganh
September 13, 2023, 6:53am
15
Please try:
sudo apt install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker
Refer to https://github.com/NVIDIA/nvidia-docker/issues/838
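Restarting the daemon only helps if the configuration it reads actually registers the runtime. Here is a minimal sketch for checking that, assuming Docker reads /etc/docker/daemon.json (the usual location for an apt-installed Docker; other packagings may read a different file). `daemon_json_has_nvidia` and the `DAEMON_JSON` variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given file is readable and mentions
# a runtime entry named "nvidia". This is a plain text match, not a JSON parse.
daemon_json_has_nvidia() {
    local file="$1"
    [ -r "$file" ] && grep -q '"nvidia"' "$file"
}

# Assumed default location; override by exporting DAEMON_JSON.
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

if daemon_json_has_nvidia "$DAEMON_JSON"; then
    echo "$DAEMON_JSON mentions an nvidia runtime"
else
    echo "$DAEMON_JSON is missing, unreadable, or has no nvidia runtime entry"
fi
```

If the file has the entry but `--runtime=nvidia` is still rejected after a restart, it is worth confirming that the running daemon is the one that reads this file.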
Wenzy
September 13, 2023, 8:18am
16
It still doesn’t work. Whenever I run it with --runtime=nvidia, I get this error:
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
Without --runtime but with --gpus all, I get this:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Morganh
September 13, 2023, 8:26am
17
Can you share the contents of /etc/nvidia-container-runtime/config.toml?
Morganh
September 13, 2023, 10:03am
19
Please try reinstalling the NVIDIA driver.
Uninstall:
sudo apt purge nvidia-driver-525
sudo apt autoremove
sudo apt autoclean
Install:
sudo apt install nvidia-driver-525
Wenzy
September 13, 2023, 10:26am
20
Thank you for the answers, but reinstalling does not seem to change anything.
Morganh
September 18, 2023, 3:16am
21
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
Please try the steps in "New computer install GPU Docker error - #6 by david9xqqb", especially sudo systemctl restart docker.service.