Wenzy
1
Hello,
I am trying to deploy the models using the Triton Inference Server.
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:22.09-py3 tritonserver --model-repository=/models
When I run the command from the Triton Server GitHub page to launch the Triton container, I get the following error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as ‘legacy’
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
If I run it without the GPU option, it works.
What could be causing this problem?
Thanks!
Environment:
• Hardware Platform (Jetson / GPU)
GPU
• Triton Server Image
22.09-py3
• CUDA Version
12.0
• Docker Version
24.0.5
Morganh
3
Please run:
$ sudo apt-get install -y nvidia-docker2
$ sudo apt install nvidia-driver-525
Wenzy
4
Hi,
thank you for your answer.
I had already installed both of them. The nvidia-docker2 version is 2.13.0-1 and the driver version is 525.125.06.
Morganh
5
Can you find the lib?
$ sudo find / -name libnvidia-ml.so.1
Morganh
6
Please add --runtime=nvidia to the docker run command.
Wenzy
7
Yes, it can be found. Here is the output:
/usr/lib/i386-linux-gnu/libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
find: ‘/run/user/1000/doc’: Permission denied
find: ‘/run/user/1000/gvfs’: Permission denied
/var/snap/docker/common/var-lib-docker/overlay2/20cdacd0d96be0cd178f108fd419b3e05a943f4956de496986539a41e57d2cf3/diff/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
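The last path in the output above sits under /var/snap/docker, which may indicate that Docker was installed as a snap rather than from apt. That is an assumption, not something confirmed in this thread. If it holds, the snap's Docker daemon may not see the runtime that nvidia-docker2 registers for an apt-installed Docker. A minimal sketch for checking which Docker binary is on the PATH (classify_docker_path is a hypothetical helper name):

```shell
# classify_docker_path: hypothetical helper that labels a docker binary path.
# Prints "snap" for paths under /snap or /var/lib/snapd, "missing" for an
# empty path, and "other" for anything else (for example /usr/bin/docker).
classify_docker_path() {
  case "$1" in
    /snap/*|/var/lib/snapd/*) echo "snap" ;;
    "") echo "missing" ;;
    *) echo "other" ;;
  esac
}

# Usage: resolve the docker binary on the PATH and classify it.
classify_docker_path "$(command -v docker 2>/dev/null || true)"
```

If this reports "snap", the snap-packaged Docker is the one answering docker run, and it would be worth checking whether the NVIDIA packages were configured against a different Docker installation.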
Wenzy
8
When I add --runtime=nvidia, there is also an error:
docker: Error response from daemon: Unknown runtime specified nvidia.
See ‘docker run --help’.
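The "Unknown runtime specified nvidia" message suggests that the Docker daemon has no runtime named nvidia registered. One way to confirm this is to look at the Runtimes line printed by docker info. The sketch below assumes that line has the usual "Runtimes: ..." form; has_nvidia_runtime is a hypothetical helper name:

```shell
# has_nvidia_runtime: hypothetical helper. Reads `docker info` output on stdin
# and succeeds only if the "Runtimes:" line lists a runtime called nvidia.
has_nvidia_runtime() {
  grep -E '^ *Runtimes:' | grep -qw 'nvidia'
}

# Usage: query the running daemon, if docker is available on this machine.
if command -v docker >/dev/null 2>&1; then
  if docker info 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered"
  else
    echo "nvidia runtime is not registered"
  fi
fi
```

If the runtime is not registered even though nvidia-docker2 is installed, the daemon that is running is probably not reading the configuration file the package wrote.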
Morganh
9
Could you please double-check whether nvidia-docker is installed on the machine?
Morganh
11
Wenzy
12
Hi Morganh,
I’ve rerun these commands. They had actually been run before, and the issue still exists.
Morganh
13
Can you try the commands below and share the result?
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
and
$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Wenzy
14
The results are the same as before.
docker: Error response from daemon: Unknown runtime specified nvidia.
See ‘docker run --help’.
Morganh
15
Please try:
sudo apt install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker
Refer to docker: Error response from daemon: Unknown runtime specified nvidia. · Issue #838 · NVIDIA/nvidia-docker · GitHub
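After these steps, it may help to check whether the daemon configuration actually references the NVIDIA runtime binary. The sketch below assumes the usual apt location /etc/docker/daemon.json and, as a further assumption in case Docker was installed as a snap, /var/snap/docker/current/config/daemon.json; daemon_json_has_nvidia is a hypothetical helper name:

```shell
# daemon_json_has_nvidia: hypothetical helper. Succeeds if the given file is
# readable and mentions the nvidia-container-runtime binary.
daemon_json_has_nvidia() {
  [ -r "$1" ] && grep -q 'nvidia-container-runtime' "$1"
}

# Usage: check the two candidate locations (both paths are assumptions).
for f in /etc/docker/daemon.json /var/snap/docker/current/config/daemon.json; do
  if daemon_json_has_nvidia "$f"; then
    echo "$f: nvidia runtime configured"
  else
    echo "$f: nvidia runtime not found (or file unreadable)"
  fi
done
```

If only one of the files references the runtime, restarting the Docker service that reads that file is the step that matters.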
Wenzy
16
It still doesn’t work. Whenever I run it with --runtime=nvidia, I get this error:
docker: Error response from daemon: Unknown runtime specified nvidia.
See ‘docker run --help’.
Without --runtime but with --gpus all, I get this:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as ‘legacy’
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Morganh
17
Can you share the contents of /etc/nvidia-container-runtime/config.toml?
Morganh
19
Please try to reinstall nvidia-driver.
Uninstall:
sudo apt purge nvidia-driver-525
sudo apt autoremove
sudo apt autoclean
Install:
sudo apt install nvidia-driver-525
Wenzy
20
Thank you for the answers, but reinstalling doesn’t seem to change anything.
Morganh
21
Please try the steps in New computer install GPU Docker error - #6 by david9xqqb, especially sudo systemctl restart docker.service.