Hi,
When trying to run “tlt detectnet_v2 --help” or any other tlt command, I get the following error:
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI
runtime create failed: container_linux.go:367: starting container process caused:
process_linux.go:495: container init caused: Running hook #0:: error running hook: exit
status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to
process request: unknown")
“tlt info” returns this:
Configuration of the TLT Instance
dockers: ['nvcr.io/nvidia/tlt-streamanalytics', 'nvcr.io/nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021
and “tlt --help” runs as expected.
Any help or advice on how to fix this would be appreciated. I have looked at similar issues posted here but none of the fixes has worked.
Let me know if any more information is required.
Thanks
On which machine did you install tlt-launcher?
Windows 10, running Ubuntu under WSL2.
That is what I followed, using “docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3”. Do you know which part would cause this error? Thanks
I have made sure that everything is installed; however, I am still getting the same problem.
I also get this warning. Could it have anything to do with it?
2021-06-09 10:49:19,599 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
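For reference, the entry the warning asks for can be built from the current UID and GID. This is a minimal sketch: the helper name tlt_user_option is hypothetical, and it only prints the line to paste into the DockerOptions section of ~/.tlt_mounts.json; it does not edit the file. Note that JSON needs straight double quotes, not the curly quotes shown in the warning text.

```shell
# Print the "user" entry from the warning, using the current UID and GID.
# tlt_user_option is a hypothetical helper name; it only prints the line
# and does not touch ~/.tlt_mounts.json.
tlt_user_option() {
    printf '"user": "%s:%s"\n' "$(id -u)" "$(id -g)"
}

tlt_user_option
```

The printed line goes inside the existing "DockerOptions" object of ~/.tlt_mounts.json.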
No, the warning is not related to the error.
Which dGPU is in your Windows 10 machine? Can you run “nvidia-smi” in Ubuntu?
No, I get:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
But I was under the impression that WSL could not get the driver information. Is this incorrect?
Running it in PowerShell gives me this
Please run following commands and paste the result.
This does seem to be a problem
From powershell the first command gives:
GPU 0: NVIDIA GeForce MX250 (UUID: GPU-d34ffb24-48c1-80c4-0639-83f477f8e335)
Thanks for the help so far.
For “unknown runtime specified nvidia”, please install nvidia-docker2.
As far as I know it is installed already
For “docker: Error response from daemon: Unknown runtime specified nvidia.”, please follow https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
sudo systemctl restart docker
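One way to check whether Docker knows about the nvidia runtime after the restart is to look at the runtime list reported by docker info. This is a sketch, not an official check: the helper name has_nvidia_runtime is hypothetical, and the live check is skipped when the docker CLI is not available.

```shell
# Succeeds if the runtime list read from stdin mentions an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper name.
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# Live check, attempted only when the docker CLI is on PATH.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered"
    else
        echo "nvidia runtime not found (or the docker daemon is not reachable)"
    fi
fi
```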
For
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI
runtime create failed: container_linux.go:367: starting container process caused:
process_linux.go:495: container init caused: Running hook #0:: error running hook: exit
status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to
process request: unknown"),
please install the GPU driver.
$ apt-cache search nvidia-driver
$ sudo apt install nvidia-driver-455
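Once a driver is in place, a quick way to confirm that it is visible is “nvidia-smi -L”, which prints one line per GPU in the same format as the “GPU 0: ...” line quoted earlier in this thread. A small sketch that counts those lines follows; the helper name count_gpus is hypothetical.

```shell
# Count GPU lines in `nvidia-smi -L` style output read from stdin.
# count_gpus is a hypothetical helper name. grep -c exits non-zero when the
# count is 0, so "|| true" keeps the function's exit status at 0.
count_gpus() {
    grep -c '^GPU [0-9]' || true
}

# Live check, attempted only when nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs visible: $(nvidia-smi -L 2>/dev/null | count_gpus)"
else
    echo "nvidia-smi is not on PATH"
fi
```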
Hi,
I still could not get it to work, so I tried removing and then reinstalling everything, following these guides:
https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2
However, now when I try to run any Docker container, such as
sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0000] error waiting for container: context canceled
Once again any advice would be greatly appreciated.
Thanks
According to your latest description, after following the “Getting started with CUDA on Ubuntu on WSL 2” guide you cannot run any Docker container, right?
If so, it seems that the error is not related to TLT.
Please double-check the steps mentioned in the blog above.
Getting started with CUDA on Ubuntu on WSL 2 | Ubuntu
https://developer.nvidia.com/blog/announcing-cuda-on-windows-subsystem-for-linux-2/
Hi,
Thanks again for the help so far.
I managed to get Docker working again and got past the step I was stuck on before. However, now I am stuck with tlt stating that I am not logged in to nvcr.io, even when I have just run the login command.
Do you know how to fix this?
See the TLT Quick Start Guide (Transfer Learning Toolkit 3.0 documentation):
Once you have installed docker-ce, follow the post-installation steps to ensure that docker can be run without sudo.
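As a rough check for the step above, the sketch below tests whether the current user is already in the docker group, which is what Docker's Linux post-installation steps set up. The helper name in_group_list is hypothetical; the commented commands are the ones from Docker's post-installation guide and are not run by the script.

```shell
# Succeeds if group $2 appears in the space-separated group list $1.
# in_group_list is a hypothetical helper name.
in_group_list() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

if in_group_list "$(id -nG)" docker; then
    echo "current user is in the docker group"
else
    echo "current user is not in the docker group"
    # Docker's post-installation steps (run manually, then log out and back in):
    #   sudo groupadd docker
    #   sudo usermod -aG docker "$USER"
fi
```

After the group change takes effect, running “docker login nvcr.io” again without sudo is probably needed, on the assumption that an earlier login made through sudo stored its credentials for a different user.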
That helped thanks!
I got past the initial problem.
Now, when trying to train an SSD model, I get the error
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'
even though ‘which nvidia-smi’ returns
/usr/lib/wsl/lib/nvidia-smi
Do you know how to fix this one?
Can you share the full command and the full log?