Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI runtime create failed...)

Hi,

When trying to run “tlt detectnet_v2 --help” or any other tlt command, I get the following error:

Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown")

“tlt info” returns this

Configuration of the TLT Instance
dockers: ['nvcr.io/nvidia/tlt-streamanalytics', 'nvcr.io/nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021

and “tlt --help” runs as expected.

Any help or advice on how to fix this would be appreciated. I have looked at similar issues posted here but none of the fixes has worked.

Let me know if any more information is required.
Thanks

On which machine did you install the tlt launcher?

Please install nvidia-docker2 according to Transfer Learning Toolkit — Transfer Learning Toolkit 3.0 documentation

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
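
For reference, on Ubuntu the guide at that link boils down to roughly the following steps (a sketch; check the guide itself for your distribution, and note that on WSL2 without systemd the last step is usually “sudo service docker restart” instead):

$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker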

Windows 10, running Ubuntu on WSL 2.

That is the guide I followed, along with running “docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3”. Do you know what part could cause this error? Thanks

Make sure nvidia-docker2 is installed. See TLT Launcher — Transfer Learning Toolkit 3.0 documentation and Transfer Learning Toolkit — Transfer Learning Toolkit 3.0 documentation

I have made sure that everything is installed; however, I am still getting the same problem.
I also get this warning, could it have anything to do with it?

2021-06-09 10:49:19,599 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the “user”:“UID:GID” in the DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your users UID and GID by using the “id -u” and “id -g” commands on the terminal.
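
For reference, my reading of the warning is that it wants something like this in ~/.tlt_mounts.json (a sketch only; the paths and the 1000:1000 value are placeholders to be replaced with real mounts and the output of “id -u” and “id -g”):

{
    "Mounts": [
        {
            "source": "/path/on/host",
            "destination": "/path/in/container"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}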

No, the warning is not related to the error.
Which dGPU is in your Windows 10 machine? Can you run “nvidia-smi” in Ubuntu?

No, I get:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

But I was under the impression that WSL could not get the driver information; is this incorrect?

Running it in PowerShell gives me this

Please run the following commands and paste the result.

This does seem to be a problem.

From PowerShell, the first command gives:

GPU 0: NVIDIA GeForce MX250 (UUID: GPU-d34ffb24-48c1-80c4-0639-83f477f8e335)

Thanks for the help so far.

For “unknown runtime specified nvidia”, please install nvidia-docker2.

As far as I know, it is already installed.

For “docker: Error response from daemon: Unknown runtime specified nvidia.”, please follow https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

sudo systemctl restart docker
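
For reference, installing nvidia-docker2 is what registers the “nvidia” runtime with docker; after the restart, /etc/docker/daemon.json should contain roughly the following (the package normally writes this for you, shown here only as a sanity check):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}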

For

Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown"),

please install the GPU driver:

$ apt-cache search nvidia-driver
$ sudo apt install nvidia-driver-455
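
After installing the driver, you can verify both the driver and the container runtime with something like the following (the CUDA image tag is only an example):

$ nvidia-smi
$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi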

Hi,

I still could not get it to work, so I tried removing and then reinstalling everything, following this guide:
https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2

However, now when I try to run any docker container, for example

sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

I get the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0000] error waiting for container: context canceled

Once again any advice would be greatly appreciated.
Thanks

According to your latest description, after you followed Getting started with CUDA on Ubuntu on WSL 2 | Ubuntu, you cannot run any docker container, right?
If yes, it seems that the error is not related to TLT.
Please double-check the steps mentioned in the above blog.
Getting started with CUDA on Ubuntu on WSL 2 | Ubuntu
https://developer.nvidia.com/blog/announcing-cuda-on-windows-subsystem-for-linux-2/
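
A couple of quick checks that might help narrow it down (just diagnostics, assuming the setup from that blog):

$ sudo service docker status    # in WSL2 the docker daemon is not started automatically; start it with "sudo service docker start"
$ ls /usr/lib/wsl/lib           # the Windows driver exposes libcuda.so and nvidia-smi here
$ nvidia-smi                    # should list the GPU if the Windows driver supports WSL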

Hi,

Thanks again for the help so far.

I managed to get docker working again and get past the step I was on before; however, now I am stuck, with tlt stating that I am not logged into nvcr.io even though I have just run the login command.

Do you know how to fix this?

See TLT Quick Start Guide — Transfer Learning Toolkit 3.0 documentation:
Once you have installed docker-ce, follow the post-installation steps to ensure that docker can be run without sudo.
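
The post-installation steps are roughly the following (a sketch from the Docker documentation; log out and back in, or run newgrp, for the group change to take effect):

$ sudo groupadd docker          # the group may already exist
$ sudo usermod -aG docker $USER
$ newgrp docker
$ docker run hello-world        # should now work without sudo

Note that credentials saved with “sudo docker login” belong to root, so you will likely need to re-run “docker login nvcr.io” as your own user afterwards.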

That helped thanks!

I got past the initial problem.

Now, when trying to train an SSD model, I get the error:

FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

while ‘which nvidia-smi’ returns

/usr/lib/wsl/lib/nvidia-smi

Do you know how to fix this one?

Can you share full command and full log?