Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI runtime create failed...)

Hi,

When trying to run “tlt detectnet_v2 --help” or any other tlt command, I get the following error:

Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown")

“tlt info” returns this

Configuration of the TLT Instance
dockers: ['nvcr.io/nvidia/tlt-streamanalytics', 'nvcr.io/nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021

and “tlt --help” runs as expected.

Any help or advice on how to fix this would be appreciated. I have looked at similar issues posted here but none of the fixes has worked.

Let me know if any more information is required.
Thanks

On which machine did you install the tlt launcher?

Please install nvidia-docker2 according to Requirements and Installation — Transfer Learning Toolkit 3.0 documentation

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
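As a quick sanity check (a sketch, assuming a Debian-based distribution and the standard nvidia-docker2 package and runtime names), you can verify whether the pieces that guide installs are actually in place:

```shell
# Check that the nvidia-docker2 package is installed and that the
# Docker daemon is reachable and reports its runtimes.
dpkg -l 2>/dev/null | grep nvidia-docker2 || echo "nvidia-docker2 not installed"
docker info 2>/dev/null | grep -i runtime || echo "docker daemon not reachable"
```

If the second line does not list an "nvidia" runtime, the daemon will refuse `--runtime nvidia` requests.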

Windows 10, running Ubuntu on WSL2.

That is what I followed, along with running “docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3”. Do you know what part would cause this error? Thanks

Make sure nvidia-docker2 is installed. See TLT Launcher — Transfer Learning Toolkit 3.0 documentation and Requirements and Installation — Transfer Learning Toolkit 3.0 documentation

I have made sure that everything is installed; however, I am still getting the same problem.
I also get the following warning; could it have anything to do with it?

2021-06-09 10:49:19,599 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the “user”:“UID:GID” in the DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your users UID and GID by using the “id -u” and “id -g” commands on the terminal.
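For reference, the “UID:GID” string that warning asks for can be built directly in the shell (a sketch; per the warning's own description, the resulting value goes into the "user" key of the DockerOptions section of ~/.tlt_mounts.json):

```shell
# Compose the "UID:GID" value for the DockerOptions "user" entry.
user_spec="$(id -u):$(id -g)"
echo "$user_spec"
```

For a typical first user this prints something like `1000:1000` (example values only).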

No, the warning is not related to the error.
Which dGPU is in your Windows 10 machine? Can you run “nvidia-smi” in Ubuntu?

No, I get:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

But I was under the impression that WSL could not get the driver information; is this incorrect?

Running it in PowerShell gives me this

Please run the following commands and paste the result.

This does seem to be a problem

From PowerShell, the first command gives:

GPU 0: NVIDIA GeForce MX250 (UUID: GPU-d34ffb24-48c1-80c4-0639-83f477f8e335)

Thanks for the help so far.

For “unknown runtime specified nvidia”, please install nvidia-docker2.

As far as I know it is installed already

For “docker: Error response from daemon: Unknown runtime specified nvidia.”, please follow Installation Guide — NVIDIA Cloud Native Technologies documentation

sudo systemctl restart docker

For

Docker instantiation failed with error: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown"),

please install the GPU driver:

$ apt-cache search nvidia-driver
$ sudo apt install nvidia-driver-455

Hi,

I still could not get it to work, so I tried removing and then reinstalling, following this guide:

https://docs.nvidia.com/cuda/wsl-user-guide/index.html#introduction

However, now when I try to run any Docker container, like

sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

I get this error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0000] error waiting for container: context canceled

Once again any advice would be greatly appreciated.
Thanks

According to your latest description, after you followed Getting started with CUDA on Ubuntu on WSL 2 | Ubuntu, you cannot run any Docker container, right?
If yes, it seems that the error is not related to TLT.
Please double-check the steps mentioned in the blogs below.
Getting started with CUDA on Ubuntu on WSL 2 | Ubuntu
Announcing CUDA on Windows Subsystem for Linux 2 | NVIDIA Developer Blog
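One quick check worth doing at this point (a sketch; the path below is where WSL2 normally surfaces the Windows-side NVIDIA driver libraries to Linux): if those libraries are missing, nvidia-container-cli has no driver to talk to, which matches the "driver not loaded" message above.

```shell
# On WSL2 the NVIDIA driver is installed on the Windows side and exposed
# to Linux under /usr/lib/wsl/lib; verify it is actually there.
ls /usr/lib/wsl/lib/nvidia-smi 2>/dev/null || echo "WSL GPU libraries not found"
```

If the file is present but tools cannot find it, also check that /usr/lib/wsl/lib is on your PATH.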

Hi,

Thanks again for the help so far.

I managed to get Docker working again and get past the step I was on before; however, now I am stuck, with tlt stating that I am not logged into nvcr.io even when I have just run the command.

Do you know how to fix this?

See TLT Quick Start Guide — Transfer Learning Toolkit 3.0 documentation,
Once you have installed docker-ce, follow the post-installation steps to ensure that docker can be run without sudo.
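The relevant post-installation step is adding your user to the docker group; a quick way to check whether that has taken effect (a sketch):

```shell
# List the current user's groups and look for "docker"; if it is missing,
# the usual fix is `sudo usermod -aG docker $USER` followed by a re-login.
id -nG | tr ' ' '\n' | grep -x docker || echo "user not in docker group"
```

Note that the group change only applies to shells started after logging back in (or after running `newgrp docker`).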

That helped thanks!

I got past the initial problem.

Now, when trying to train an SSD model, I get the error

FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

while ‘which nvidia-smi’ returns

/usr/lib/wsl/lib/nvidia-smi

Do you know how to fix this one?

Can you share full command and full log?