Error while running action recognition net

I am running the action recognition net Jupyter notebook on my virtual machine with the following specifications.
• Hardware (T4/V100/Xavier/Nano/etc) - V100
• Network Type - action_recognition_net
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

I am getting the errors shown in the attached screenshot.

I am also attaching the Jupyter notebook with the outputs I got.
actionrecognitionnet-Copy1 (1).ipynb (725.4 KB)

Could someone help me out?

Please refer to TAO Toolkit Quick Start Guide - NVIDIA Docs and make sure nvidia-docker2 is installed.

How do I check whether nvidia-docker2 is installed?

You can run
$ sudo docker run --rm --runtime=nvidia -ti nvidia/cuda

If it is not installed, refer to https://developer.nvidia.com/blog/gpu-containers-runtime/ to install it.
$ sudo apt-get install -y nvidia-docker2
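
As a complement to the docker run test above, here is a minimal sketch for checking the package itself. It assumes an apt-based host (Debian/Ubuntu) where `dpkg-query` is available. The helper name `is_pkg_installed` is made up for this example and is not part of any NVIDIA tooling.

```shell
# Report whether a Debian/Ubuntu package is installed, using dpkg-query.
# Assumption: an apt-based host. If dpkg-query is missing, this reports "NOT installed".
is_pkg_installed() {
    # dpkg-query prints the package status; "install ok installed" means it is present.
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

if is_pkg_installed nvidia-docker2; then
    echo "nvidia-docker2 is installed"
else
    echo "nvidia-docker2 is NOT installed"
fi
```

Note that this only confirms the package is present. The docker run command above is still the real test that containers can reach the GPU.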

I got this error when I ran the docker run command you suggested:
[image]

$ sudo apt-get install -y nvidia-docker2

I have installed it, but I am still getting the same error.

I am still getting an error response from the daemon.

$ sudo apt install nvidia-container-toolkit
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
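
After reloading the daemon, one way to confirm that Docker picked up the NVIDIA runtime is to look at the `Runtimes:` line in the `docker info` output. The sketch below assumes that line is present in your Docker version's output. The helper name `has_nvidia_runtime` is hypothetical, and you may need `sudo` for `docker info` depending on how Docker is set up on your machine.

```shell
# Succeeds if the "Runtimes:" line of `docker info` text (read from stdin) lists nvidia.
# Assumption: `docker info` prints a line such as " Runtimes: nvidia runc".
has_nvidia_runtime() {
    grep -i '^ *Runtimes:' | grep -qw nvidia
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with Docker"
    else
        echo "nvidia runtime not found in 'docker info' output"
    fi
else
    echo "docker not found on PATH"
fi
```

If the runtime is not listed, the daemon probably did not reload its configuration, and restarting the Docker service is a reasonable next thing to try.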

I already had the container toolkit, and I have now installed nvidia-docker2 and run the last command you mentioned.
All of them ran successfully.
What is the next step?

This is the output of nvidia-smi:

Please check the original issue. If the issue is gone, we can close this topic.

I am still getting the original issue.

I am also getting this error after the nvidia-docker2 installation:

Please help me out.

That is a different error, not the original issue. Please ignore it.
Please check again in the notebook, where the original issue occurred.

The error after the nvidia-docker2 installation has changed, but the original issue when running the notebook is still the same.

Please exit the notebook.
Then launch the notebook again and rerun it.

Hey Morganh, now I am getting this error:

Please follow section 4.1 of the notebook.
The “=” in the checkpoint file name should be removed before using the checkpoint in a command.
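
To illustrate the renaming step, here is a rough sketch that strips every "=" from file names in a directory. The `RESULTS_DIR` default and the helper name `strip_equals_from_names` are hypothetical. Point the script at whatever directory your training step actually wrote its checkpoints to, and check section 4.1 of the notebook for the exact name it expects.

```shell
# Remove every "=" from the names of files in a directory.
# Assumption: the checkpoints are the files in that directory whose names contain "=".
strip_equals_from_names() {
    dir="$1"
    for f in "$dir"/*=*; do
        [ -e "$f" ] || continue                   # glob matched nothing
        base=$(basename "$f")
        new=$(printf '%s' "$base" | tr -d '=')    # drop all "=" characters
        if [ -e "$dir/$new" ]; then
            echo "skipped (target exists): $new"  # never overwrite an existing file
            continue
        fi
        mv -- "$f" "$dir/$new"
        echo "renamed: $base -> $new"
    done
}

# Hypothetical results directory; replace with the folder that holds your checkpoints.
RESULTS_DIR="${RESULTS_DIR:-./results}"
strip_equals_from_names "$RESULTS_DIR"
```

After running it, use the new file name (without "=") in the commands that follow in the notebook.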

I don't quite follow. Could you please explain where exactly I have to make the changes?
Which file is the checkpoint file, and where is it located?