Error when pulling a tao-toolkit docker file

junghyun.hwang · June 19, 2023, 6:40am

When I run the 'tao deformable_detr train -e ./data/exp_spec_file.yaml' command, it tries to pull the tao-toolkit:4.0.0-pyt image, but I get the following error:

Error response from daemon: No such container: 8fdd756d2838f330ae221b7826a0e8702de03722160bbca0be855f23dd314436

So I pulled the image manually, but I can't find a way to make the launcher use the local image instead of pulling it from the registry. Could you also check the download script?

Also, I have the tao_mounts.json written in the following way.
{
    "Mounts": [
        {
            "source": "/data/NeuBoat/Avikus/FLL/images/",
            "destination": "/workspace/tao-experiments/data/images/"
        },
        {
            "source": "/home/jhhwang/Workspace/TAO/data/",
            "destination": "/workspace/tao-experiments/data/"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000",
        "ports": {
            "8888": 8888
        }
    }
}
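
A quick way to rule out problems in the mounts file itself is to check it before running the launcher. The sketch below is only an illustration, not part of TAO: the function name check_mounts and the checks it performs are assumptions. It looks for curly quotes (which JSON does not accept), verifies that each source directory exists on the host, and flags destinations nested inside one another, as in the file above. Whether any of these is related to the "No such container" error is not established in this thread.

```python
import json
import os

# Typographic quotes that editors or web pages may substitute for ASCII quotes.
# A JSON parser rejects them, so they are reported and then normalized.
CURLY_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def check_mounts(text):
    """Return a list of problems found in the text of a tao_mounts.json file.

    Hypothetical helper for illustration; it is not a TAO launcher API.
    """
    problems = []
    if any(quote in text for quote in CURLY_QUOTES):
        problems.append("file contains curly quotes; replace them with plain quotes")
        for curly, plain in CURLY_QUOTES.items():
            text = text.replace(curly, plain)
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        problems.append(f"not valid JSON: {err}")
        return problems
    if not isinstance(config, dict):
        problems.append("top-level JSON value must be an object")
        return problems

    mounts = config.get("Mounts", [])
    for mount in mounts:
        source = mount.get("source")
        destination = mount.get("destination")
        if not source or not destination:
            problems.append(f"mount entry needs both source and destination: {mount}")
            continue
        if not os.path.isdir(source):
            problems.append(f"source directory does not exist on the host: {source}")

    # Report destinations that sit inside another destination.
    destinations = [m["destination"].rstrip("/") for m in mounts if m.get("destination")]
    for inner in destinations:
        for outer in destinations:
            if inner != outer and inner.startswith(outer + "/"):
                problems.append(f"destination {inner} is nested inside {outer}")
    return problems
```

To use it, read the mounts file into a string, pass it to check_mounts, and print each returned problem; an empty list means none of these particular checks failed.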

Morganh · June 19, 2023, 6:58am

Please run
$ docker pull nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt

More dockers can be found in

junghyun.hwang · June 19, 2023, 7:05am

Ah yes, I have already pulled the tao-toolkit:4.0.0-pyt image. Should a container from that image be running before I execute the tao command?

Morganh · June 19, 2023, 7:10am

It is not needed.
There are two ways to run TAO:

1. Using the tao launcher. For example,
$ tao detectnet_v2
2. Using docker directly:
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt /bin/bash

junghyun.hwang · June 19, 2023, 7:20am

Yeah, I have been following the first way, which uses the 'Launcher CLI' described in this link.
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html

In the 'Launcher CLI' section, I have done all the previous steps, so I should be able to run the tao command. I ran it with the 'deformable_detr' task and the 'train' sub_task, but I get the error described at the top of this thread.

Morganh · June 19, 2023, 5:42pm

Can you share the full log?

junghyun.hwang · June 20, 2023, 2:01am

yeah sure,

2023-06-19 13:56:50,149 [INFO] root: Registry: ['nvcr.io']
2023-06-19 13:56:50,189 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Error response from daemon: No such container: 8fdd756d2838f330ae221b7826a0e8702de03722160bbca0be855f23dd314436
2023-06-19 13:56:51,018 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

This is the full log that I get when I execute the following command.
tao deformable_detr train -e ./data/exp_spec_file.yaml

TomNVIDIA · June 20, 2023, 1:58pm

This looks to be a problem with nvcr.io as others are reporting issues.
I have the team looking into this now.

Tom

Morganh · June 20, 2023, 3:43pm

@junghyun.hwang
How about running
$ tao info --verbose

and

$ tao ssd run /bin/bash

junghyun.hwang · June 21, 2023, 2:21am

Yeah, I still get a similar error.
I was able to pull the container,
but then it says that there is no such container.

2023-06-21 11:13:54,812 [INFO] root: Registry: ['nvcr.io']
2023-06-21 11:13:54,874 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-06-21 11:13:54,899 [INFO] tlt.components.docker_handler.docker_handler: The required docker doesn't exist locally/the manifest has changed. Pulling a new docker.
2023-06-21 11:13:54,899 [INFO] tlt.components.docker_handler.docker_handler: Pulling the required container. This may take several minutes if you're doing this for the first time. Please wait here.
…
Pulling from repository: nvcr.io/nvidia/tao/tao-toolkit
2023-06-21 11:19:04,283 [INFO] tlt.components.docker_handler.docker_handler: Container pull complete.
Error response from daemon: No such container: 26fe41a0a8c281101aaa9cdde7e2d4e1b73993c3cbdbe36c0f1dddab7f7ce40d
2023-06-21 11:19:09,822 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

As Tom said,
I can wait until the nvcr.io issues are fixed.

Morganh · June 21, 2023, 2:25am

OK, you can try this workaround: start the container directly using docker run.

docker run --runtime=nvidia --shm-size=16g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5 /bin/bash

Then run training, etc. inside the container. Note that inside the container you run the command without tao at the beginning of the command line.
For example,
$ ssd train xxx

Morganh · June 21, 2023, 2:28am

For your case, you need to start the PyTorch container instead:

docker run --runtime=nvidia --shm-size=16g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt /bin/bash

$ deformable_detr train xxx
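
When the container is started by hand like this, the launcher is not involved, so tao_mounts.json is not applied and the host directories have to be passed to docker run with -v. Below is a minimal sketch that builds such a command line from the "Mounts" entries of the mounts file. The function name docker_run_command is made up for illustration, and the "DockerOptions" section (user, ports) is deliberately ignored here. It only prints a command; it does not run docker.

```python
import json
import shlex


def docker_run_command(mounts_json, image, shell="/bin/bash"):
    """Build a `docker run` command line with one -v flag per mount entry.

    Hypothetical helper for illustration; it is not part of the TAO launcher.
    mounts_json is the text of a tao_mounts.json file.
    """
    config = json.loads(mounts_json)
    # Same options as the docker run command suggested in this thread.
    command = [
        "docker", "run", "--runtime=nvidia", "--shm-size=16g",
        "--ulimit", "memlock=-1", "-it", "--rm",
    ]
    for mount in config.get("Mounts", []):
        # -v takes host_path:container_path
        command += ["-v", f"{mount['source']}:{mount['destination']}"]
    command += [image, shell]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(command)


if __name__ == "__main__":
    example = json.dumps({
        "Mounts": [
            {"source": "/home/jhhwang/Workspace/TAO/data/",
             "destination": "/workspace/tao-experiments/data/"},
        ]
    })
    print(docker_run_command(example, "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt"))
```

Inside the container, the spec file is then referenced by its destination path (under /workspace/tao-experiments/data/ in this example), not by the host path.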

junghyun.hwang · July 5, 2023, 4:59am

Is there any issue with the ngc registry model command?
I listed the available pretrained backbones for object detection, but there seem to be none.
My ngc CLI is the latest version, 3.24.

The command came from the following link.

Morganh · July 5, 2023, 6:04am

Use the command below instead.

ngc registry model list nvidia/tao/pretrained_object_detection:*

Refer to
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#listing-all-available-models

system · July 24, 2023, 5:57am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
