Error when pulling a tao-toolkit docker image

When I run the 'tao deformable_detr train -e ./data/exp_spec_file.yaml' command, it tries to pull the tao-toolkit:4.0.0-pyt image, but I get the following error:

Error response from daemon: No such container: 8fdd756d2838f330ae221b7826a0e8702de03722160bbca0be855f23dd314436

So I downloaded the image manually, but I can't find a way to make the script use the local image instead of pulling it from the hub. Is there a way to do that? Or could you check the download script?

Also, I have the tao_mounts.json written in the following way.
{
    "Mounts": [
        {
            "source": "/data/NeuBoat/Avikus/FLL/images/",
            "destination": "/workspace/tao-experiments/data/images/"
        },
        {
            "source": "/home/jhhwang/Workspace/TAO/data/",
            "destination": "/workspace/tao-experiments/data/"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000",
        "ports": {
            "8888": 8888
        }
    }
}
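
A quick sanity check of a mounts file like the one above can rule out configuration problems before looking at the registry. The following is a minimal sketch, not part of the TAO launcher: check_mounts is a hypothetical helper, and it assumes the file follows the Mounts / source / destination layout shown in this post. It parses the JSON and reports entries with missing keys or source directories that do not exist on the host.

```python
import json
import os


def check_mounts(text):
    """Return a list of problems found in a tao_mounts.json-style string.

    Hypothetical helper for illustration; not part of the TAO launcher.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        # Typographic quotes pasted from a web page are a common cause.
        return [f"invalid JSON: {err.msg} at line {err.lineno}"]

    mounts = config.get("Mounts") if isinstance(config, dict) else None
    if not isinstance(mounts, list) or not mounts:
        return ["no 'Mounts' list found"]

    problems = []
    for index, mount in enumerate(mounts):
        if not isinstance(mount, dict):
            problems.append(f"mount {index}: expected an object")
            continue
        source = mount.get("source")
        destination = mount.get("destination")
        if not source or not destination:
            problems.append(f"mount {index}: needs both 'source' and 'destination'")
            continue
        # The source path is resolved on the host, so it must exist there.
        if not os.path.isdir(source):
            problems.append(f"mount {index}: source directory not found: {source}")
    return problems
```

Passing the contents of the mounts file to check_mounts returns an empty list when nothing looks wrong. It only checks the file itself and says nothing about whether the image can be pulled.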

Please run
$ docker pull nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt

More dockers can be found in

Ah yes, I have already pulled the tao-toolkit:4.0.0-pyt image. Should I have the image running before executing the tao command?

It is not needed.
Actually there are two ways.

  1. Using tao launcher.
    For example,
    $ tao detectnet_v2

  2. Using docker directly
    $ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt /bin/bash

Yeah, I have been following the first way, which uses ‘the launcher CLI’ described in this link.
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html

In the 'Launcher CLI' section, I have done all the previous steps, so I guess I should be able to run the tao command. So I ran the tao command with the 'deformable_detr' task and the 'train' sub_task, but I get the error described in this thread.

Can you share the full log?

yeah sure,

2023-06-19 13:56:50,149 [INFO] root: Registry: ['nvcr.io']
2023-06-19 13:56:50,189 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Error response from daemon: No such container: 8fdd756d2838f330ae221b7826a0e8702de03722160bbca0be855f23dd314436
2023-06-19 13:56:51,018 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

This is the full log that I get when I execute the following command.
tao deformable_detr train -e ./data/exp_spec_file.yaml

This looks to be a problem with nvcr.io as others are reporting issues.
I have the team looking into this now.

Tom


@junghyun.hwang
How about running
$ tao info --verbose

and

$ tao ssd run /bin/bash


Yeah, I still get a similar error.
I was able to pull the container,
but then it says that there is no such container.

2023-06-21 11:13:54,812 [INFO] root: Registry: ['nvcr.io']
2023-06-21 11:13:54,874 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-06-21 11:13:54,899 [INFO] tlt.components.docker_handler.docker_handler: The required docker doesn't exist locally/the manifest has changed. Pulling a new docker.
2023-06-21 11:13:54,899 [INFO] tlt.components.docker_handler.docker_handler: Pulling the required container. This may take several minutes if you're doing this for the first time. Please wait here.

Pulling from repository: nvcr.io/nvidia/tao/tao-toolkit
2023-06-21 11:19:04,283 [INFO] tlt.components.docker_handler.docker_handler: Container pull complete.
Error response from daemon: No such container: 26fe41a0a8c281101aaa9cdde7e2d4e1b73993c3cbdbe36c0f1dddab7f7ce40d
2023-06-21 11:19:09,822 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

As Tom said,
I can wait until the nvcr.io issues are fixed.

OK, you can try this workaround: start the docker container directly using docker run.

docker run --runtime=nvidia --shm-size=16g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5 /bin/bash

Then run training, etc. Please note that inside the container you run the command without "tao" at the beginning of the command line.
For example,
$ ssd train xxx

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

In your case, you need to launch the PyTorch docker.

docker run --runtime=nvidia --shm-size=16g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt /bin/bash

$ deformable_detr train xxx
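
One thing to keep in mind with this workaround: a plain docker run does not go through the launcher, so the directories listed in the mounts file are not mounted unless they are passed with -v. The sketch below builds such a command from a mounts file. It is an illustration only: build_docker_run is a hypothetical helper, the flags are the ones from the command above plus Docker's standard -v host_path:container_path option, and the mounts layout is assumed to match the one posted earlier in this thread.

```python
import json
import shlex


def build_docker_run(image, mounts_text, command="/bin/bash"):
    """Build a `docker run` argument list with one -v flag per mount.

    Hypothetical helper for illustration; not part of the TAO launcher.
    """
    config = json.loads(mounts_text)
    args = [
        "docker", "run", "--runtime=nvidia", "--shm-size=16g",
        "--ulimit", "memlock=-1", "-it", "--rm",
    ]
    for mount in config.get("Mounts", []):
        # Docker's bind-mount syntax is host_path:container_path.
        args += ["-v", f"{mount['source']}:{mount['destination']}"]
    args += [image, command]
    return args


# Mounts copied from the file posted earlier in this thread.
MOUNTS = """
{
    "Mounts": [
        {"source": "/data/NeuBoat/Avikus/FLL/images/",
         "destination": "/workspace/tao-experiments/data/images/"},
        {"source": "/home/jhhwang/Workspace/TAO/data/",
         "destination": "/workspace/tao-experiments/data/"}
    ]
}
"""

# Print the command as a single shell-ready line instead of executing it.
print(shlex.join(build_docker_run("nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt", MOUNTS)))
```

The printed line can be pasted into a terminal. With the mounts in place, the spec file should be reachable under the destination path inside the container, for example /workspace/tao-experiments/data/ for the second mount.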

Is there any issue with ngc registry model?
I listed the available pretrained backbones for object detection, but it seems there are none.
The ngc version is the latest, 3.24.


The command came from the following link.

Use the command below instead.

ngc registry model list nvidia/tao/pretrained_object_detection:*

Refer to
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#listing-all-available-models
