Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (If you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
When I run the ‘tao deformable_detr train -e ./data/exp_spec_file.yaml’ command, it tries to pull the tao-toolkit:4.0.0-pyt image, but I get the following error:
Error response from daemon: No such container: 8fdd756d2838f330ae221b7826a0e8702de03722160bbca0be855f23dd314436
So I manually downloaded the image, but I can’t find a way to make the script use the local image instead of pulling it from the hub. Is there a way to do that? Or could you guys check the download script?
Also, I have the tao_mounts.json written in the following way.
{
    "Mounts": [
        {
            "source": "/data/NeuBoat/Avikus/FLL/images/",
            "destination": "/workspace/tao-experiments/data/images/"
        },
        {
            "source": "/home/jhhwang/Workspace/TAO/data/",
            "destination": "/workspace/tao-experiments/data/"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000",
        "ports": {
            "8888": 8888
        }
    }
}
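As a quick sanity check on a mounts file like the one above, a short Python sketch can confirm that it parses as JSON and flag curly quotes, which tend to sneak in when text is copied from a web page. The function name `check_mounts` and the specific checks are illustrative assumptions, not part of the TAO launcher.

```python
import json

# Typographic quotes that are not valid JSON string delimiters.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"


def check_mounts(text):
    """Return a list of problems found in a tao_mounts.json-style string."""
    problems = []
    if any(ch in text for ch in CURLY_QUOTES):
        problems.append("file contains curly quotes; JSON requires straight quotes")
        return problems
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc}")
        return problems
    if not isinstance(config, dict):
        problems.append("top level should be a JSON object")
        return problems
    mounts = config.get("Mounts")
    if not isinstance(mounts, list) or not mounts:
        problems.append('"Mounts" should be a non-empty list')
        return problems
    # Every mount entry needs both a host path and a container path.
    for i, mount in enumerate(mounts):
        for key in ("source", "destination"):
            if not isinstance(mount, dict) or key not in mount:
                problems.append(f"mount {i} is missing {key!r}")
    return problems


sample = """{
    "Mounts": [
        {"source": "/home/jhhwang/Workspace/TAO/data/",
         "destination": "/workspace/tao-experiments/data/"}
    ]
}"""
print(check_mounts(sample))
```

An empty list means none of these checks failed. It does not verify that the host directories exist or that the launcher accepts the file.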
In the ‘Launcher CLI’ section, I have done all the previous steps, so I guess I should be able to run the tao command. So I ran the tao command with the ‘deformable_detr’ task and the ‘train’ sub_task, but I get the error described in the thread.
Yeah, I still get a similar error.
I was able to pull the container, but then it says that there is no such container.
2023-06-21 11:13:54,812 [INFO] root: Registry: ['nvcr.io']
2023-06-21 11:13:54,874 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-06-21 11:13:54,899 [INFO] tlt.components.docker_handler.docker_handler: The required docker doesn't exist locally/the manifest has changed. Pulling a new docker.
2023-06-21 11:13:54,899 [INFO] tlt.components.docker_handler.docker_handler: Pulling the required container. This may take several minutes if you're doing this for the first time. Please wait here.
…
Pulling from repository: nvcr.io/nvidia/tao/tao-toolkit
2023-06-21 11:19:04,283 [INFO] tlt.components.docker_handler.docker_handler: Container pull complete.
Error response from daemon: No such container: 26fe41a0a8c281101aaa9cdde7e2d4e1b73993c3cbdbe36c0f1dddab7f7ce40d
2023-06-21 11:19:09,822 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
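One thing worth checking in a log like this is which image the launcher actually starts: the log above names the 4.0.0-tf1.15.5 image, while the first post mentions 4.0.0-pyt. A small Python sketch to pull the image name out of the "Running command in container:" line follows. The regular expression is based only on the log format shown here, and the helper names are made up for illustration.

```python
import re

# Matches the launcher log line that names the image being started.
IMAGE_LINE = re.compile(r"Running command in container:\s*(\S+)")


def launched_images(log_text):
    """Return every image named in 'Running command in container:' lines."""
    return IMAGE_LINE.findall(log_text)


def image_tag(image):
    """Return the tag of 'registry/path/name:tag', or None if there is none."""
    _name, sep, tag = image.rpartition(":")
    # A colon followed by a slash belongs to a registry port, not a tag.
    if not sep or "/" in tag:
        return None
    return tag


log = (
    "2023-06-21 11:13:54,874 [INFO] "
    "tlt.components.instance_handler.local_instance: "
    "Running command in container: "
    "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5"
)
for image in launched_images(log):
    print(image, "->", image_tag(image))
```

Comparing the extracted tag with the image you pulled by hand shows whether the launcher is looking for the same image.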
As Tom said, I can wait until the nvcr.io issues are fixed.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
In your case, you need to launch the PyTorch docker container:
docker run --runtime=nvidia --shm-size=16g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt /bin/bash
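When the container is started by hand like this, docker itself does not read the launcher's tao_mounts.json, so the host directories would need to be passed with docker's `-v source:destination` option. Below is a minimal Python sketch that turns the "Mounts" entries into such a command line. It assumes the same flags and image as the command above, and the function name `docker_command` is hypothetical.

```python
import json
import shlex

# Flags copied from the docker run command above.
BASE_CMD = ["docker", "run", "--runtime=nvidia", "--shm-size=16g",
            "--ulimit", "memlock=-1", "-it", "--rm"]
IMAGE = "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt"


def docker_command(mounts_json, image=IMAGE):
    """Build a docker run command line with one -v option per mount entry."""
    mounts = json.loads(mounts_json)["Mounts"]
    cmd = list(BASE_CMD)
    for mount in mounts:
        cmd += ["-v", f"{mount['source']}:{mount['destination']}"]
    cmd += [image, "/bin/bash"]
    # shlex.join quotes any argument that needs it (e.g. paths with spaces).
    return shlex.join(cmd)


mounts_json = """{
    "Mounts": [
        {"source": "/home/jhhwang/Workspace/TAO/data/",
         "destination": "/workspace/tao-experiments/data/"}
    ]
}"""
print(docker_command(mounts_json))
```

The sketch only prints the command so it can be reviewed before running. It ignores the "DockerOptions" section of the mounts file.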
Is there any issue with ngc registry model?
I listed the available pretrained backbones for object detection, but there seem to be none.
The ngc version is the latest, 3.24.