• Hardware: A6000
• Network Type: MaskRCNN
• TLT Version:
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021
Hey, I tried converting a trained .etlt model to an engine file, but the conversion behaves strangely. The Docker container stops after a few seconds without converting anything: no error logs, no memory usage, nothing.
Here are the logs I get:
2022-10-01 07:09:40,315 [INFO] root: Registry: ['nvcr.io']
2022-10-01 07:09:40,342 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-10-01 07:09:40,375 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[INFO] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 19642 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 848 MiB, GPU 19642 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1669, GPU 19960 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +618, GPU +268, now: CPU 2287, GPU 20228 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
2022-10-01 07:09:45,675 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
I tried a couple of times, but nothing seems to work.
Then I tried opening a shell inside the container to run the command from there. To my surprise, when I start the shell with tao mask_rcnn run /bin/bash, the container exits by itself after 5-6 seconds.
Here are the logs:
2022-10-01 07:14:42,935 [INFO] root: Registry: ['nvcr.io']
2022-10-01 07:14:42,962 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-10-01 07:14:43,000 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
root@23b04a10793d:/workspace# 2022-10-01 07:14:48,874 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
As you can see, I didn't do anything after logging in, and within about 5 seconds the container stopped without logging anything.
I am not sure what is wrong here. Could you please look into it?
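For reference, the warning that appears in both logs asks for a "user":"UID:GID" entry in the DockerOptions portion of tao_mounts.json. Below is a minimal sketch of how that fragment could be generated, assuming it would be merged by hand into the existing file. The helper name docker_options_user_entry is hypothetical and not part of TAO, and I have not confirmed that this warning is related to the container stopping.

```python
import json
import os


def docker_options_user_entry(uid=None, gid=None):
    """Build the DockerOptions fragment described in the launcher warning.

    Hypothetical helper for illustration only; it is not part of TAO.
    """
    # Default to the current host user, the same values that
    # `id -u` and `id -g` print on the terminal (Unix only).
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return {"DockerOptions": {"user": f"{uid}:{gid}"}}


if __name__ == "__main__":
    # Print the fragment so it can be merged manually into tao_mounts.json.
    print(json.dumps(docker_options_user_entry(), indent=4))
```

The script only prints the fragment and does not touch the existing mounts file, so any Mounts entries already in it stay as they are.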