Error running tao container image

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
T4
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
*
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
v3.22.05
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I am running the newest version of TAO on a T4 machine. Everything was working fine until today, when I suddenly started getting the error below:
[image: screenshot of the error]

Simple command to reproduce: "docker run nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3"

The error happens on my RTX3060 machine as well.

Trying to run a training session from the launcher with “tao classification …” causes the container to exit instantly.

Did something change?

br, Mathias

I can reproduce this issue. We are looking into it.

Hi,
Please try the workaround below.

$ docker run --runtime=nvidia -it --rm --entrypoint "" nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3 /bin/bash

The second workaround is for users who want to use the tao launcher instead of "docker run".
Steps:

  1. Add "entrypoint": "" to the "DockerOptions" section of ~/.tao_mounts.json
    "DockerOptions":{
          "entrypoint": "",
          "shm_size": "16G",
  2. Modify lib/python3.6/site-packages/tao/components/docker_handler/docker_handler.py . This file should be available when you install nvidia-tao. Change

VALID_DOCKER_ARGS = ["user", "ports", "shm_size", "ulimits", "privileged", "network"]

to

VALID_DOCKER_ARGS = ["user", "ports", "shm_size", "ulimits", "privileged", "network", "entrypoint"]
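
For anyone who would rather script step 1 than edit the file by hand, here is a minimal Python sketch. It only assumes that ~/.tao_mounts.json is a plain JSON object with an optional "DockerOptions" section, as in the snippet above. The function names add_empty_entrypoint and patch_mounts_file are made up for this example and are not part of nvidia-tao. It does not touch docker_handler.py, so step 2 still has to be done manually.

```python
import json
from pathlib import Path


def add_empty_entrypoint(config: dict) -> dict:
    """Return a copy of the mounts config with DockerOptions.entrypoint set to ""."""
    updated = dict(config)
    # Keep any existing Docker options (e.g. shm_size) and add the override.
    docker_options = dict(updated.get("DockerOptions", {}))
    docker_options["entrypoint"] = ""
    updated["DockerOptions"] = docker_options
    return updated


def patch_mounts_file(path: Path) -> None:
    """Rewrite the JSON file at `path` in place with the entrypoint override added."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(add_empty_entrypoint(config), indent=4) + "\n")


# Hypothetical usage (back up the file first):
# patch_mounts_file(Path.home() / ".tao_mounts.json")
```

Other keys in the file are left as they are; only "entrypoint" is added or overwritten inside "DockerOptions".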


Thanks for the swift reply,

Workarounds are good,

/M
