Docker container created from TAO toolkit image shuts down by itself


• Hardware: GeForce RTX 2080 Ti
• Network Type: actionrecognitionnet
• TLT Version: v3.21.11-py3
• Training spec file: train_rgb_3d_finetune.yaml (2.6 KB)
• How to reproduce the issue:

I was following the instructions on the NVIDIA Developer Blog for applying 3D convolution to an action recognition task.

The following command used to work well:

!tao action_recognition train \
                  -e $SPECS_DIR/train_rgb_3d_finetune.yaml \
                  -r $RESULTS_DIR/rgb_3d_ptm/3rd \
                  -k $KEY \
                  model_config.rgb_pretrained_model_path=$RESULTS_DIR/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt

As far as I can tell, this command creates a Docker container from the TLT image and fine-tunes the specified pretrained model using the training specification .yaml file.
However, because I ran this command repeatedly, several duplicate containers accumulated, so I removed many of them with “docker rm”.

Now, when I execute exactly the same line, the container shuts down with no error message.

2022-06-16 05:35:55,010 [INFO] root: Registry: ['nvcr.io']
2022-06-16 05:35:55,164 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3
2022-06-16 05:35:55,244 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/sandia/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-06-16 05:35:57,577 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

When I run “docker ps”, no containers show up, and “docker ps -a” shows none either, so the container is completely shut down and deleted. I tried to find out how tao action_recognition starts the Docker container (for example, the underlying docker run command), but I could not.

So I tried typing just “tao action_recognition”, and I see that it creates a container that shuts down by itself after 3 seconds.
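
For anyone debugging the same symptom, here is a minimal sketch of how the container lifecycle could be observed with standard Docker CLI commands, since “docker ps -a” shows nothing once the container is removed. The image tag is copied from the log above; the function names are hypothetical helpers of my own, not part of the TAO launcher, and this only shows what Docker does, not why the launcher stops the container.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the image tag comes from the log above.
TAO_IMAGE="nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3"

# List every container (running or exited) that was created from the image.
list_tao_containers() {
    docker ps -a --filter "ancestor=${TAO_IMAGE}"
}

# Stream container lifecycle events (create, start, die, destroy) for the image.
# Start this in one terminal, then run the tao command in another.
watch_tao_events() {
    docker events --filter "type=container" --filter "image=${TAO_IMAGE}"
}

# Report whether the image is still available locally.
check_tao_image() {
    if docker image inspect "${TAO_IMAGE}" > /dev/null 2>&1; then
        echo "image present"
    else
        echo "image missing"
    fi
}
```

After sourcing the file, running watch_tao_events in a second terminal while launching “tao action_recognition” should print the events for the short-lived container, including the die event and its exit code, even though the container is gone afterwards.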

Please refer to the topic “Chmod: cannot access '/opt/ngccli/ngc': No such file or directory” (post #2 by Morganh).

Thanks a lot, it worked perfectly!
