Please provide the following information when requesting support.
• Hardware: GeForce 2080Ti
• Network Type: actionrecognitionnet
• TLT Version: v3.21.11-py3
• Training spec file: train_rgb_3d_finetune.yaml (2.6 KB)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I was following the instructions on the NVIDIA Developer Blog to apply 3D convolution to an action recognition task. The following command used to work well:
!tao action_recognition train \
    -e $SPECS_DIR/train_rgb_3d_finetune.yaml \
    -r $RESULTS_DIR/rgb_3d_ptm/3rd \
    -k $KEY \
    model_config.rgb_pretrained_model_path=$RESULTS_DIR/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt
As far as I can tell, this command creates a Docker container from the TLT image and fine-tunes the model from the specified pretrained model, using the training specification .yaml file.
However, because I ran this command repeatedly, a number of duplicate containers were created, so I removed many of them with “docker rm”.
Now, when I run exactly the same command, the container shuts down with no error message.
2022-06-16 05:35:55,010 [INFO] root: Registry: ['nvcr.io']
2022-06-16 05:35:55,164 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3
2022-06-16 05:35:55,244 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/sandia/.tao_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal.
2022-06-16 05:35:57,577 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
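For reference, the warning in the log asks for a "user":"UID:GID" entry under DockerOptions in the mounts file. The sketch below shows one way that entry could be added to an already-loaded config. It only addresses the warning and is not necessarily related to the container stopping. The helper name `with_user_option` is hypothetical, and the `Mounts` and `shm_size` keys in the example config are assumptions about what the file may contain, not taken from my actual file.

```python
import copy
import json


def with_user_option(config: dict, uid: int, gid: int) -> dict:
    """Return a copy of the config with DockerOptions["user"] set to "UID:GID"."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    # Create the DockerOptions section if it is missing, then set the user entry.
    updated.setdefault("DockerOptions", {})["user"] = f"{uid}:{gid}"
    return updated


# Example config; the "Mounts" and "shm_size" entries are illustrative assumptions.
example = {"Mounts": [], "DockerOptions": {"shm_size": "16G"}}

# 1000:1000 stands in for the values reported by `id -u` and `id -g`.
print(json.dumps(with_user_option(example, 1000, 1000), indent=4))
```

In practice the dict would be loaded from and written back to the file named in the warning, with the UID and GID taken from "id -u" and "id -g".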
When I run “docker ps”, no containers show up, and “docker ps -a” shows none either, so the container is completely shut down and deleted. I tried to find out how tao action_recognition starts the Docker container (for example, the underlying docker run command), but I could not.
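To make the checks above more concrete, here is a small sketch that builds the Docker commands I could use to inspect the situation: whether the image from the log is still present locally, which containers (including exited ones) were created from it, and which container events occur while the tao command runs. The helper name `diagnostic_commands` is hypothetical. The script only prints the commands and does not run them.

```python
import shlex
from typing import List

# Image name copied from the "Running command in container" log line.
IMAGE = "nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3"


def diagnostic_commands(image: str) -> List[List[str]]:
    """Build docker CLI invocations for inspecting an image and its containers."""
    return [
        # Fails with a non-zero exit code if the image is not available locally.
        ["docker", "image", "inspect", image],
        # Lists all containers (running or exited) created from this image.
        ["docker", "ps", "-a", "--filter", f"ancestor={image}"],
        # Streams container events for this image; meant to be left running in a
        # second terminal while the tao command is executed.
        ["docker", "events", "--filter", "type=container", "--filter", f"image={image}"],
    ]


for cmd in diagnostic_commands(IMAGE):
    print(shlex.join(cmd))
```

Since the launcher seems to delete the container right after it stops, the event stream is probably the only one of the three that would show the container being created and stopped.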
So I tried typing just “tao action_recognition”, and I see that it creates a container that closes by itself after 3 seconds.