Tao model action_recognition train error in the notebook

suvasism · February 3, 2024, 1:34am

hi,

Please refer to tao_tutorials/notebooks/tao_launcher_starter_kit/action_recognition_net/actionrecognitionnet.ipynb (main branch of NVIDIA/tao_tutorials on GitHub).

Everything runs without errors up to this point. The following command fails:
tao model action_recognition train -e /specs/experiment_rgb_3d_finetune.yaml -k nvidia_tao results_dir=/results/rgb_3d_ptm model.rgb_pretrained_model_path=/results/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt model.rgb_pretrained_num_classes=5
/usr/lib/python3/dist-packages/paramiko/transport.py:220: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
“class”: algorithms.Blowfish,
2024-02-02 14:54:49,404 [TAO Toolkit] [INFO] root 160: Registry: [‘nvcr.io’]
2024-02-02 14:54:49,502 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0
2024-02-02 14:54:49,603 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2024-02-02 22:54:53,244 - TAO Toolkit - root - ERROR] The indicated experiment spec file /specs/experiment_rgb_3d_finetune.yaml doesn’t exist!
2024-02-02 14:54:53,843 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Hardware information (4x RTX 3070):
nvidia-smi
Fri Feb 2 17:31:28 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06 Driver Version: 545.29.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |

Also, the specs directory doesn't contain any experiment_rgb_3d_finetune.yaml:
ls -l specs/
total 52
-rw-rw-r-- 1 minasm minasm 550 Jun 15 2022 evaluate_joint.yaml
-rw-rw-r-- 1 minasm minasm 425 Jun 15 2022 evaluate_of.yaml
-rw-rw-r-- 1 minasm minasm 429 Jun 15 2022 evaluate_rgb.yaml
-rw-rw-r-- 1 minasm minasm 425 Jun 15 2022 export_of.yaml
-rw-rw-r-- 1 minasm minasm 429 Jun 15 2022 export_rgb.yaml
-rw-rw-r-- 1 minasm minasm 425 Jun 15 2022 infer_of.yaml
-rw-rw-r-- 1 minasm minasm 429 Jun 15 2022 infer_rgb.yaml
-rw-rw-r-- 1 minasm minasm 1002 Jun 15 2022 train_joint_2d.yaml
-rw-rw-r-- 1 minasm minasm 741 Jun 15 2022 train_of_2d.yaml
-rw-rw-r-- 1 minasm minasm 756 Jun 15 2022 train_of_3d_finetune.yaml
-rw-rw-r-- 1 minasm minasm 787 Jun 15 2022 train_rgb_2d_finetune.yaml
-rw-rw-r-- 1 minasm minasm 782 Jun 15 2022 train_rgb_2d.yaml
-rw-rw-r-- 1 minasm minasm 761 Jun 15 2022 train_rgb_3d_finetune.yaml

Morganh · February 4, 2024, 4:17pm

Please check the ~/.tao_mounts.json file. It maps your local file paths to paths inside the Docker container.
You can run the command below to check whether the YAML file is available:
!tao model action_recognition run ls /specs/experiment_rgb_3d_finetune.yaml
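The same check can also be done from the host without starting a container, by translating the container path back through the mounts file. The following is a minimal sketch, assuming the ~/.tao_mounts.json layout shown in this thread (a "Mounts" list of "source"/"destination" pairs); `container_to_host` and `check_spec` are hypothetical helper names, not part of the TAO launcher.

```python
import json
import os
from pathlib import PurePosixPath


def container_to_host(container_path, mounts):
    """Map a path inside the TAO container back to the host path it is mounted from.

    `mounts` is the "Mounts" list from ~/.tao_mounts.json (source/destination pairs).
    Returns None if no mount covers the path.
    """
    target = PurePosixPath(container_path)
    best = None
    for m in mounts:
        dest = PurePosixPath(m["destination"])
        # The path must be the mount point itself or lie underneath it.
        if target == dest or dest in target.parents:
            # If several mounts match, the most specific (deepest) one wins.
            if best is None or len(dest.parts) > len(PurePosixPath(best["destination"]).parts):
                best = m
    if best is None:
        return None
    rel = target.relative_to(best["destination"])
    return str(PurePosixPath(best["source"]) / rel)


def check_spec(container_path, mounts_file="~/.tao_mounts.json"):
    """Print where a container path lives on the host and whether it exists there."""
    with open(os.path.expanduser(mounts_file)) as f:
        mounts = json.load(f).get("Mounts", [])
    host = container_to_host(container_path, mounts)
    if host is None:
        print(f"{container_path}: not covered by any mount")
        return
    status = "found" if os.path.exists(host) else "MISSING"
    print(f"{container_path} -> {host}: {status}")
    # Show which YAML files the mounted host directory actually contains.
    parent = os.path.dirname(host)
    if os.path.isdir(parent):
        yamls = sorted(n for n in os.listdir(parent) if n.endswith(".yaml"))
        print("YAML files in", parent, ":", yamls)
```

For example, `check_spec("/specs/experiment_rgb_3d_finetune.yaml")` reports the host path behind /specs and lists the spec files actually present there, which makes a mismatch between the expected and the available file names easy to spot.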

suvasism · February 6, 2024, 1:53am

hi,
The file is not found:
!tao model action_recognition run ls /specs/experiment_rgb_3d_finetune.yaml

/usr/lib/python3/dist-packages/paramiko/transport.py:220: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
“class”: algorithms.Blowfish,
2024-02-05 17:48:02,026 [TAO Toolkit] [INFO] root 160: Registry: [‘nvcr.io’]
2024-02-05 17:48:02,128 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0
2024-02-05 17:48:02,293 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the “/home/minasm/.tao_mounts.json” file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
2024-02-05 17:48:02,293 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
ls: cannot access ‘/specs/experiment_rgb_3d_finetune.yaml’: No such file or directory
2024-02-05 17:48:03,137 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

~/.tao_mounts.json looks correct:

{
    "Mounts": [
        {
            "source": "/home/minasm/suvasis/tools/NVIDIA/nvidia_examples/action_recognition/action_recognition_net/data",
            "destination": "/data"
        },
        {
            "source": "/home/minasm/suvasis/tools/NVIDIA/nvidia_examples/action_recognition/action_recognition_net/specs",
            "destination": "/specs"
        },
        {
            "source": "/home/minasm/suvasis/tools/NVIDIA/nvidia_examples/action_recognition/action_recognition_net/results",
            "destination": "/results"
        },
        {
            "source": "/home/minasm/.cache",
            "destination": "/root/.cache"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}
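The warning in the log above suggests adding a "user": "UID:GID" entry to the DockerOptions section so that files written into the mounted directories keep your host ownership instead of belonging to root. A minimal sketch that patches an existing mounts file, assuming it runs on the host (POSIX only, because it uses os.getuid() and os.getgid()) as the user who owns the mounted directories; `add_docker_user` is a hypothetical helper name.

```python
import json
import os


def add_docker_user(mounts_file="~/.tao_mounts.json"):
    """Add "user": "UID:GID" to the DockerOptions section of a TAO mounts file."""
    path = os.path.expanduser(mounts_file)
    with open(path) as f:
        config = json.load(f)
    # Same values that `id -u` and `id -g` print for the current user.
    user = f"{os.getuid()}:{os.getgid()}"
    # Keep existing options such as shm_size and ulimits; only set "user".
    config.setdefault("DockerOptions", {})["user"] = user
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Reading and rewriting the whole file keeps the existing "Mounts" entries untouched, so the patch can be applied after the notebook has written its mounts file.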

Morganh · February 6, 2024, 8:06am

How about removing the /root/.cache mount above?

suvasism · February 8, 2024, 6:57pm

I removed the /root/.cache mount and ran the whole example again. I still get the same result:

!tao model action_recognition run ls /specs/experiment_rgb_3d_finetune.yaml

/usr/lib/python3/dist-packages/paramiko/transport.py:220: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
“class”: algorithms.Blowfish,
2024-02-07 15:42:03,262 [TAO Toolkit] [INFO] root 160: Registry: [‘nvcr.io’]
2024-02-07 15:42:03,364 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0
2024-02-07 15:42:03,462 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the “/home/minasm/.tao_mounts.json” file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
2024-02-07 15:42:03,462 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
ls: cannot access ‘/specs/experiment_rgb_3d_finetune.yaml’: No such file or directory
2024-02-07 15:42:03,946 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Mapping the local directories to the TAO Docker container:

import json
import os

# HOST_DATA_DIR, HOST_SPECS_DIR and HOST_RESULTS_DIR must already be set in the environment.
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tlt_configs = {
    "Mounts": [
        {
            "source": os.environ["HOST_DATA_DIR"],
            "destination": "/data"
        },
        {
            "source": os.environ["HOST_SPECS_DIR"],
            "destination": "/specs"
        },
        {
            "source": os.environ["HOST_RESULTS_DIR"],
            "destination": "/results"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}

Writing the mounts file.

with open(mounts_file, "w") as mfile:
    json.dump(tlt_configs, mfile, indent=4)

!cat ~/.tao_mounts.json

{
    "Mounts": [
        {
            "source": "/home/minasm/suvasis/tools/NVIDIA/nvidia_examples/action_recognition/action_recognition_net/data",
            "destination": "/data"
        },
        {
            "source": "/home/minasm/suvasis/tools/NVIDIA/nvidia_examples/action_recognition/action_recognition_net/specs",
            "destination": "/specs"
        },
        {
            "source": "/home/minasm/suvasis/tools/NVIDIA/nvidia_examples/action_recognition/action_recognition_net/results",
            "destination": "/results"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}

Morganh · February 9, 2024, 2:13am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

I suggest changing the mounts above to the following to make things easier:
{
    "source": "/home/minasm/suvasis/",
    "destination": "/home/minasm/suvasis/"
}
Then the paths look the same inside the Docker container as on the host.
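Applied to the notebook's mounts cell, this suggestion amounts to a single identity mount. A minimal sketch, assuming /home/minasm/suvasis/ (from the post above) is a parent of every data, specs and results directory the notebook uses, and keeping the DockerOptions from this thread; `write_identity_mounts` is a hypothetical helper name.

```python
import json
import os


def write_identity_mounts(shared_root, mounts_file="~/.tao_mounts.json"):
    """Write a TAO mounts file that mounts `shared_root` at the same path in the container."""
    config = {
        "Mounts": [
            # Identity mapping: host and container see the same path.
            {"source": shared_root, "destination": shared_root},
        ],
        # Same Docker options as the notebook's mounts cell.
        "DockerOptions": {
            "shm_size": "16G",
            "ulimits": {"memlock": -1, "stack": 67108864},
        },
    }
    with open(os.path.expanduser(mounts_file), "w") as mfile:
        json.dump(config, mfile, indent=4)
    return config
```

After `write_identity_mounts("/home/minasm/suvasis/")`, path arguments such as `-e` and `results_dir=` take the full host paths unchanged instead of /specs or /results.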

system · March 8, 2024, 7:32am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
