Issues regarding training of ActionRecognitionNet in TAO 5.0.0

silentjcr · August 29, 2023, 5:33am

I recently moved to an AWS EC2 server running Ubuntu 22.04 with TAO 5.0.0 installed, and have stopped using the ActionRecognitionNet Colab sample code for now.
I tried to train an ActionRecognitionNet model the way I did with the NVIDIA Colab sample code a few weeks ago, but ran into problems that kept the training from starting successfully.

I encountered the same problem.

I replaced “output_dir” with “results_dir” as suggested and another error message popped up:

Key 'model_config' not in 'ExperimentConfig'
    full_key: model_config
    object_type=ExperimentConfig

Then I changed “model_config” to “model”. Still, an error message appeared:

Key 'train_config' not in 'ExperimentConfig'
    full_key: train_config
    object_type=ExperimentConfig

Changing “train_config” to “train” didn’t make the training run correctly either:

Key 'epochs' not in 'ARTrainExpConfig'
    full_key: train.epochs
    reference_type=ARTrainExpConfig
    object_type=ARTrainExpConfig
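
The three top-level renames described above can be sketched as a small script. This is only an illustration of the renames reported in this post, not an official migration tool: the function name `rename_top_level_keys` and the sample values are made up, and nested keys such as `train.epochs` are deliberately left alone because the correct replacement is not established here.

```python
import copy

# Top-level renames reported in this thread (old key -> new key).
TOP_LEVEL_RENAMES = {
    "output_dir": "results_dir",
    "model_config": "model",
    "train_config": "train",
}


def rename_top_level_keys(spec, renames=TOP_LEVEL_RENAMES):
    """Return a copy of spec with old top-level keys renamed.

    Nested keys (for example train.epochs) are left untouched.
    """
    migrated = {}
    for key, value in spec.items():
        new_key = renames.get(key, key)
        if new_key in migrated:
            # Refuse to silently overwrite when both spellings are present.
            raise ValueError(f"both old and new spelling present for {new_key!r}")
        migrated[new_key] = copy.deepcopy(value)
    return migrated


# Hypothetical old-style spec; the values are invented for illustration.
old_spec = {
    "output_dir": "/results/rgb_3d_ptm",
    "model_config": {"model_type": "rgb"},
    "train_config": {"epochs": 20},
}
new_spec = rename_top_level_keys(old_spec)
print(sorted(new_spec))  # ['model', 'results_dir', 'train']
```

Even after these renames the schema still rejects `train.epochs`, which is the error shown above.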

At that point I had no idea what to modify next. Any help or advice would be appreciated, thanks.

Here’s my .yaml file:
train_rgb_3d_finetune.yaml (821 Bytes)

The complete error log is as follows:

Train RGB only model with PTM
2023-08-29 04:47:15,328 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-08-29 04:47:15,389 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2023-08-29 04:47:15,421 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 262: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-08-29 04:47:15,421 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-08-29 04:47:18,541 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmp8bvasjd1
[2023-08-29 04:47:18,542 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmp8bvasjd1/_remote_module_non_scriptable.py
sys:1: UserWarning: 
'train_rgb_3d_finetune.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning: 
'train_rgb_3d_finetune.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Error merging 'train_rgb_3d_finetune.yaml' with schema
Key 'epochs' not in 'ARTrainExpConfig'
    full_key: train.epochs
    reference_type=ARTrainExpConfig
    object_type=ARTrainExpConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2023-08-29 04:47:24,192 - TAO Toolkit - root - ERROR] Execution status: FAIL
2023-08-29 04:47:24,689 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.
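
Unrelated to the failure itself, the Docker warning in the log asks for a "user":"UID:GID" entry in the DockerOptions portion of the mounts file. A minimal sketch of that edit follows, assuming the file is plain JSON; the helper name `add_docker_user` and the placeholder file content are made up for illustration, and the demo writes to a throwaway file rather than the real "/home/ubuntu/.tao_mounts.json".

```python
import json
import os
import tempfile
from pathlib import Path


def add_docker_user(mounts_path, uid, gid):
    """Set DockerOptions.user to "UID:GID" in a mounts file, keeping other keys."""
    path = Path(mounts_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("DockerOptions", {})["user"] = f"{uid}:{gid}"
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config


# Demo against a throwaway file; pass the real mounts file path for actual use.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / ".tao_mounts.json"
    # Placeholder content standing in for an existing mounts file.
    demo_path.write_text(json.dumps({"Mounts": []}))
    # Effective IDs, which is what `id -u` and `id -g` print by default (POSIX only).
    updated = add_docker_user(demo_path, os.geteuid(), os.getegid())
    print(updated["DockerOptions"]["user"])
```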

Morganh · August 29, 2023, 5:58am

For TAO 5.0.0, please refer to
https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/action_recognition_net/actionrecognitionnet.ipynb.

!tao model action_recognition train \
                  -e $SPECS_DIR/experiment_rgb_3d_finetune.yaml \

And use the yaml file below: https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/action_recognition_net/specs/experiment_rgb_3d_finetune.yaml

silentjcr · August 30, 2023, 3:56am

Thanks. It worked.
BTW, does that mean the rest of the .yaml files (train, infer, export, evaluate) are of no use in TAO 5.0.0?

Morganh · August 30, 2023, 5:33am

Yes, you can use the all-in-one spec in TAO 5.0.0.
