I recently moved to an AWS EC2 server running Ubuntu 22.04 with TAO 5.0.0 installed, and I have stopped using the ActionRecognitionNet Colab sample code for now.
I tried to train an ActionRecognitionNet model as I did with the NVIDIA Colab sample code a few weeks ago, but I ran into problems that kept the training from starting successfully.
I encountered the same problem.
I replaced “output_dir” with “results_dir” as suggested and another error message popped up:
Key 'model_config' not in 'ExperimentConfig'
full_key: model_config
object_type=ExperimentConfig
Then I changed “model_config” to “model”. Still, an error message appeared:
Key 'train_config' not in 'ExperimentConfig'
full_key: train_config
object_type=ExperimentConfig
Changing “train_config” to “train” didn’t make the training run correctly either:
Key 'epochs' not in 'ARTrainExpConfig'
full_key: train.epochs
reference_type=ARTrainExpConfig
object_type=ARTrainExpConfig
At that point I had no idea what to modify next. Any help or advice would be appreciated, thanks.
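To summarize the renames I have applied so far, here is a minimal Python sketch. It only covers the three top-level keys mentioned above (output_dir, model_config, train_config). The function name and the sample values are hypothetical, and it does not know the correct replacement for “epochs”, which is exactly the part I am stuck on.

```python
import copy

# Renames observed so far while chasing the schema errors in this post.
# This is NOT a complete old-to-new mapping for the TAO 5.0.0 spec.
OBSERVED_RENAMES = {
    "output_dir": "results_dir",
    "model_config": "model",
    "train_config": "train",
}


def rename_top_level_keys(config, renames=OBSERVED_RENAMES):
    """Return a copy of `config` with its top-level keys renamed.

    Nested keys (for example train.epochs) are left untouched.
    """
    updated = {}
    for key, value in config.items():
        updated[renames.get(key, key)] = copy.deepcopy(value)
    return updated


if __name__ == "__main__":
    # Hypothetical old-style spec, loaded as a plain dict.
    old_spec = {
        "output_dir": "/results/rgb_3d_ptm",
        "model_config": {"model_type": "rgb"},
        "train_config": {"epochs": 20},
    }
    print(list(rename_top_level_keys(old_spec)))
```

After this step the nested “epochs” key is still present under “train”, which matches the last error above.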
Here’s the .yaml file of mine.
train_rgb_3d_finetune.yaml (821 Bytes)
The complete error log is as follows:
Train RGB only model with PTM
2023-08-29 04:47:15,328 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-08-29 04:47:15,389 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2023-08-29 04:47:15,421 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 262:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-08-29 04:47:15,421 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-08-29 04:47:18,541 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmp8bvasjd1
[2023-08-29 04:47:18,542 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmp8bvasjd1/_remote_module_non_scriptable.py
sys:1: UserWarning:
'train_rgb_3d_finetune.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning:
'train_rgb_3d_finetune.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Error merging 'train_rgb_3d_finetune.yaml' with schema
Key 'epochs' not in 'ARTrainExpConfig'
full_key: train.epochs
reference_type=ARTrainExpConfig
object_type=ARTrainExpConfig
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2023-08-29 04:47:24,192 - TAO Toolkit - root - ERROR] Execution status: FAIL
2023-08-29 04:47:24,689 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.
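Since each run only reports one rejected key, a small helper for pulling that key out of the log can save some scrolling. The sketch below is my own and assumes the error always has the "Key '...' not in '...'" line followed shortly by a "full_key:" line, as in the log above. The function and pattern names are hypothetical.

```python
import re

# Patterns matching the two error lines as they appear in the log above.
KEY_ERROR = re.compile(r"Key '(?P<key>[^']+)' not in '(?P<schema>[^']+)'")
FULL_KEY = re.compile(r"full_key:\s*(?P<full_key>\S+)")


def find_rejected_keys(log_text):
    """Return one dict per "Key ... not in ..." error found in the log."""
    lines = log_text.splitlines()
    results = []
    for i, line in enumerate(lines):
        match = KEY_ERROR.search(line)
        if not match:
            continue
        # Look at the next few lines for the dotted path of the key.
        full_key = None
        for follow in lines[i + 1:i + 4]:
            fk = FULL_KEY.search(follow)
            if fk:
                full_key = fk.group("full_key")
                break
        results.append({
            "key": match.group("key"),
            "schema": match.group("schema"),
            "full_key": full_key,
        })
    return results


if __name__ == "__main__":
    sample = (
        "Error merging 'train_rgb_3d_finetune.yaml' with schema\n"
        "Key 'epochs' not in 'ARTrainExpConfig'\n"
        "full_key: train.epochs\n"
    )
    for item in find_rejected_keys(sample):
        print(item["full_key"], "rejected by", item["schema"])
```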