Please provide the following information when requesting support.
• Hardware: RTXA6000ADA
• Network Type : Detectnet_v2
• TLT Version: 4.0.2.api
Reopening topic: Exception: TAO4 AUTOML with peoplenet
I thought this issue had finally been solved in release 4.0.2, but it has appeared again.
$ kubectl logs -n gpu-operator tao-toolkit-api-workflow-pod-78848b8764-zc9gx
NGC CLI 3.19.0
AutoML pipeline
Exception in thread Thread-2 (AutoMLPipeline):
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/api/handlers/actions.py", line 810, in AutoMLPipeline
    complete_specs["model_config"]["pretrained_model_file"] = pretrained_model_file[0]
                                                              ~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
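The traceback suggests that the list `pretrained_model_file` was empty when line 810 indexed it. As a minimal sketch of how such an assignment could fail with a clearer message, assuming nothing about the real code in /opt/api/handlers/actions.py beyond the one line shown above (the function name `apply_pretrained_model` is hypothetical):

```python
def apply_pretrained_model(complete_specs, pretrained_model_files):
    """Set model_config.pretrained_model_file from the first candidate path.

    Hypothetical helper, not the TAO API implementation. It mirrors the
    assignment from the traceback but checks for an empty list first.
    """
    if not pretrained_model_files:
        # An empty list is exactly what produces "IndexError: list index
        # out of range" when indexed with [0].
        raise FileNotFoundError(
            "No pretrained model file was found for this model; "
            "check that the PTM was downloaded and attached."
        )
    model_config = complete_specs.setdefault("model_config", {})
    model_config["pretrained_model_file"] = pretrained_model_files[0]
    return complete_specs
```

With a non-empty list the spec gets the first path; with an empty list the error names the missing pretrained model instead of a bare IndexError.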
Same situation as before. I'm using the same datasets and the same generated train.json file that I use to train with the API Client without AutoML, with successful and very satisfactory results.
Attached is metadata.json (the notebook sample is missing some of the parts needed to generate the file):
{
  "id": "526f8699-5fbb-47db-ad35-3632acf42152",
  "created_on": "2023-05-31T15:45:06.650363",
  "last_modified": "2023-05-31T15:45:06.650381",
  "name": "My Model",
  "description": "My TAO Model",
  "version": "1.0.0",
  "logo": "https://www.nvidia.com",
  "ngc_path": "",
  "encryption_key": "tlt_encode",
  "read_only": false,
  "public": false,
  "network_arch": "detectnet_v2",
  "dataset_type": "object_detection",
  "actions": [
    "train",
    "evaluate",
    "prune",
    "retrain",
    "export",
    "convert",
    "inference"
  ],
  "train_datasets": [
    "36410922-0967-4b36-be79-2f3aa859c6bc"
  ],
  "eval_dataset": "5c71ff48-f958-4fcb-a5c6-d6d5cd010990",
  "inference_dataset": null,
  "additional_id_info": null,
  "calibration_dataset": null,
  "ptm": "00e8bc75-c346-489d-ac31-e6f0e30389db",
  "automl_enabled": true,
  "automl_algorithm": "HyperBand",
  "metric": "map",
  "automl_add_hyperparameters": "[]",
  "automl_remove_hyperparameters": "[]",
  "automl_nu": 3,
  "automl_R": 27,
  "epoch_multiplier": 10
}
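To rule out a missing download, one could check whether the PTM referenced by the "ptm" field actually has a model file on the shared volume. This is only a sketch: the directory layout is inferred from the path in the working spec (/shared/users/<user id>/models/<ptm id>/...), the `.hdf5` extension is an assumption, and both function names are hypothetical.

```python
import os

# Default values copied from the path seen in the working (non-AutoML) spec.
DEFAULT_SHARED_ROOT = "/shared/users"
DEFAULT_USER_ID = "00000000-0000-0000-0000-000000000000"


def ptm_directory(metadata, shared_root=DEFAULT_SHARED_ROOT, user_id=DEFAULT_USER_ID):
    """Build the directory where the PTM from metadata.json is expected to live."""
    return os.path.join(shared_root, user_id, "models", metadata["ptm"])


def list_model_files(directory, extensions=(".tlt", ".hdf5")):
    """Return a sorted list of model files found anywhere below directory.

    Returns an empty list if the directory does not exist.
    """
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(extensions):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Loading metadata.json with `json.load` and calling `list_model_files(ptm_directory(metadata))` from inside the pod would show whether any candidate file exists; an empty result would be consistent with the IndexError above.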
The spec files work properly in a normal API Client training run.
Digging a little deeper: when the API generates the files to start the training, it creates a folder named after the step's code, and inside it places a .txt file containing the result of merging all the JSON files into the actual spec file.
I noticed that when using AutoML, the following parameter:
model_config {
pretrained_model_file: "/shared/users/00000000-0000-0000-0000-000000000000/models/00e8bc75-c346-489d-ac31-e6f0e30389db/peoplenet_vtrainable_v2.6/resnet34_peoplenet.tlt"
is not inserted automatically!
The datasets are attached correctly, but the pretrained network is not. Maybe this gives you a clue for finding the part of the code responsible.
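To compare the generated spec text from a normal run with the one from an AutoML run, a small script can report whether the entry is present. This is a sketch that assumes the generated .txt uses the same `pretrained_model_file: "..."` form as the excerpt above; the function name is hypothetical.

```python
import re

# Matches lines such as:  pretrained_model_file: "/path/to/model.tlt"
PTM_PATTERN = re.compile(r'pretrained_model_file\s*:\s*"([^"]*)"')


def extract_pretrained_model_file(spec_text):
    """Return the pretrained_model_file path found in spec_text, or None."""
    match = PTM_PATTERN.search(spec_text)
    return match.group(1) if match else None
```

Reading each job's generated .txt and passing its contents to `extract_pretrained_model_file` should return the path for the normal run and `None` for the AutoML run, if the behavior is as described.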
Thanks in advance.
I hope that one day I can use your tools to finish my work!