• Hardware (T4/V100/Xavier/Nano/etc) RTX4090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) TAO4 Baremetal
• Training spec file(If have, please share here) /notebooks/tao_api_starter_kit/client/automl/object_detection.ipynb
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
This is a continuation of this other failure, with the same use case and the same configuration.
I load the custom dataset and select the pretrained network based on PeopleNet (detectnet_v2):
pretrained_map = {"detectnet_v2" : "peoplenet:trainable_v2.6"}
When I launch the AutoML process, nothing happens.
No logs are generated and there is no compute activity.
Error found in the container:
AutoML pipeline
Exception in thread Thread-9 (AutoMLPipeline):
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/api/handlers/actions.py", line 747, in AutoMLPipeline
    complete_specs, handler_metadata = convert_automl_recommendations_to_spec(job_context,recommended_values,job_context.network)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/api/handlers/actions.py", line 723, in convert_automl_recommendations_to_spec
    spec = process_classwise_config(spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/api/handlers/utilities.py", line 457, in process_classwise_config
    bbox_dict = {"key":class_name["key"],"value:":class_name["value"]["bbox_rasterizer_config"]}
                       ~~~~~~~~~~^^^^^^^
KeyError: 'key'
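The traceback shows `process_classwise_config` indexing each class-wise entry with `["key"]` and `["value"]["bbox_rasterizer_config"]`. As a minimal sketch for narrowing this down, the helper below scans a list of entries and reports the ones that would fail those lookups. The helper name `find_malformed_entries` is hypothetical and not part of the TAO API, and the entry layout is an assumption inferred only from the traceback.

```python
from typing import Any


def find_malformed_entries(entries: list[Any]) -> list[tuple[int, str]]:
    """Return (index, reason) for entries that would fail the lookups seen in the traceback.

    Assumption: each entry is expected to look like
    {"key": <class name>, "value": {"bbox_rasterizer_config": {...}}}.
    """
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append((index, "entry is not a dict"))
        elif "key" not in entry:
            problems.append((index, "missing 'key'"))
        elif not isinstance(entry.get("value"), dict):
            problems.append((index, "missing or non-dict 'value'"))
        elif "bbox_rasterizer_config" not in entry["value"]:
            problems.append((index, "missing 'bbox_rasterizer_config'"))
    return problems


# Illustrative data only; not taken from a real spec.
sample = [
    {"key": "person", "value": {"bbox_rasterizer_config": {}}},
    {"value": {"bbox_rasterizer_config": {}}},
    {"key": "bag", "value": {}},
]
problems = find_malformed_entries(sample)
print(problems)
```

Running a check like this over the class-wise section of the generated spec, before launching AutoML, could show whether the entries lack a "key" field or have a different shape than the handler expects.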
I also tried manually modifying specs/train.json to include all the labels, since only "person" appears there; "bag" and "face" are not defined.
The dataset was generated correctly from the images, without warnings.
(Also worth mentioning: when the tfrecords have problems with the labels, a warning.json is generated, but it is automatically deleted when the job finishes.)
classes.json generated:
["person", "face", "xxxx", "bag", "zzzz", "yyyy"]
datasets/79a50be1-26c0-4c26-976e-1247bed5df62/jobs_metadata/c4a16018-01a6-4d46-9b95-8b6e174038de.json
{
    "id": "c4a16018-01a6-4d46-9b95-8b6e174038de",
    "parent_id": null,
    "action": "convert",
    "created_on": "2023-04-21T11:41:10.313023",
    "last_modified": "2023-04-21T11:41:50.081030",
    "status": "Done",
    "result": {
        "detailed_status": {
            "date": "4/21/2023",
            "time": "11:41:39",
            "status": "SUCCESS",
            "message": "Dataset convert finished successfully."
        },
        "categorical": [
            {
                "metric": "num_objects",
                "category_wise_values": [
                    {
                        "category": "person",
                        "value": 25104.0
                    },
                    {
                        "category": "face",
                        "value": 9537.0
                    },
                    {
                        "category": "xxxx",
                        "value": 1551.0
                    },
                    {
                        "category": "bag",
                        "value": 3295.0
                    },
                    {
                        "category": "yyyy",
                        "value": 3176.0
                    },
                    {
                        "category": "zzzz",
                        "value": 1569.0
                    }
                ]
            }
        ],
        "kpi": [
            {
                "metric": "num_images",
                "value": 11955.0
            }
        ],
        "graphical": [],
        "cur_epoch": null,
        "epoch": null,
        "max_epoch": null,
        "eta": null,
        "time_per_epoch": null
    }
}
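To rule out a mismatch between classes.json and what the convert job actually found, the two can be compared directly. The following is a rough sketch that uses only the fields visible in the job metadata above (`result.categorical[*].category_wise_values[*].category`); the function names are made up for illustration.

```python
import json


def categories_in_job_metadata(metadata: dict) -> set[str]:
    """Collect category names reported under result.categorical[*].category_wise_values."""
    names = set()
    for metric in metadata.get("result", {}).get("categorical", []):
        for item in metric.get("category_wise_values", []):
            names.add(item["category"])
    return names


def compare_classes(classes: list[str], metadata: dict) -> dict:
    """Report class names that appear on only one side."""
    found = categories_in_job_metadata(metadata)
    return {
        "only_in_classes_json": sorted(set(classes) - found),
        "only_in_job_metadata": sorted(found - set(classes)),
    }


# Trimmed copy of the data from this post.
classes = json.loads('["person", "face", "xxxx", "bag", "zzzz", "yyyy"]')
metadata = {
    "result": {
        "categorical": [
            {
                "metric": "num_objects",
                "category_wise_values": [
                    {"category": "person", "value": 25104.0},
                    {"category": "face", "value": 9537.0},
                    {"category": "xxxx", "value": 1551.0},
                    {"category": "bag", "value": 3295.0},
                    {"category": "yyyy", "value": 3176.0},
                    {"category": "zzzz", "value": 1569.0},
                ],
            }
        ]
    }
}
diff = compare_classes(classes, metadata)
print(diff)
```

With the data from this post both lists come back empty, which suggests the dataset side is consistent and the missing labels originate in the generated spec instead.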
dataset job / datasets/79a50be1-26c0-4c26-976e-1247bed5df62/c4a16018-01a6-4d46-9b95-8b6e174038de/status.json
{"date": "4/21/2023", "time": "11:41:27", "status": "STARTED", "verbosity": "INFO", "message": "Starting Object Detection Dataset Convert."}
{"date": "4/21/2023", "time": "11:41:27", "status": "STARTED", "verbosity": "INFO", "message": "Instantiating a kitti converter"}
{"date": "4/21/2023", "time": "11:41:27", "status": "RUNNING", "verbosity": "INFO", "message": "Generating partitions"}
{"date": "4/21/2023", "time": "11:41:27", "status": "RUNNING", "verbosity": "INFO", "message": "Num images in\nTrain: 11955\tVal: 0", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:27", "status": "RUNNING", "verbosity": "INFO", "message": "Skipped validation data.", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:27", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 0", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:28", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 1", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:29", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 2", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:30", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 3", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:31", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 4", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:33", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 5", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:34", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 6", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:35", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 7", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:36", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 8", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:37", "status": "RUNNING", "verbosity": "INFO", "message": "Writing partition 0, shard 9", "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:39", "status": "RUNNING", "verbosity": "INFO", "message": "Cumulative object statistics", "categorical": {"num_objects": {"person": 25104, "face": 9537, "xxxx": 1551, "bag": 3295, "yyyy": 3176, "zzzz": 1569}}, "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:39", "status": "RUNNING", "verbosity": "INFO", "message": "Class map. \nLabel in GT: Label in tfrecords file \nb'person': b'person'\nb'face': b'face'\nb'xxxx': b'xxxx'\nb'bag': b'bag'\nb'yyyy': b'yyyy'\nb'zzzz': b'zzzz'", "categorical": {"num_objects": {"person": 25104, "face": 9537, "xxxx": 1551, "bag": 3295, "yyyy": 3176, "zzzz": 1569}}, "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:39", "status": "RUNNING", "verbosity": "INFO", "message": "For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.\n", "categorical": {"num_objects": {"person": 25104, "face": 9537, "xxxx": 1551, "bag": 3295, "yyyy": 3176, "zzzz": 1569}}, "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:39", "status": "SUCCESS", "verbosity": "INFO", "message": "TFRecords generation complete.", "categorical": {"num_objects": {"person": 25104, "face": 9537, "xxxx": 1551, "bag": 3295, "yyyy": 3176, "zzzz": 1569}}, "kpi": {"num_images": 11955}}
{"date": "4/21/2023", "time": "11:41:39", "status": "SUCCESS", "verbosity": "INFO", "message": "Dataset convert finished successfully.", "categorical": {"num_objects": {"person": 25104, "face": 9537, "xxxx": 1551, "bag": 3295, "yyyy": 3176, "zzzz": 1569}}, "kpi": {"num_images": 11955}}
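Since status.json holds one JSON object per line, the final state and the per-class object counts can be pulled out with a few lines of Python. This is only a sketch based on the fields visible in the log above; `summarize_status_log` is a hypothetical helper, not a TAO utility.

```python
import json


def summarize_status_log(text: str) -> dict:
    """Parse a JSON-lines status log and return the last status plus object counts."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not records:
        return {"final_status": None, "final_message": None, "num_objects": {}}
    last = records[-1]
    return {
        "final_status": last.get("status"),
        "final_message": last.get("message"),
        # Early records carry no "categorical" field, so default to an empty dict.
        "num_objects": last.get("categorical", {}).get("num_objects", {}),
    }


# First and last records of the log in this post.
sample_log = '''
{"date": "4/21/2023", "time": "11:41:27", "status": "STARTED", "verbosity": "INFO", "message": "Starting Object Detection Dataset Convert."}
{"date": "4/21/2023", "time": "11:41:39", "status": "SUCCESS", "verbosity": "INFO", "message": "Dataset convert finished successfully.", "categorical": {"num_objects": {"person": 25104, "face": 9537, "xxxx": 1551, "bag": 3295, "yyyy": 3176, "zzzz": 1569}}, "kpi": {"num_images": 11955}}
'''
summary = summarize_status_log(sample_log)
print(summary["final_status"], sorted(summary["num_objects"]))
```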
models/5688544b-c272-48f6-b315-6cdd852302b5/metadata.json
{
    "id": "5688544b-c272-48f6-b315-6cdd852302b5",
    "created_on": "2023-04-21T11:58:23.335331",
    "last_modified": "2023-04-21T14:10:05.286848",
    "name": "My Model",
    "description": "My TAO Model",
    "version": "1.0.0",
    "logo": "https://www.nvidia.com",
    "ngc_path": "",
    "encryption_key": "tlt_encode",
    "read_only": false,
    "public": false,
    "network_arch": "detectnet_v2",
    "dataset_type": "object_detection",
    "actions": [
        "train",
        "evaluate",
        "prune",
        "retrain",
        "export",
        "convert",
        "inference"
    ],
    "train_datasets": [
        "79a50be1-26c0-4c26-976e-1247bed5df62"
    ],
    "eval_dataset": "b5a65f17-4129-48fe-9468-15aebebf590a",
    "inference_dataset": null,
    "additional_id_info": null,
    "calibration_dataset": null,
    "ptm": "00e8bc75-c346-489d-ac31-e6f0e30389db",
    "automl_enabled": true,
    "automl_algorithm": "HyperBand",
    "metric": "map",
    "automl_add_hyperparameters": "[]",
    "automl_remove_hyperparameters": "[]",
    "automl_max_recommendations": 10
}
In the specs file, I only modified the following parameters:
specs["model_config"]["num_layers"] = 34
specs["model_config"]["freeze_blocks"] = [0,1]
specs["training_config"]["batch_size_per_gpu"] = 64
Have a good Monday!
Best regards.