Fine-Tuning Retail Object Detection Models Provided on NGC

I want to test the Retail Object Detection models provided on NGC with TAO. I want to first use the models to run inference, and then fine-tune them with my custom data. However, I am not clear on which configuration/spec files to use with each provided model, whether they are EfficientDet or DINO models, and which TAO version to use.

The following information is provided in the documentation. However, it is about the v1.0 models. Has the documentation not been updated even though new models have been released?

Network Architecture: EfficientDet, DINO-FAN_base

The documentation says to use TAO EfficientDet-TF2 for fine-tuning the model and provides a configuration file for that.


However, the description for the latest release is:

DINO (DETR with Improved DeNoising Anchor Boxes) based object detection network to detect retail objects on a checkout counter.

And the latest released trainable model is tagged as dino_model_epoch=011.pth

So it is not clear to me which configuration/spec file needs to be used with this trainable model when running inference, or when using it as a pre-trained model.

  • Can you please specify which TAO model and specification file need to be used with the new and old trainable object detection models released on NGC, and also point to the corresponding TAO network in the TAO documentation (EfficientDet or DINO)? Are they TF or PyTorch models?
  • And which TAO version should we use with the latest released models? (TAO 5.2, as specified in the documentation, or is that out of date since the latest models were released after it?)

Please refer to the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb at main · NVIDIA/tao_tutorials · GitHub to do fine-tuning.
The spec files can be found in that folder as well. It uses the DINO network. The DINO network is located in the TAO PyTorch docker.
The latest TAO 5.5 documentation is at DINO - NVIDIA Docs.
For inference with TAO, you can refer to the notebook or the TAO user guide.
For inference in deepstream, you can refer to deepstream_tao_apps/configs/nvinfer/retail_object_detection_tao at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub and GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream.
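For orientation, a gst-nvinfer configuration for a TAO detection model generally follows the shape below. This is only a hedged sketch: the file names, mode, and class count are placeholders, and DINO additionally needs a custom bounding-box parser, so use the actual config files in the deepstream_tao_apps repository linked above rather than this fragment.

```ini
[property]
gpu-id=0
# Placeholder paths -- substitute the exported model and label file you downloaded:
onnx-file=retail_object_detection.onnx
labelfile-path=labels.txt
batch-size=1
network-mode=2          # 0=FP32, 1=INT8, 2=FP16
network-type=0          # 0 = detector
num-detected-classes=1  # placeholder; match the model's label file
gie-unique-id=1
# DINO outputs require a custom bbox parser; the real configs in
# deepstream_tao_apps set parse-bbox-func-name and custom-lib-path for this.
```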

Thanks for your quick response.

I was looking at tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb at main · NVIDIA/tao_tutorials · GitHub.
It seems the tutorials for both the TAO 5.5 release and the TAO 5.3 release use/download the trainable_binary_v2.1.1 model.

However, the latest model is trainable_retail_object_detection_binary_v2.2.2.3. Are the specs the same for this model too? And can we use this model with TAO 5.3 or TAO 5.5?

Both trainable_binary_v2.1.1 and trainable_retail_object_detection_binary_v2.2.2.3 can be used for fine-tuning.
BTW, v1.0 and v1.1 use EfficientDet. All other versions use DINO.
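The version-to-architecture mapping above can be captured in a small helper, e.g. (a sketch based only on the rule stated in this thread: v1.x models are EfficientDet, everything later is DINO):

```python
def architecture_for(version: str) -> str:
    """Return the network architecture for a retail_object_detection
    model version string, per the rule in this thread:
    v1.0 / v1.1 -> EfficientDet (TAO EfficientDet-TF2),
    all other versions -> DINO (TAO PyTorch)."""
    major = version.lstrip("v").split(".")[0]
    if major == "1":
        return "EfficientDet"
    return "DINO"

print(architecture_for("v1.0"))      # EfficientDet
print(architecture_for("v2.1.1"))    # DINO
print(architecture_for("v2.2.2.3"))  # DINO
```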

OK, thanks. A couple more questions:

  1. The tutorial you pointed at refers to trainable_binary_v2.1.1. Is the spec file the same for trainable_retail_object_detection_binary_v2.2.2.3?

  2. And what are the differences between v2.1 and v2.2? Is it just the amount of training data used? And/or are some training parameters different?

Yes, you can use the same spec file: tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/specs/train.yaml at main · NVIDIA/tao_tutorials · GitHub. But you need to change the pretrained_model_path.
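For example, the edit in train.yaml might look like the fragment below. This is a sketch only: the path and checkpoint filename are placeholders for wherever you extracted the NGC download, and depending on the spec layout the key may sit at the top level or under the train section, so match your copy of the file.

```yaml
train:
  # Swap the tutorial's v2.1.1 checkpoint for the newer one
  # (placeholder path -- point at your extracted NGC download):
  pretrained_model_path: /path/to/trainable_retail_object_detection_binary_v2.2.2.3/model.pth
```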

We are updating the model card, but it is not public yet. Please refer to the details below.

Thanks, that is great information. So the model is larger and you used more real training data.
It would definitely be great to have those details alongside each trainable model file.

Is it possible for us to download the training data you used to train the latest model? (2,211 real images and 226k synthetic images)

These are internal datasets. They are not public.

Thanks. Can you please clarify whether DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection is the paper corresponding to the DINO model implementation?

And what is the objective/use case of the distill command for the DINO model?