Fine-Tuning Retail Object Detection Models Provided in NGC

I want to test the Retail Object Detection models provided under NGC in TAO. I want to first use the models to run inference, and then fine-tune them with my custom data. However, I am not clear on which configuration/spec files to use with each provided model, whether they are EfficientDet or DINO models, and which TAO version to use.

The following information is provided in the documentation. However, this is about the v1.0 models. Has the documentation not been updated even though new models have been released?

Network Architecture: EfficientDet, DINO-FAN_base

The documentation says to use TAO EfficientDet-TF2 for fine-tuning the model and provides the configuration file for that.


However, the description for the latest release is given as:

DINO (DETR with Improved DeNoising Anchor Boxes) based object detection network to detect retail objects on a checkout counter.

And the latest released trainable model is tagged dino_model_epoch=011.pth.

So it is not clear to me which configuration/spec file should be used with this trainable model when running inference, or when using it as a pre-trained model.

  • Can you please specify which TAO network and specification file should be used with the new and old trainable object detection models released under NGC, and point to the corresponding network in the TAO documentation (EfficientDet or DINO)? Are they TF or PyTorch models?
  • And which TAO version should we use with the latest released models? (TAO 5.2 as specified in the documentation, or is that out of date since the latest model releases came after it?)

Please refer to the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb at main · NVIDIA/tao_tutorials · GitHub to do fine-tuning.
The spec files can be found in that folder as well. It uses the DINO network, which lives in the TAO PyTorch docker.
The latest TAO 5.5 doc is at DINO - NVIDIA Docs.
For inference with TAO, you can refer to the notebook or the TAO user guide.
For inference in DeepStream, you can refer to deepstream_tao_apps/configs/nvinfer/retail_object_detection_tao at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub and GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream.
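For reference, a minimal inference invocation with the TAO launcher might look like the following sketch. The spec path and the checkpoint override are assumptions based on the conventions in the DINO notebooks; adjust them to your own ~/.tao_mounts.json mapping:

tao model dino inference \
  -e /workspace/tao-experiments/specs/infer.yaml \
  inference.checkpoint=/workspace/tao-experiments/models/dino_model_epoch=011.pth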

Thanks for your quick response.

I was looking at tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb at main · NVIDIA/tao_tutorials · GitHub.
It seems the tutorials for both the TAO 5.5 release and the TAO 5.3 release use/download the trainable_binary_v2.1.1 model.

However, the latest model is trainable_retail_object_detection_binary_v2.2.2.3. Are the specs the same for this model too? And can we use it with TAO 5.3 or TAO 5.5?

Both trainable_binary_v2.1.1 and trainable_retail_object_detection_binary_v2.2.2.3 can be used for fine-tune training.
BTW, v1.0 and v1.1 use EfficientDet. All other versions use DINO.

OK, thanks. A couple more questions:

  1. The tutorial you pointed to refers to trainable_binary_v2.1.1. Is the spec file the same for trainable_retail_object_detection_binary_v2.2.2.3?

  2. And what are the differences between v2.1 and v2.2? Is it just the amount of training data used, and/or are some training parameters different?

Yes, you can use the same spec file, tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/specs/train.yaml at main · NVIDIA/tao_tutorials · GitHub, but you need to change the pretrained_model_path.
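For example, a sketch of the change in train.yaml; the local path is an assumption based on where the NGC model is downloaded, and the checkpoint file name follows the model tag mentioned above:

train:
  pretrained_model_path: /workspace/tao-experiments/models/retail_object_detection_vtrainable_retail_object_detection_binary_v2.2.2.3/dino_model_epoch=011.pth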

We are updating the model card, but it is not public yet. Please refer to the details below.

Thanks, that is great information. So the model is larger and you have used more real training data.
It would definitely be great to have those details alongside each trainable model file.

Is it possible for us to download the training data you used to train the latest model? (2,211 real images and 226k synthetic images)

These are internal datasets. They are not public.

Thanks. Can you please clarify whether DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection is the paper corresponding to the DINO model implementation?

And what is the objective/use case of the distill command for the DINO model?

Based on the table you provided above, the backbone for trainable_retail_object_detection_binary_v2.2.2.3 should be fan_base. However, the spec file you linked above uses fan_small.

  1. So should we change the backbone to fan_base when using trainable_retail_object_detection_binary_v2.2.2.3 as the pre-trained model?

  2. Are there any other configurations that need to be different for trainable_retail_object_detection_binary_v2.2.2.3 (are there more outdated specs that need to be fixed)?

Yes, you can.

No, nothing else needs to be different.
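In other words, the only change in train.yaml is the backbone line (sketch):

model:
  backbone: fan_base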

Yes, [2203.03605] DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.

You can refer to the distillation notebook under tao_tutorials/notebooks/tao_launcher_starter_kit/dino at main · NVIDIA/tao_tutorials · GitHub.
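In general, distillation trains a smaller student DINO model to mimic a larger teacher model, trading some accuracy for faster inference. A minimal invocation might look like this sketch; the spec path is an assumption, so see the notebook for a full distill spec:

tao model dino distill \
  -e /workspace/tao-experiments/specs/distill.yaml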

I am getting the following error when trying to train the model in TAO 5.5.
It is looking for the configuration cudnn.benchmark = cfg["train"]["cudnn"]["benchmark"], but I cannot find any such configuration in the TAO DINO documentation.

tao model dino train \
  -e /workspace/tao-experiments/specs/train.yml
2024-11-22 03:25:19,278 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-11-22 03:25:19,368 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2024-11-22 03:25:19,382 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2024-11-22 03:25:27,199 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
/usr/local/lib/python3.10/dist-packages/hydra/plugins/config_source.py:124: UserWarning: Support for .yml files is deprecated. Use .yaml extension for Hydra config files
  deprecation_warning(
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/loggers/api_logging.py:236: UserWarning: Log file already exists at /workspace/tao-experiments/results/trainings/training1/status.json
  rank_zero_warn(
Seed set to 1234
Train results will be saved at: /workspace/tao-experiments/results/trainings/training1
Error executing job with overrides: []
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 69, in _func
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 48, in _func
    runner(cfg, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/dino/scripts/train.py", line 146, in main
    run_experiment(experiment_config=cfg,
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/dino/scripts/train.py", line 36, in run_experiment
    results_dir, resume_ckpt, gpus, ptl_loggers = initialize_train_experiment(experiment_config, key)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/initialize_experiments.py", line 56, in initialize_train_experiment
    cudnn.benchmark = cfg["train"]["cudnn"]["benchmark"]
omegaconf.errors.ConfigKeyError: Key 'cudnn' is not in struct
    full_key: train.cudnn
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2024-11-22 03:25:35,916 - TAO Toolkit - root - INFO] Sending telemetry data.
[2024-11-22 03:25:35,916 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2024-11-22 03:25:35,916 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'train', 'network': 'dino', 'gpu': ['Tesla-V100-SXM2-16GB'], 'success': False, 'time_lapsed': 8} to https://api.tao.ngc.nvidia.com.
[2024-11-22 03:25:37,147 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2024-11-22 03:25:37,148 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2024-11-22 03:25:37,148 - TAO Toolkit - root - WARNING] Execution status: FAIL
2024-11-22 03:25:38,297 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

And the following is the configuration file:

train:
  freeze: ['backbone', 'transformer.encoder']
  pretrained_model_path: /workspace/tao-experiments/models/retail_object_detection_vtrainable_retail_object_detection_binary_v2.2.2.3/dino_model_epoch011.pth
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  seed: 1234
  results_dir: /workspace/tao-experiments/results/trainings/training1
  optim:
    lr_backbone: 1e-6
    lr: 1e-5
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
dataset:
  train_data_sources:
    - image_dir: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/train
      json_file: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/annotations/instances_train.json
  val_data_sources:
    - image_dir: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/test
      json_file: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/annotations/instances_test.json
  num_classes: 2
  batch_size: 4
  workers: 8
  augmentation:
    fixed_padding: False
model:
  backbone: fan_base
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048
results_dir: /workspace/tao-experiments/results/trainings/training1
encryption_key: nvidia_tao

Based on the PyTorch repo, it seems it is looking for other configurations such as cfg["train"]["cudnn"]["deterministic"] and cfg["train"]["cudnn"]["benchmark"], which are not defined in the documentation.

  1. Can you please explain why I am getting these errors? (Don't they have default values specified?)
  2. And if I am supposed to specify values, can you let me know what values to use for the above two configurations? (My current guess is sketched below.) Thanks.
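Based on the keys the traceback expects, my guess is that adding a cudnn block under train in train.yml would work around the error; the values below are only my assumption, not documented defaults:

train:
  cudnn:
    benchmark: false      # guessed value; toggles the cuDNN autotuner
    deterministic: false  # guessed value; forces deterministic kernels when true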

Could you please create a new topic for your latest questions? Thanks a lot.

Yeah, no worries. Created a new thread: Fine Tuning DINO Retail Object detector - error out as it expects unspecified/unknown configurations

OK, let us track it in that topic and close this one since it is solved.