Fine-Tuning Retail Object Detection Models Provided on NGC

I want to test the Retail Object Detection models provided on NGC with TAO. I want to first use the models to run inference, and then fine-tune them with my custom data. However, I am not clear on which configuration/spec files to use with each provided model, whether they are EfficientDet or DINO models, and which TAO version to use.

The following information is provided in the documentation. However, it is about the v1.0 models. Has the documentation not been updated even though new models have been released?

Network Architecture: EfficientDet, DINO-FAN_base

The documentation says to use TAO EfficientDet-TF2 for fine-tuning the model and provides a configuration file for that.


However, the description for the latest release is:

DINO (DETR with Improved DeNoising Anchor Boxes) based object detection network to detect retail objects on a checkout counter.

And the latest released trainable model is tagged as dino_model_epoch=011.pth

So it is not clear to me which configuration/spec file needs to be used with this trainable model when running inference, or when using it as a pre-trained model.

  • Can you please specify which TAO model and specification file need to be used with the new and old trainable object detection models released on NGC, and also point to the corresponding TAO network in the TAO documentation (EfficientDet or DINO)? Are they TF or PyTorch models?
  • And which TAO version should we use with the latest released models? (TAO 5.2, as specified in the documentation, or is that out of date since the latest models were released after it?)

Please refer to the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb at main · NVIDIA/tao_tutorials · GitHub to do fine-tuning.
The spec files can be found in that folder as well. It uses the DINO network. The DINO network is located in the TAO PyTorch docker.
The latest TAO 5.5 documentation is at DINO - NVIDIA Docs.
For inference with TAO, you can refer to the notebook or the TAO user guide.
For inference in deepstream, you can refer to deepstream_tao_apps/configs/nvinfer/retail_object_detection_tao at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub and GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream.
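For orientation, a gst-nvinfer configuration for a TAO detection model generally follows the shape below. This is only a hedged sketch: the file names, mode, and class count are placeholders, and DINO additionally needs a custom bounding-box parser, so use the actual config files in the deepstream_tao_apps repository linked above rather than this fragment.

```ini
[property]
gpu-id=0
# Placeholder paths -- substitute the exported model and label file you downloaded:
onnx-file=retail_object_detection.onnx
labelfile-path=labels.txt
batch-size=1
network-mode=2          # 0=FP32, 1=INT8, 2=FP16
network-type=0          # 0 = detector
num-detected-classes=1  # placeholder; match the model's label file
gie-unique-id=1
# DINO outputs require a custom bbox parser; the real configs in
# deepstream_tao_apps set parse-bbox-func-name and custom-lib-path for this.
```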

Thanks for your quick response.

I was looking at tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb at main · NVIDIA/tao_tutorials · GitHub.
It seems the tutorials for both the TAO 5.5 release and the TAO 5.3 release use/download the trainable_binary_v2.1.1 model.

However, the latest model is trainable_retail_object_detection_binary_v2.2.2.3. Are the specs the same for this model too? And can we use this model with TAO 5.3 or TAO 5.5?

Both trainable_binary_v2.1.1 and trainable_retail_object_detection_binary_v2.2.2.3 can be used for fine-tuning.
BTW, v1.0 and v1.1 use EfficientDet. All other versions use DINO.
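The version-to-architecture mapping above can be captured in a small helper, e.g. (a sketch based only on the rule stated in this thread: v1.x models are EfficientDet, everything later is DINO):

```python
def architecture_for(version: str) -> str:
    """Return the network architecture for a retail_object_detection
    model version string, per the rule in this thread:
    v1.0 / v1.1 -> EfficientDet (TAO EfficientDet-TF2),
    all other versions -> DINO (TAO PyTorch)."""
    major = version.lstrip("v").split(".")[0]
    if major == "1":
        return "EfficientDet"
    return "DINO"

print(architecture_for("v1.0"))      # EfficientDet
print(architecture_for("v2.1.1"))    # DINO
print(architecture_for("v2.2.2.3"))  # DINO
```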

OK, thanks. A couple more questions:

  1. The tutorial you pointed at refers to trainable_binary_v2.1.1. Is the spec file the same for trainable_retail_object_detection_binary_v2.2.2.3?

  2. And what are the differences between v2.1 and v2.2? Is it just the amount of training data used? And/or are some training parameters different?

Yes, you can use the same spec file: tao_tutorials/notebooks/tao_launcher_starter_kit/retail_object_detection/specs/train.yaml at main · NVIDIA/tao_tutorials · GitHub. But you need to change the pretrained_model_path.
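For example, the edit in train.yaml might look like the fragment below. This is a sketch only: the path and checkpoint filename are placeholders for wherever you extracted the NGC download, and depending on the spec layout the key may sit at the top level or under the train section, so match your copy of the file.

```yaml
train:
  # Swap the tutorial's v2.1.1 checkpoint for the newer one
  # (placeholder path -- point at your extracted NGC download):
  pretrained_model_path: /path/to/trainable_retail_object_detection_binary_v2.2.2.3/model.pth
```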

We are updating the model card, but it is not public yet. Please refer to the details below.

Thanks, that is great information. So the model is larger and you used more real training data.
It would definitely be great to have those details alongside each trainable model file.

Is it possible for us to download the training data you used to train the latest model? (2,211 real images and 226k synthetic images)

These are internal datasets. They are not public.

Thanks. Can you please clarify whether DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection is the paper corresponding to the DINO model implementation?

And what is the objective/use case of the distill command for the DINO model?