Clarification about ReIdentificatioNet pre-training

Hi,

For ReIdentificationNet, a pre-trained model is available in the NGC catalog as trainable_v1.1: resnet50_market1501_aicity156.tlt. However, I can’t find much information about the pre-training process.

In this great blog post https://developer.nvidia.com/blog/enhance-multi-camera-tracking-accuracy-by-fine-tuning-ai-models-with-synthetic-data/, they say the network was pre-trained with the SOLIDER technique on a dataset which:

includes a combination of NVIDIA proprietary datasets along with Open Images V5.

However, the NGC page only mentions Market-1501 + synthetic IDs. Therefore, I am wondering: on which dataset was the model pre-trained? Is it a combination of NVIDIA proprietary datasets along with Open Images, or is it Market-1501 + synthetic data?

Also, the blog mentions fine-tuning with 4470 real IDs. Does that mean the model tested there is different from deployable_v1.2 on NGC?

Thank you

The deployable_v1.2 model is trained with 14737 images of 751 real people from the Market-1501 dataset and 29533 images of 156 people (148 of which are synthetic) from the MTMC people tracking dataset of the 2023 AI City Challenge.

The training dataset is mentioned on the NGC model page. Refer to ReIdentificationNet | NVIDIA NGC.

Thanks for the answer. However, it is still not clear to me; your two replies mention the same information.

Therefore:

  • Is deployable_v1.2 fine-tuned from trainable_v1.1, or is it the exported version?
  • If they are the same, what pre-trained model was used to train trainable_v1.1?
  • What about the pre-trained model described in the blog?

Yes, it is the exported version. The models in the model card are trained on Market-1501 + synthetic datasets.

The pre-trained model mentioned in the blog, and in the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/re_identification_net/reidentificationnet_swin.ipynb at main · NVIDIA/tao_tutorials · GitHub, is trained on unlabeled data: ~3M unlabeled image crops of people. ReIdentificationNet Transformer is the network trained on a combination of NVIDIA proprietary datasets along with Open Images V5 (~3M images).

Ok thanks for the reply.

I have another question regarding the training output displayed. For the training, I have:

│ Subset   │   # IDs │   # Images │
╞══════════╪═════════╪════════════╡
│ Train    │     162 │      41712 │

and I have configured:

num_classes: 162
batch_size: 128
val_batch_size: 64
num_workers: 4
num_instances: 4
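As an aside on this configuration: in re-identification training, batch_size and num_instances usually interact through PK sampling, where each batch contains P identities with K images each. A minimal sketch of that arithmetic, assuming the standard PK-sampling convention (I have not verified TAO's exact internals):

```python
batch_size = 128    # total images per batch
num_instances = 4   # K: images sampled per identity

# P: identities represented in each batch under PK sampling;
# batch_size is normally chosen divisible by num_instances
identities_per_batch = batch_size // num_instances
print(identities_per_batch)  # 32
```

So with these settings, each batch would mix 32 identities with 4 crops each, which is what the triplet/metric-learning losses typically expect.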

Therefore, we should have ceil(41712 / 128) = 326 batches per epoch. I am wondering why TAO is showing 570?
And why is each epoch split in two? I thought this was due to num_workers, but even setting it to 1 still shows a split-epoch output.

Thanks

It should be related to the validation dataset. The epoch loop contains two parts: a training part and a validation part.
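To make the arithmetic concrete, here is a minimal sketch of the step count contributed by the training split alone, assuming partial batches are kept (drop_last=False); any steps beyond this in the progress output would come from the validation part of the loop:

```python
import math

train_images = 41712
batch_size = 128

# steps per epoch from the training split, keeping the final partial batch
train_steps = math.ceil(train_images / batch_size)
print(train_steps)  # 326
```

If the progress bar counts validation batches in the same epoch total, the displayed number per epoch will be larger than this training-only figure.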