Clarification about ReIdentificationNet pre-training

Hi,

For ReIdentificationNet, a pre-trained model is available in the NGC catalog as trainable_v1.1: resnet50_market1501_aicity156.tlt. I can’t find much information about the pre-training process, though.

In this great blog post https://developer.nvidia.com/blog/enhance-multi-camera-tracking-accuracy-by-fine-tuning-ai-models-with-synthetic-data/, the authors say the network was pre-trained with the SOLIDER technique on a dataset that:

includes a combination of NVIDIA proprietary datasets along with Open Images V5.

However, the NGC page only mentions Market-1501 + synthetic IDs. So which dataset was the model pre-trained on: a combination of NVIDIA proprietary datasets with Open Images V5, or Market-1501 + synthetic data?

Also, the blog mentions fine-tuning with 4470 real IDs; does that mean the model tested there is different from deployable_v1.2 on NGC?

Thank you

The deployable_v1.2 model is trained with 14737 images of 751 real people from the Market-1501 dataset and 29533 images of 156 people (148 of which are synthetic) from the MTMC people-tracking dataset of the 2023 AI City Challenge.

The training dataset is listed on the NGC model page. Refer to ReIdentificationNet | NVIDIA NGC.

Thanks for the answer. However, it is still not clear to me; your two replies mention the same information.

Therefore:

  • Is deployable_v1.2 fine-tuned from trainable_v1.1, or is it the exported version?
  • If they are the same, what pre-trained model was used to train trainable_v1.1?
  • What about the pre-trained model described in the blog?

Yes, it is the exported version. The models in the model card are trained on Market-1501 + synthetic datasets.

The pre-trained model mentioned in the blog and in the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/re_identification_net/reidentificationnet_swin.ipynb at main · NVIDIA/tao_tutorials · GitHub is trained on unlabeled data: we train it on ~3M unlabeled image crops of people. ReIdentificationNet_Transformer is the network trained on a combination of NVIDIA proprietary datasets along with Open Images V5 (~3M images).

Ok thanks for the reply.

I have another question regarding the training output displayed.
For the training, I have:

Subset      # IDs    # Images
------------------------------
Train       162      41712

and I have configured:

num_classes: 162
batch_size: 128
val_batch_size: 64
num_workers: 4
num_instances: 4

Therefore, we should have 41712/128 = 236 batches per epoch. I am wondering why TAO shows 570?
And why are the epochs split into two? I thought this was due to num_workers, but even setting it to 1 shows the split-epoch output.

Thanks

This should be related to the validation dataset. Each epoch contains two parts: a training part and a validation part.

OK, starting to get there, but some numbers are still unclear:

I have:

  • 41712 train images divided by a batch size of 128 → 326 steps.
  • 13434 + 2349 validation images (query + gallery) divided by a batch size of 64 → 246 steps.

So in the logs I would expect to see:

  • Training loop in progress: 326/570
  • Train and Val metrics generated: 246/570

Which is not the case here

(I made a typo in my previous message above: 41712/128 = 326.)

Subset      # IDs    # Images
------------------------------
Train       162      41712
Query       67       2349
Gallery     67       13434

Thanks

Train: floor(41712/128) = 325
Query: floor(2349/64) = 36
Gallery: floor(13434/64) = 209

In total, 325 + 36 + 209 = 570. It matches the 570 in the log.
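A minimal sketch of that arithmetic (assuming each loader drops its last partial batch, i.e. floor division; I have not verified TAO's exact drop_last settings):

```python
# Step counts per subset, assuming the last partial batch is dropped
# (floor division). Sizes are the dataset counts reported above.
def steps(num_images, batch_size):
    return num_images // batch_size  # floor: a partial final batch is dropped

train = steps(41712, 128)    # 325
query = steps(2349, 64)      # 36
gallery = steps(13434, 64)   # 209
print(train, query, gallery, train + query + gallery)  # 325 36 209 570
```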

Thanks, but I am no longer saying the 570 total doesn't match the logs.
The issue is that the training progress shows ~270/570 when it should reach 325/570.

OK. If possible, please check whether you see similar behavior when running the dataset mentioned in the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/re_identification_net at main · NVIDIA/tao_tutorials · GitHub.

Using the starter kit with batch_size=64 and val_batch_size=128:

Subset      # IDs    # Images
------------------------------
Train       100      1583
Query       100       445
Gallery     100       1756

  • Train: ceil(1583/64) = 25
  • Query: ceil(445/128) = 4
  • Gallery: ceil(1756/128) = 14

In total, 25 + 4 + 14 = 43
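For comparison, here is a quick sketch computing both floor and ceiling step counts for these subsets; the values above (25, 4, 14) match ceiling division, but which rounding actually applies depends on each loader's drop_last behavior, which I haven't verified:

```python
import math

# (num_images, batch_size) per subset, from the starter-kit run above.
subsets = {"train": (1583, 64), "query": (445, 128), "gallery": (1756, 128)}

floors = {name: n // bs for name, (n, bs) in subsets.items()}
ceils = {name: math.ceil(n / bs) for name, (n, bs) in subsets.items()}

print(floors, sum(floors.values()))  # {'train': 24, 'query': 3, 'gallery': 13} 40
print(ceils, sum(ceils.values()))    # {'train': 25, 'query': 4, 'gallery': 14} 43
```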

The logs seem to show something different. If, as above, 570 was the train + validation total, then here we should see 25/43 and 18/43; instead the bar goes to 95%, and the total looks like it counts only the train part or only the validation part, not both.
Why does this differ between trainings?

Epoch 0:  95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▊     | 21/22 [00:05<00:00,  3.79it/s, loss=5.37, v_num=0]Train and Val metrics generated.
Epoch 0:  95%|███████████████████████████████████████████████████████▎  | 21/22 [00:05<00:00,  3.76it/s, loss=5.37, v_num=0, train_loss=5.510, base_lr=3.81e-5, train_acc_1=0.0134]Training loop in progress
Epoch 1:  95%|███████████████████████████████████████████████████████▎  | 21/22 [00:03<00:00,  5.34it/s, loss=5.13, v_num=0, train_loss=5.510, base_lr=3.81e-5, train_acc_1=0.0134]Train and Val metrics generated.
Epoch 1:  95%|███████████████████████████████████████████████████████▎  | 21/22 [00:03<00:00,  5.30it/s, loss=5.13, v_num=0, train_loss=5.130, base_lr=3.81e-5, train_acc_1=0.0201]Training loop in progress

Thanks for the info. I will try to reproduce on my side.

Great!

I cannot reproduce a similar result with the 5.5.0 PyTorch docker. I ran with the default dataset mentioned in the notebook and tried both num_instances: 1 and the default num_instances: 4.
I suggest adding debug code in
tao_pytorch_backend/nvidia_tao_pytorch/cv/re_identification/dataloader/sampler.py at main · NVIDIA/tao_pytorch_backend · GitHub to check total_batches, self.batch_size, self.num_instances, self.num_pids_per_batch, total_full_batches, etc.
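To illustrate the kind of check such debug code could do, here is a hypothetical sketch of how a PK-style identity sampler typically derives its batch count. The names (batch_size, num_instances, num_pids_per_batch) mirror the attributes mentioned above, but the actual logic in TAO's sampler.py may differ:

```python
from collections import Counter

def pk_batch_count(labels, batch_size, num_instances):
    """Estimate batches for a PK sampler: each batch holds
    num_pids_per_batch identities x num_instances images each."""
    num_pids_per_batch = batch_size // num_instances
    assert num_pids_per_batch * num_instances == batch_size, \
        "batch_size must be divisible by num_instances"
    counts = Counter(labels)
    # Each identity contributes only whole groups of num_instances images,
    # so leftover images per identity are dropped before counting batches.
    usable = sum((c // num_instances) * num_instances for c in counts.values())
    return usable // batch_size

# Hypothetical example: 162 IDs with ~257 images each (~41712 total).
labels = [pid for pid in range(162) for _ in range(257)]
print(pk_batch_count(labels, batch_size=128, num_instances=4))  # 324
```

Note that with this kind of sampling, dropped leftover images per identity can make the real step count lower than a naive total_images // batch_size, which may explain part of the discrepancy.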

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.