How to train TAO Toolkit models on COCO Dataset?

While performing transfer learning with the PeopleNet model on the COCO dataset using TAO Toolkit, we encountered several training performance issues. They are summarised below.

Issue Summary
We were unable to improve person and bag detection through transfer learning. Baseline performance was below 10% mAP before training, and training produced no improvement. In some cases performance degraded to 0% mAP.

Approaches tried:

  • using a balanced dataset with the same number of labels for each class (~12,000)
  • using a range of initial learning rates from 5e-6 to 0.1
  • removing all other classes from the COCO dataset and retaining only the person and bag classes
  • training at various resolutions: (960, 544) and (1280, 720)
  • training with and without frozen layers
  • using ground truth bounding box labels that are tightly cropped to the objects
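
For readers reproducing the class filtering step, here is a minimal sketch of reducing COCO annotations to the two target classes and writing them as KITTI-style label lines, which is the label format DetectNet_v2 expects. The function name `coco_to_kitti_lines` and the mapping `COCO_TO_TARGET` are hypothetical, and the choice of which COCO categories count as "bag" is an assumption; the thread does not say which ones were used.

```python
# Assumed mapping from COCO category names to the two target classes.
# Which COCO categories count as "bag" is a guess; the thread does not say.
COCO_TO_TARGET = {
    "person": "person",
    "backpack": "bag",
    "handbag": "bag",
    "suitcase": "bag",
}


def coco_to_kitti_lines(coco, image_id):
    """Return KITTI label lines for one image, keeping only mapped classes."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    lines = []
    for ann in coco["annotations"]:
        if ann["image_id"] != image_id:
            continue
        target = COCO_TO_TARGET.get(id_to_name.get(ann["category_id"]))
        if target is None:
            continue  # drop every other COCO class
        x, y, w, h = ann["bbox"]  # COCO stores [x_min, y_min, width, height]
        x1, y1, x2, y2 = x, y, x + w, y + h
        # KITTI: class, truncated, occluded, alpha, 4 bbox values,
        # then 7 3D fields that are zeroed here because they are unused.
        lines.append(
            f"{target} 0.00 0 0.00 {x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
        )
    return lines
```

Each returned line has 15 space-separated fields. A quick sanity check on a few images (box corners inside the image, `x2 > x1`, `y2 > y1`) is worthwhile before training, since malformed labels are one possible cause of 0% mAP.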


  • Hardware (T4/V100/Xavier/Nano/etc): NVIDIA GeForce RTX 3070 Ti, T4
  • Network Type - Detectnet_v2
  • TLT Version - 4.0.0
  • Training spec file - refer to attachment
  • How to reproduce the issue - refer to command below and attached logs
    training_spec.txt (4.6 KB)
    logs.txt (950.7 KB)
tao detectnet_v2 train -e /tlt_exp/experiments/training_spec.txt -r /datasets/experiments/ -n "peoplenet_model" -k "tlt_encode" --gpus 1

Since the original images in the COCO dataset come in many different resolutions, you need to set enable_auto_resize: true in the spec file. Please set it and train again.
Also, since COCO images have varied backgrounds and contexts, they may be quite different from the dataset described in the PeopleNet model card on NVIDIA NGC. Thus, performance may be lower than the figures in the model card.

@Morganh Hi, we have already resized the images to the same resolution and adjusted the labels prior to loading them in TAO Toolkit.

This was done by setting a fixed target resolution (for example, 1280 x 720), scaling each image to the closest size that fits while maintaining the aspect ratio, and then padding the image to reach the target resolution.
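
The resize-and-pad step described above can be sketched as follows, showing how a bounding box has to move with the image. This is only an illustration, not the poster's actual script: the helper names `letterbox_params` and `letterbox_box` are hypothetical, and centring the padding is an assumption, since the thread does not say where the padding was placed.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Scale factor and padding offsets for an aspect-preserving resize."""
    # Use the smaller ratio so the scaled image fits inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Assumption: padding is split evenly so the image is centred.
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return scale, pad_x, pad_y


def letterbox_box(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from source to padded-target coordinates."""
    x1, y1, x2, y2 = box
    return (
        x1 * scale + pad_x,
        y1 * scale + pad_y,
        x2 * scale + pad_x,
        y2 * scale + pad_y,
    )


# Example: a 640x480 COCO image placed on a 1280x720 canvas.
scale, pad_x, pad_y = letterbox_params(640, 480, 1280, 720)
new_box = letterbox_box((100, 100, 200, 300), scale, pad_x, pad_y)
```

If the labels are scaled but the padding offset is not added (or the reverse), every box is shifted relative to its object, which would be consistent with very low mAP.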

OK, could you share several images and their labels?

@Morganh Hi, here are two sample images from the dataset, with their bounding boxes (373.6 KB)

Could you please use the original images and labels, set enable_auto_resize: true, and train the two classes again?

@Morganh Hi, we have tried the proposed solution with no changes in training performance.

We understand that COCO images differ from the dataset used to train PeopleNet, so some degradation of performance is possible.

If there is any other training configuration we can try, please let us know. Otherwise, we will consider this query closed.

Thank you.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

For your existing result, is the mAP still near 0 from tao evaluation? Could you run tao inference to double check?

You can also train from scratch on your training dataset, that is, without the PeopleNet pretrained model. An mAP of 0 is not normally expected. For the input size, consider using the average resolution of the training dataset, for example 512x512.
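
One way to pick an input size from the average resolution is sketched below. The function name `suggested_input_size` is hypothetical. Snapping to a multiple of 16 reflects the commonly documented DetectNet_v2 requirement that input width and height be multiples of 16; treat that as an assumption and check it against the documentation for your TAO version.

```python
def suggested_input_size(sizes, multiple=16):
    """Mean (width, height) of the dataset, snapped to a given multiple.

    sizes: list of (width, height) tuples, one per training image.
    multiple: assumed network stride constraint (16 for DetectNet_v2).
    """
    mean_w = sum(w for w, _ in sizes) / len(sizes)
    mean_h = sum(h for _, h in sizes) / len(sizes)

    def snap(value):
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(mean_w), snap(mean_h)


# Example with three typical COCO image sizes.
width, height = suggested_input_size([(640, 480), (500, 375), (640, 427)])
```

The image sizes would normally be collected from the dataset itself, for instance from the `width` and `height` fields of the COCO `images` entries.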
