Retraining with imbalanced dataset

music1913 · January 4, 2022, 4:28am

• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu 20.04.3 LTS, Intel x64, RTX3090.
• Network Type
Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
tao info :

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021

• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I’m using TAO for transfer learning to detect several classes of objects in my scenario:

people
bicycle
custom door sticker

I believe enough people and bicycle images can be obtained from public datasets, but the custom door sticker images are collected by myself and will be very limited (say 1000 pictures with 1 target in each).

So my questions:

Will an imbalanced dataset impact precision in transfer learning in TAO?
Does TAO have built-in functions to help extract part of a large public dataset to align with a small custom dataset?
Say I have the PASCAL VOC dataset with 20 classes (or any other well-known dataset). I want to extract only people and bicycle from it and, for these 2 classes, only 1000 samples each. Combined with my small custom dataset, this would form a balanced dataset for retraining.

thanks.

Morganh · January 4, 2022, 6:22am

Yes. For an imbalanced dataset with the detectnet_v2 network, refer to Frequently Asked Questions - NVIDIA Docs:

Distribute the dataset class: How do I balance the weight between classes if the dataset has significantly higher samples for one class versus another?

To account for imbalance, increase the class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in this case initial_weight is used to control cov/regression weighting. It is important to keep the number of samples of different classes balanced, which helps improve mAP.
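The FAQ does not give a formula for class_weight. One common starting heuristic (an assumption here, not NVIDIA guidance) is to weight each class by the inverse of its sample count, optionally damped with an exponent. A minimal Python sketch, using hypothetical counts and a hypothetical helper name:

```python
def inverse_frequency_weights(counts, power=1.0):
    """Weight each class by (largest_count / count) ** power.

    The most common class gets 1.0. power=1.0 is plain inverse frequency;
    power=0.5 damps the weights (square root), which is often less aggressive.
    This is a generic heuristic, not a rule taken from the TAO documentation.
    """
    if not counts or any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs a positive sample count")
    largest = max(counts.values())
    return {name: (largest / n) ** power for name, n in counts.items()}


# Hypothetical object counts, for illustration only.
counts = {"people": 20000, "bicycle": 5000, "door_sticker": 1000}
weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    print(f"{name}: class_weight ~ {w:.1f}")
```

The resulting values are only a starting point for the class_weight entries in the spec file; they would still need tuning against validation mAP.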

Yes. When you run "tao detectnet_v2 dataset_convert", it generates several tfrecords files. You can select some of them to combine with your small custom dataset.
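An alternative to picking tfrecords files is to filter the public dataset's annotations before conversion. The sketch below is not a TAO feature. It assumes PASCAL VOC-style XML annotations (one file per image, with object/name elements) and uses hypothetical function names. It selects up to a fixed number of annotation files per wanted class:

```python
import random
import xml.etree.ElementTree as ET
from pathlib import Path


def classes_in_annotation(xml_path):
    """Return the set of object class names found in one VOC-style XML file."""
    root = ET.parse(xml_path).getroot()
    return {obj.findtext("name", default="").strip() for obj in root.iter("object")}


def select_balanced_subset(annotation_dir, wanted, per_class, seed=0):
    """Pick up to `per_class` annotation files for each class in `wanted`.

    Files are shuffled with a fixed seed so the selection is reproducible.
    An image containing several wanted classes can appear in more than one list.
    """
    files = sorted(Path(annotation_dir).glob("*.xml"))
    random.Random(seed).shuffle(files)
    picked = {name: [] for name in wanted}
    for path in files:
        present = classes_in_annotation(path)
        for name in wanted:
            if name in present and len(picked[name]) < per_class:
                picked[name].append(path)
    return picked
```

Calling select_balanced_subset(annotation_dir, ["person", "bicycle"], per_class=1000) would return the chosen files per class (VOC names the people class "person"). The limit counts images, not object instances, and the selected images may still contain objects of other classes, so those labels would need to be dropped or ignored. The selected annotations would also still need converting to the KITTI label format that dataset_convert reads.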

music1913 · January 4, 2022, 6:36am

Still a bit confused.
Is class_weight intended for the imbalanced-dataset scenario? If yes, what is the guideline for setting its value? In my case I only have 1000 samples of private data, which is almost nothing compared to the public dataset.
Will enable_autoweighting help in my case (very imbalanced), or do I have to balance the data manually before training?
tfrecords are binary data, so how can I tell which class a file is for? I only need to pick out people and bicycle from the 20 classes.

Morganh · January 4, 2022, 7:01am

Yes, increase the class_weight for classes with fewer samples. enable_autoweighting cannot help in very imbalanced cases.
You can inspect each tfrecord file. Refer to "Tensor reshape error when evaluating a Detectnet_v2 model - #7 by Morganh".
