Trained DetectNet_v2 on the Pascal VOC dataset, but the accuracy is low

I trained DetectNet_v2 (ResNet-18) on the Pascal VOC 2012 dataset, but the accuracy is low.
Is there any way to improve the accuracy?

The training results are as follows:

Validation cost: 0.000081
Mean average_precision (in %): 9.7308

class name      average precision (in %)
------------  --------------------------
aeroplane                     27.6491
bicycle                        4.44473
bird                           0.0382677
boat                           0.357756
bottle                         2.17242
bus                           32.6177
car                           10.8047
cat                           30.4376
chair                          1.27091
cow                            0.566813
diningtable                    1.32209
dog                           17.4295
horse                          1.16366
motorbike                     14.0428
person                        35.1342
pottedplant                    0
sheep                          2.62056
sofa                           4.63491
train                          7.70868
tvmonitor                      0.199589

Median Inference Time: 0.008577
2022-05-30 07:16:02,292 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.741
2022-05-30 07:16:03,093 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.741
Time taken to run iva.detectnet_v2.scripts.train:main: 3:03:38.935997.

• Training spec file
detectnet_v2_train_resnet18_voc.txt (23.4 KB)

• TLT Version
tlt-streamanalytics:v2.0_py3

• Network Type
Detectnet_v2(resnet18)

output_image_width: 512
output_image_height: 400

Did you resize the training images/labels to 512x400?

No, I did not resize them.

The documentation says:

If the output image height and the output image width of the preprocessing block doesn’t match with the dimensions of the input image, the dataloader either pads with zeros, or crops to fit to the output resolution. It does not resize the input images and labels to fit.

If the input image is not sized correctly, the dataloader either crops it or pads it with zeros, right?
Therefore, we did not think it was necessary to resize all the images in advance.

I apologize if my understanding is incorrect.

For the DetectNet_v2 network, the images/labels need to be resized offline.

See DetectNet_v2 — TAO Toolkit 3.22.02 documentation:

The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.

I'll resize the images to the same resolution in advance and try training again.

Do I need to resize the validation dataset in the same way?

Yes, it is needed.
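For reference, resizing offline means scaling both the images and their bounding-box labels by the same factors. A minimal sketch of the label side (the `scale_bbox` helper is hypothetical, not part of TAO; the image itself can be resized with e.g. Pillow's `Image.resize`):

```python
def scale_bbox(bbox, orig_size, target_size):
    """Scale an (xmin, ymin, xmax, ymax) box from an image of
    orig_size (w, h) to one resized to target_size (w, h)."""
    sx = target_size[0] / orig_size[0]
    sy = target_size[1] / orig_size[1]
    xmin, ymin, xmax, ymax = bbox
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


# Example: a box in a 1000x800 image, after resizing the image to 500x400
print(scale_bbox((10, 20, 110, 220), (1000, 800), (500, 400)))
# -> (5.0, 10.0, 55.0, 110.0)
```

Every box in every label file must be rescaled with the same per-image factors used to resize that image.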

I resized the Pascal VOC images to 496x400 in advance and retrained, but the accuracy is still low.

Validation cost: 0.000105
Mean average_precision (in %): 21.7120

class name      average precision (in %)
------------  --------------------------
aeroplane                       40.2799
bicycle                          1.86966
bird                            26.0981
boat                            17.6851
bottle                           4.11341
bus                             52.4331
car                             10.6682
cat                             33.5806
chair                           18.1309
cow                              9.06108
diningtable                      6.90725
dog                             40.0285
horse                           33.3164
motorbike                       14.472
person                          44.0251
pottedplant                     11.3704
sheep                            3.42361
sofa                            13.4291
train                           53.3468
tvmonitor                        0

Median Inference Time: 0.005719
2022-06-01 08:29:52,196 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 1.680
2022-06-01 08:29:53,013 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 1.680
Time taken to run iva.detectnet_v2.scripts.train:main: 2:54:01.435409.

• Training spec file
detectnet_v2_train_resnet18_voc.txt (23.4 KB)

Is there anything I can do to improve accuracy?

  1. If you resized to 496x400, please check the resolution of each object. DetectNet_v2 may not be able to detect objects that are smaller than 16x16 pixels.
  2. See Frequently Asked Questions — TAO Toolkit 3.22.05 documentation.

The following parameters can help you improve AP on smaller objects:

  • Increase num_layers of the ResNet backbone
  • Increase class_weight for the small-object classes
  • Increase the coverage_radius_x and coverage_radius_y parameters of the bbox_rasterizer_config section for the small-object classes
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects in evaluation

  3. Alternatively, try the yolo_v4_tiny network instead.
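Since objects smaller than 16x16 pixels may be missed, it can help to count how many ground-truth boxes fall below that size after resizing. A rough sketch (the sample boxes are made up for illustration):

```python
def count_small_boxes(boxes, min_size=16):
    """Count (xmin, ymin, xmax, ymax) boxes whose width or height
    is below min_size pixels."""
    return sum(
        1 for (xmin, ymin, xmax, ymax) in boxes
        if (xmax - xmin) < min_size or (ymax - ymin) < min_size
    )


# Boxes already expressed at the training resolution (e.g. 496x400)
boxes = [(0, 0, 10, 40), (100, 100, 200, 180), (30, 30, 45, 44)]
print(count_small_boxes(boxes))  # -> 2
```

If a large fraction of a class's boxes is under the threshold, that class is a likely candidate for the small-object tuning above.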

Thank you very much.
I will try 1 and 2 first.
By the way, have you ever achieved high accuracy when training Pascal VOC with DetectNet_v2?

No, we do not have a baseline for Pascal VOC with DetectNet_v2.

Thank you very much.
Let me ask one more question.

For the tvmonitor class, the AP is still 0. What could be the cause?
I checked the bbox sizes, and both width and height are about 130 px on average.
There are also 412 tvmonitor labels in the training set.
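The average size was computed with a simple script along these lines (the sample boxes below are illustrative only, not actual VOC data):

```python
def mean_box_size(boxes):
    """Return the mean (width, height) of (xmin, ymin, xmax, ymax) boxes."""
    n = len(boxes)
    w = sum(xmax - xmin for (xmin, _, xmax, _) in boxes) / n
    h = sum(ymax - ymin for (_, ymin, _, ymax) in boxes) / n
    return (w, h)


# tvmonitor boxes collected from the resized labels (illustrative values)
print(mean_box_size([(0, 0, 120, 130), (10, 10, 150, 140)]))  # -> (130.0, 130.0)
```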

Could you try to set up another experiment that trains only one class: tvmonitor?

When we trained only the tvmonitor class, the accuracy increased.

Validation cost: 0.000035
Mean average_precision (in %): 34.6166

class name      average precision (in %)
------------  --------------------------
tvmonitor                        34.6166

Median Inference Time: 0.003489
2022-06-07 02:00:03,813 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 6.655
2022-06-07 02:00:04,613 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 6.655
Time taken to run iva.detectnet_v2.scripts.train:main: 1:46:26.249857.

Besides the above suggestions, please consider the following further experiments:

  1. Try a larger backbone, for example resnet50 or vgg19.
  2. Set minimum_bounding_box_height to 1.
  3. Set all minimum_height and minimum_width values to 20.
  4. Set the same bbox objective weights for all classes:

       name: "bbox"
       initial_weight: 10.0
       weight_target: 10.0

  5. VOC is an imbalanced dataset; see Frequently Asked Questions — TAO Toolkit 3.22.05 documentation, "How do I balance the weight between classes if the dataset has significantly higher samples for one class versus another?"

To account for the imbalance, increase the class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in that case initial_weight is used to control the cov/regression weighting. It is important to keep the number of samples of the different classes balanced, which helps improve mAP.

  6. Try to fine-tune the batch size, for example 8 or 4.

  7. Try to fine-tune the learning rate, for example max_lr: 1.25e-4, min_lr: 1.25e-5.
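One common heuristic for picking class_weight values (a sketch, not an official TAO formula) is to weight each class inversely to its sample count, normalized so the most frequent class gets weight 1.0:

```python
def inverse_frequency_weights(counts):
    """Map class name -> weight, inversely proportional to sample count,
    normalized so the most frequent class has weight 1.0."""
    max_count = max(counts.values())
    return {cls: max_count / n for cls, n in counts.items()}


# Illustrative label counts, not real VOC statistics
counts = {"person": 4000, "tvmonitor": 400, "sheep": 800}
print(inverse_frequency_weights(counts))
# -> {'person': 1.0, 'tvmonitor': 10.0, 'sheep': 5.0}
```

The resulting values can be used as starting points for class_weight in the cost_function_config and then refined experimentally.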