If the output image height and width of the preprocessing block don't match the dimensions of the input image, the dataloader either pads with zeros or crops to fit the output resolution. It does not resize the input images and labels to fit.
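The pad-or-crop behavior described above can be sketched roughly like this (a simplified illustration, not TAO's actual dataloader code): anything larger than the target is cropped, anything smaller is zero-padded.

```python
import numpy as np

def pad_or_crop(image, out_h, out_w):
    # Simplified sketch of the dataloader behavior described above:
    # crop from the top-left if the input exceeds the target size,
    # then pad the remainder with zeros. Not TAO's actual implementation.
    in_h, in_w = image.shape[:2]
    image = image[:min(in_h, out_h), :min(in_w, out_w)]
    pad_h = out_h - image.shape[0]
    pad_w = out_w - image.shape[1]
    pad_spec = ((0, pad_h), (0, pad_w)) + ((0, 0),) * (image.ndim - 2)
    return np.pad(image, pad_spec)
```

Note that no interpolation happens anywhere in this path, which is why labels outside the cropped region are simply lost rather than rescaled.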
So if the input image is not sized correctly, it is either cropped or padded with zeros, right?
Therefore, we did not think it was necessary to resize all images in advance.
The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
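Since the resize must happen offline, the labels have to be scaled by the same factors as the image. A minimal sketch of the label side (the image itself can be resized with any tool, e.g. Pillow or OpenCV; box format assumed to be (x1, y1, x2, y2) in pixels, as in KITTI labels):

```python
def scale_boxes(boxes, in_size, out_size):
    # Scale (x1, y1, x2, y2) boxes from the original image size (w, h)
    # to the offline-resized training size. Illustrative helper, not
    # part of TAO itself.
    sx = out_size[0] / in_size[0]
    sy = out_size[1] / in_size[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy)
            for x1, y1, x2, y2 in boxes]
```

Running this over every label file once, before training, keeps the boxes consistent with the resized images.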
Thank you very much.
Let me ask one more question.
For the tvmonitor class, AP is still 0. What could be the cause?
I checked the bbox sizes, and both width and height are about 130 px on average.
There are also 412 tvmonitor labels in the training set.
When we trained only the tvmonitor class, the accuracy increased.
Validation cost: 0.000035
Mean average_precision (in %): 34.6166
class name average precision (in %)
------------ --------------------------
tvmonitor 34.6166
Median Inference Time: 0.003489
2022-06-07 02:00:03,813 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 6.655
2022-06-07 02:00:04,613 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 6.655
Time taken to run iva.detectnet_v2.scripts.train:main: 1:46:26.249857.
VOC is an imbalanced dataset. See the Frequently Asked Questions in the NVIDIA Docs, under "Distribute the dataset class": "How do I balance the weight between classes if the dataset has significantly higher samples for one class versus another?"
To account for imbalance, increase class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in that case, initial_weight is used to control the cov/regression weighting. Keeping the number of samples balanced across classes is important and helps improve mAP.
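In the detectnet_v2 training spec, these knobs live in cost_function_config. A fragment along these lines (the weight values here are illustrative, not recommendations; tune them for your dataset):

```
cost_function_config {
  target_classes {
    name: "tvmonitor"
    class_weight: 4.0          # raise for under-represented classes
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0      # used when enable_autoweighting is false
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: false
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}
```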
Try fine-tuning the batch size, for example 8 or 4.
Try fine-tuning the learning rate, for example max_lr: 1.25e-4, min_lr: 1.25e-5.
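Both of these are set in training_config in the spec file; the learning rate bounds map to the soft-start annealing schedule. An illustrative fragment (soft_start and annealing values are just the common defaults, adjust as needed):

```
training_config {
  batch_size_per_gpu: 4
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1.25e-5
      max_learning_rate: 1.25e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
}
```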
There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks