Hi,
I have questions about input image resizing in TAO.
1) What operation is done to resize input images?
2) Is this operation performed on the CPU or the GPU?
3) The spec files suggest certain input image shapes. What are the reasons and concerns behind these suggestions? Model accuracy? Speed? Network design limitations?
4) Does resizing keep the aspect ratio?
5) Are similar operations performed for image preprocessing during validation and inference?
For the above questions, may I know which networks you are focusing on?
Object detection: YOLOv3, YOLOv4, Tiny YOLOv4, SSD, Faster R-CNN, RetinaNet
Segmentation: Mask R-CNN, UNet
Hi. I have another question:
Does the online augmentation resize the images, or are the images cropped or padded with zeros?
You can check the Augmentation Config for the online data augmentation. Usually the augmentation module provides some basic preprocessing and augmentation during training.
For example, in SSD/DSSD/RetinaNet, the augmentation_config parameter defines the image size after preprocessing. The augmentation methods from the SSD paper are performed during training, including random flip, zoom-in, zoom-out, and color jittering, and the augmented images are then resized to the output shape defined in augmentation_config. During evaluation, only the resize is performed.
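To make the evaluation-time behavior concrete: because the output shape is fixed by augmentation_config, the resize scales height and width independently, so the aspect ratio is not preserved in general. The sketch below is illustrative only (a plain nearest-neighbour resize in NumPy, not the TAO implementation):

```python
import numpy as np

def resize_to_fixed_shape(img, out_h, out_w):
    """Nearest-neighbour resize to a fixed (out_h, out_w) shape.

    Illustrative stand-in for the evaluation-time resize step: the output
    shape comes from the spec file, so the aspect ratio is NOT preserved
    unless the input already matches the target ratio.
    """
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols]

# A 375x1242 frame squeezed into 384x1248: both axes scale independently,
# so a square object becomes slightly non-square after the resize.
img = np.zeros((375, 1242, 3), dtype=np.uint8)
out = resize_to_fixed_shape(img, 384, 1248)
print(out.shape)  # (384, 1248, 3)
```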
Thank you so much.
Please help with questions 2 and 3 as well.
The augmentation is performed on the GPU.
The "input size requirement" comes from the design of the network. For example, the YOLOv4 network requires the input image resolution to be a multiple of 32.
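The multiple-of-32 constraint follows from YOLOv4's downsampling stride. A small hedged helper (not a TAO API, just a sketch of the arithmetic) shows how an arbitrary dimension would be rounded up to satisfy it:

```python
def round_to_stride(dim, stride=32):
    """Round a dimension up to the nearest multiple of the network stride.

    YOLOv4-style backbones downsample the input by a factor of 32, so both
    height and width must be multiples of 32. Illustrative helper only.
    """
    return ((dim + stride - 1) // stride) * stride

print(round_to_stride(1242))  # 1248
print(round_to_stride(375))   # 384
print(round_to_stride(384))   # 384 (already valid)
```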
Thank you. I understand the network requirements. I mean the default sizes written in the spec files. For example, this config is used in the YOLOv4 spec file:
output_width: 1248
output_height: 384
It depends on your dataset. The Jupyter notebook trains on the public KITTI dataset, which contains 1248x384 images, so the spec file sets output_width to 1248 and output_height to 384.
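For readers comparing against raw KITTI frames (commonly around 1242x375 before the notebook's preprocessing), a quick check shows that resizing to the spec defaults barely changes the aspect ratio, which also ties back to question 4:

```python
# Assumed raw KITTI frame size of roughly 1242x375 (widths vary slightly
# per sequence); 1248x384 is the nearest shape with both sides a multiple
# of 32. Each axis is scaled independently, so the distortion is small
# but nonzero.
scale_w = 1248 / 1242
scale_h = 384 / 375
print(round(scale_w, 4), round(scale_h, 4))  # 1.0048 1.024
```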