What is the optimal width and height of the input images for the following models?
object detection: yolo_v3, yolo_v4, retinanet, ssd, and faster_rcnn
segmentation: unet and mask_rcnn
classification
Do you mean the resolution of the training images or the input_size of the model you want to train?
Please tell me about both of them.
For the input_size of the model:
Faster_rcnn : see FasterRCNN — TAO Toolkit 3.0 documentation
 Input size : C * W * H (where C = 1 or 3, W >= 128, H >= 128)
yolo_v3 or yolo_v4: see YOLOv3 — TAO Toolkit 3.0 documentation
 Input size : C * W * H (where C = 1 or 3, W >= 128, H >= 128, W, H are multiples of 32)
ssd: see SSD — TAO Toolkit 3.0 documentation
 Input size : C * W * H (where C = 1 or 3, W >= 128, H >= 128)
retinanet: see RetinaNet — TAO Toolkit 3.0 documentation
 Input size : C * W * H (where C = 1 or 3, W >= 128, H >= 128, W, H are multiples of 32)
Mask_rcnn: see MaskRCNN — TAO Toolkit 3.0 documentation
 Input size : C * W * H (where C = 3, W >= 128, H >= 128, and W, H are multiples of 2^max_level)
Unet: see UNET — TAO Toolkit 3.0 documentation
 Input size : C * W * H (where C = 1 or 3; W = 572, H = 572 for the vanilla unet, and W >= 128, H >= 128 with W, H multiples of 32 for other archs).
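The size constraints above can be sketched as a small validity check. This is a minimal sketch, not part of TAO itself; the helper name and defaults are assumptions:

```python
def is_valid_input_size(w, h, min_side=128, multiple_of=1):
    """Check whether (w, h) satisfies a model's input-size constraints.

    min_side    -- minimum allowed width/height (128 for the models above)
    multiple_of -- required divisor (32 for yolo_v3/yolo_v4/retinanet/unet,
                   2**max_level for mask_rcnn, 1 when there is no divisor rule)
    """
    if w < min_side or h < min_side:
        return False
    return w % multiple_of == 0 and h % multiple_of == 0

# e.g. yolo_v4 at 416x416 passes (>= 128 and a multiple of 32);
# mask_rcnn with max_level = 6 needs W and H divisible by 2**6 = 64.
```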
For the resolution of the training images:
Faster_rcnn: see FasterRCNN — TAO Toolkit 3.0 documentation. With a static input shape, you can resize the images offline to the target resolution, or enable automatic resizing during training.
yolo_v3, yolo_v4, retinanet, ssd: you do not need to resize the images/labels; they are resized automatically during training.
Mask_rcnn or Unet: the images and masks do not need to match the model input size; they will be resized to the model input size during training.
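If you do resize offline (e.g. for faster_rcnn with a static input shape), the target dimensions can first be snapped to the nearest size that meets the constraints above. A minimal sketch; the function name and rounding policy are assumptions, and any box or mask labels must be scaled by the same factors:

```python
def nearest_valid_size(w, h, multiple=32, min_side=128):
    """Snap (w, h) to the nearest size that is >= min_side and a
    multiple of `multiple`, matching the input rules listed above."""
    def snap(x):
        return max(min_side, round(x / multiple) * multiple)
    return snap(w), snap(h)
```

For example, a 500x375 image snaps to 512x384 with the defaults.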