SGIE training images tlt classification

Hello, I am developing a pipeline consisting of a detection model as the pgie and a classification model as the sgie. As I understand it, it should work as follows (please correct me if I am wrong):

The detection model predicts a bbox, the object is cropped at the bbox, and the crop is fed as input to the classification model.
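To make the crop step concrete, here is a minimal conceptual sketch in plain Python. It is not how DeepStream does it internally (there the crop and scaling happen inside the plugin); the function name `crop_bbox` and the list-of-rows frame layout are assumptions made only for illustration.

```python
def crop_bbox(frame, left, top, width, height):
    """Return the part of `frame` covered by the bbox.

    `frame` is a list of rows, each row a list of pixels (hypothetical layout).
    The bbox is clamped to the frame so a box that sticks out past the
    border still yields a valid (smaller) crop.
    """
    frame_h = len(frame)
    frame_w = len(frame[0]) if frame_h else 0

    # Clamp the bbox corners to the frame bounds.
    x0 = max(0, left)
    y0 = max(0, top)
    x1 = min(frame_w, left + width)
    y1 = min(frame_h, top + height)

    return [row[x0:x1] for row in frame[y0:y1]]


# Tiny 4x6 "frame" whose pixels record their own (row, col) position.
frame = [[(r, c) for c in range(6)] for r in range(4)]
crop = crop_bbox(frame, left=4, top=2, width=5, height=5)
```

The crop that comes out of this step is what the classifier would then resize to its own input size.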

I have two questions:

  1. My classification model takes images of size 256x144. I want to keep the aspect ratio of the cropped bboxes, so I need padding. I read in a post that padding is only added at the bottom and the right. Is this correct? Or can the padding be on any border of the image?
  2. I am training the classification model with TLT. There is an option to choose the image size and resize_interpolation_method. I don’t see a flag to pad. Do I need to add padding to the images manually before training? (i.e. make my input images 256x144 with padding so that no resizing happens in training)
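For reference, the bottom/right padding scheme asked about in question 1 can be sketched as plain arithmetic. This only illustrates that scheme and is not taken from the DeepStream or TLT source; the function name `fit_with_padding` and the 256x144 defaults (read as width x height) are assumptions for this example.

```python
def fit_with_padding(src_w, src_h, dst_w=256, dst_h=144):
    """Scale a crop to fit inside dst_w x dst_h while keeping aspect ratio.

    Returns (new_w, new_h, pad_right, pad_bottom). The scaled crop is
    assumed to sit in the top-left corner, so all padding ends up on the
    right and bottom edges.
    """
    # The tighter of the two axis ratios decides the scale factor.
    scale = min(dst_w / src_w, dst_h / src_h)

    # Round to whole pixels and keep the result inside the target size.
    new_w = max(1, min(dst_w, round(src_w * scale)))
    new_h = max(1, min(dst_h, round(src_h * scale)))

    pad_right = dst_w - new_w
    pad_bottom = dst_h - new_h
    return new_w, new_h, pad_right, pad_bottom


# A square 100x100 crop fills the full height and is padded on the right.
square = fit_with_padding(100, 100)
# A 512x288 crop already has the 16:9 target ratio, so no padding is needed.
same_ratio = fit_with_padding(512, 288)
```

Centered padding would split `pad_right` and `pad_bottom` across both sides instead; which variant applies at inference is exactly what the question is asking.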

Thanks for your help

According to your pipeline, the sgie is a classification model.

For the TLT classification network (see Open Model Architectures — Transfer Learning Toolkit 3.0 documentation), input images do not need to be manually resized. The input dataloader resizes images as needed.

The dataloader keeps the aspect ratio when it resizes, so you do not need to handle that yourself.

The training spec for the TLT classification network has “enable_random_crop” and “enable_center_crop” options (see Image Classification — Transfer Learning Toolkit 3.0 documentation). Both default to False, and you can set them to True. (BTW, the user guide says the default value is True. That is wrong, and we will correct it. We just want to suggest that end users set them to True.)

For resize_interpolation_method (see Image Classification — Transfer Learning Toolkit 3.0 documentation), you can set it to BILINEAR or BICUBIC.

End users do not need to set any padding.

In short, if your images have different resolutions and you want to train a 256x144 classification model, you just need to set input_image_size to “3,144,256”, and we suggest setting “enable_random_crop” and “enable_center_crop” to True. Then start training.
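Note that the “3,144,256” string lists channels, then height, then width, which is the reverse of how the 256x144 size is written. A small sketch of that conversion, assuming 256x144 means width x height; the helper name `format_input_image_size` is hypothetical and not part of TLT:

```python
def format_input_image_size(width, height, channels=3):
    """Build an input_image_size string in channels,height,width order."""
    if min(width, height, channels) <= 0:
        raise ValueError("width, height and channels must be positive")
    return f"{channels},{height},{width}"


# A 256-wide, 144-high RGB model.
size_string = format_input_image_size(width=256, height=144)
```

Here `size_string` comes out as `3,144,256`, matching the value suggested above.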

Thank you for the prompt reply.
OK, understood. I will train with images of random sizes and let the classification network take care of resizing/padding.

At inference time, should I set maintain-aspect-ratio=1? Will this take care of both resizing and padding, as the network does at training time?


Which inference method? With DeepStream?

Yes, with DeepStream.

You do not need to set maintain-aspect-ratio=1.
See Image Classification — Transfer Learning Toolkit 3.0 documentation.