Training and inference video resolutions


I’m working on a Jetson Nano with DeepStream 6.0, handling a 3264x2464 RTSP input.
I’ve trained a detectnet_v2 model with a resnet_18 backbone using the TAO Toolkit.
What resolutions should I use for training and for inference? Is it necessary to resize all the training images to 960x544?
And at inference time, can I feed the 3264x2464 stream directly and expect it to be resized automatically?

Thanks in advance

Both the training resolution and the inference resolution are determined by the resolution of the model's input layer.
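
To make this concrete, here is a small sketch of what happens when a 3264x2464 frame is mapped onto a fixed model input, assuming 960x544 is that input size (as in the question). The two aspect ratios differ (about 1.32 vs 1.76), so a plain stretch distorts objects, while a letterbox keeps their shape and pads the rest. In gst-nvinfer this choice is controlled by the `maintain-aspect-ratio` config property; `fit_frame` and `Resize` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resize:
    scale_x: float
    scale_y: float
    pad_x: int  # total horizontal padding, in model-input pixels
    pad_y: int  # total vertical padding, in model-input pixels


def fit_frame(src_w: int, src_h: int, dst_w: int, dst_h: int,
              keep_aspect: bool) -> Resize:
    """Compute how a source frame maps onto a fixed model input size."""
    if not keep_aspect:
        # Plain stretch: each axis is scaled independently, shapes get distorted.
        return Resize(dst_w / src_w, dst_h / src_h, 0, 0)
    # Letterbox: one uniform scale so the whole frame fits, then pad the remainder.
    s = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * s), round(src_h * s)
    return Resize(s, s, dst_w - new_w, dst_h - new_h)


if __name__ == "__main__":
    stretch = fit_frame(3264, 2464, 960, 544, keep_aspect=False)
    boxed = fit_frame(3264, 2464, 960, 544, keep_aspect=True)
    print(f"stretch: x*{stretch.scale_x:.4f}, y*{stretch.scale_y:.4f}")
    print(f"letterbox: scale {boxed.scale_x:.4f}, pad {boxed.pad_x}x{boxed.pad_y}")
```

Whichever mode you pick, it is generally advisable to prepare the training images the same way, so the model sees objects with the same proportions at training and inference time.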

The larger the model input resolution, the more memory and compute the model needs at inference time.
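
For a fully convolutional detector such as DetectNet_v2, the cost that grows with input resolution is mostly activation memory and compute, roughly in proportion to the pixel count. The sketch below assumes DetectNet_v2's output stride of 16 and that input width and height must be multiples of 16; both are assumptions to verify against the TAO documentation for your version. The function names are hypothetical.

```python
from __future__ import annotations

STRIDE = 16  # assumed DetectNet_v2 output stride


def detectnet_v2_grid(width: int, height: int, stride: int = STRIDE) -> tuple[int, int]:
    """Return (grid_h, grid_w) of the coverage/bbox output maps for an input size."""
    if width % stride or height % stride:
        raise ValueError(f"input {width}x{height} is not a multiple of {stride}")
    return height // stride, width // stride


def relative_cost(width: int, height: int, base: tuple[int, int] = (960, 544)) -> float:
    """Pixel-count ratio against a baseline input: a rough proxy for conv compute."""
    return (width * height) / (base[0] * base[1])


if __name__ == "__main__":
    print(detectnet_v2_grid(960, 544))   # output grid for the 960x544 input
    print(relative_cost(1920, 1088))     # doubling both sides: ~4x the pixels
```

Doubling both sides of the input gives four times the pixels, which on a Jetson Nano translates fairly directly into lower throughput.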

DeepStream SDK is only an inference framework. Since you trained the model yourself, you should already know its input resolution.

For inference with DeepStream SDK, you can use gst-nvinfer to deploy your trained model. Resizing, format conversion, etc. are done inside DeepStream; the only thing you have to do is fill in the configuration file with the proper parameters. Please refer to the DeepStream samples: C/C++ Sample Apps Source Details (DeepStream 6.2 Release documentation)
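
As a rough illustration of "filling the configuration", here is a sketch that renders a minimal gst-nvinfer `[property]` section with Python's `configparser`. In a typical pipeline the RTSP frames are first scaled by nvstreammux to its configured width/height and then by gst-nvinfer to `infer-dims`, so the 3264x2464 input never has to match the model. The property names are standard gst-nvinfer keys, but the blob names follow the DeepStream TAO samples for DetectNet_v2 and should be checked against your own export. The paths, key and class count are placeholders, and FP16 is assumed because the Jetson Nano has no fast INT8 path.

```python
import configparser
import io


def nvinfer_config(width: int, height: int, num_classes: int,
                   model_path: str, model_key: str, label_path: str) -> str:
    """Render a minimal [property] section for gst-nvinfer (a sketch)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep keys exactly as written (no lower-casing)
    cfg["property"] = {
        "gpu-id": "0",
        # Scale 8-bit pixels to [0, 1], i.e. multiply by 1/255
        "net-scale-factor": "0.00392156862745098",
        "tlt-encoded-model": model_path,   # placeholder path to the .etlt
        "tlt-model-key": model_key,        # placeholder: key used at export
        "labelfile-path": label_path,
        # channels;height;width of the *model* input, not of the camera frame
        "infer-dims": f"3;{height};{width}",
        "uff-input-blob-name": "input_1",
        "output-blob-names": "output_cov/Sigmoid;output_bbox/BiasAdd",
        "num-detected-classes": str(num_classes),
        "model-color-format": "0",         # 0 = RGB
        "network-mode": "2",               # 2 = FP16
        "maintain-aspect-ratio": "0",      # 1 = letterbox instead of stretch
        "batch-size": "1",
    }
    buf = io.StringIO()
    # nvinfer config files are usually written as key=value without spaces
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(nvinfer_config(960, 544, 4, "model.etlt", "YOUR_KEY", "labels.txt"))
```

In practice you would usually start from one of the sample config files shipped with DeepStream and only change the model-specific values; generating it like this just shows which values depend on the model input size.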

DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums


Do you think it would be effective to train the TrafficCamNet model with images captured from a height of 12 feet in order to detect cars and pedestrians?
Or should I train a different architecture from scratch?

The dataset you choose for training will affect the model's precision. There are already some discussions on how to choose a training dataset, for example: Introduction Training Datasets for Machine learning [A to Z] | Encord

In our experience, if your model will be used to run inference on images captured from a height of 12 feet, it is better to include such images in your training dataset. TAO Toolkit provides a pre-trained TrafficCamNet model, which you can retrain with your own dataset. Overview - NVIDIA Docs
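
When preparing your own 12-foot images for retraining, the labels for DetectNet_v2 in TAO are written in KITTI format, where only the class name and the 2D box are meaningful. If you resize the images to the model input (960x544 is assumed here) before training, the boxes must be scaled by the same factors. This sketch assumes a plain stretch resize and a hypothetical `car` class; `Box`, `scale_box` and `to_kitti` are illustrative names, not TAO APIs.

```python
from dataclasses import dataclass


@dataclass
class Box:
    cls: str
    left: float
    top: float
    right: float
    bottom: float


def scale_box(box: Box, src_w: int, src_h: int, dst_w: int, dst_h: int) -> Box:
    """Rescale a box from the original image size to the resized training size."""
    sx, sy = dst_w / src_w, dst_h / src_h
    return Box(box.cls, box.left * sx, box.top * sy,
               box.right * sx, box.bottom * sy)


def to_kitti(box: Box) -> str:
    """One 15-field KITTI label line: class and 2D bbox filled, 3D fields zeroed."""
    fields = [box.cls, "0.00", "0", "0.00",          # truncated, occluded, alpha
              f"{box.left:.2f}", f"{box.top:.2f}",
              f"{box.right:.2f}", f"{box.bottom:.2f}"]
    fields += ["0.00"] * 7  # dimensions (3), location (3), rotation_y (1): unused
    return " ".join(fields)


if __name__ == "__main__":
    # A box in the lower-right quarter of a 3264x2464 frame
    original = Box("car", 1632, 1232, 3264, 2464)
    print(to_kitti(scale_box(original, 3264, 2464, 960, 544)))
```

If you letterbox instead of stretching, apply one uniform scale to both axes and add the padding offset, mirroring whatever gst-nvinfer will do at inference time.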

Thanks!
