I have a dataset of images with a resolution of 2208 x 1242. I am training a TLT YOLO v3 object detection model with mobilenet_v2 as the backbone. Initially, when I trained with output_image_width: 1472 and output_image_height: 960 in the config file, I got good results. But my requirement is to train at lower resolutions, say 480 x 320. When I train at the lower resolution I get 0 mAP. How can I train the model with a smaller output_image_width and output_image_height, so that I can feed smaller-resolution images to the model at inference time?
yolo_train_mobilenet_v2_kitti.txt (1.9 KB)
This is my config file…
For TLT 2.0, if you want to train a 480x320 model, first resize the images and labels offline, then re-generate the anchor shapes from the new labels.
Apparently your anchor shapes are not correct.
For TLT 3.0, you can directly train via original images/labels without resizing offline.
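For reference, "re-generating the anchor shapes" means clustering the (width, height) of the 2D boxes in the resized labels into k anchor boxes. A minimal dependency-free sketch of that clustering (note: this is plain k-means with Euclidean distance for illustration; YOLO tooling typically clusters with a 1 − IoU distance instead):

```python
import random

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) box sizes into k anchor shapes with plain k-means.

    boxes: list of (width, height) tuples taken from the resized labels.
    Returns k (w, h) centroids sorted by area.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # pick k distinct boxes as starting centroids
    for _ in range(iters):
        # Assign every box to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            j = min(range(k),
                    key=lambda i: (w - centroids[i][0]) ** 2 + (h - centroids[i][1]) ** 2)
            clusters[j].append((w, h))
        # Recompute each centroid as the mean of its cluster.
        new = []
        for i, c in enumerate(clusters):
            if c:
                new.append((sum(w for w, _ in c) / len(c),
                            sum(h for _, h in c) / len(c)))
            else:
                new.append(centroids[i])  # keep empty clusters where they were
        if new == centroids:  # converged
            break
        centroids = new
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])
```

The resulting (w, h) pairs are what would go into the anchor shape fields of the training spec, so they must be computed at the same scale as the training resolution.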
Hi @Morganh , Thanks…
Is tlt-streamanalytics:v3.0-dp-py3 the docker image name for TLT 3.0?
Could you please share the steps for pulling and running the TLT 3.0 docker container?
I was not able to pull it with the command docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
and I couldn’t run it with docker run --runtime=nvidia -it -v /home/<my_user_name>/tlt-experiments:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
Please refer to TLT Launcher — Transfer Learning Toolkit 3.0 documentation
Especially note that
In TLT 3.0, we have created an abstraction above the container; you will launch all your training jobs from the launcher. There is no need to manually pull the appropriate container, as the tlt-launcher will handle that. You may install the launcher using pip with the following commands.
pip3 install nvidia-pyindex
pip3 install nvidia-tlt
Hey @Morganh… so the actual image size in the dataset was 2208 x 1242, and I was able to train on 1472 x 960 in TLT 2.0 itself, without resizing the images in the dataset, and the detections were good. So why should the images be resized for 480x320? Could you please elaborate?
If the output image height and the output image width of the preprocessing block don’t match the dimensions of the input image, the dataloader either pads with zeros or crops to fit the output resolution. It does not resize the input images and labels to fit.
Although you were able to train a 1472x960 model, the images were actually cropped, and at that size the anchor shapes still look OK.
But if you train a smaller (480x320) model, the anchor shapes are no longer correct.
Okay… Is there any script to resize all the images in a KITTI dataset together with their labels? @Morganh
Yes, resizing and training with those images solved the problem.
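For anyone following along: resizing a KITTI dataset means scaling the images and, in each label file, scaling the 2D bbox columns (x1, y1, x2, y2) by the same factors. A minimal sketch of the label side, assuming standard 15-column KITTI labels (the images themselves can be resized with e.g. Pillow’s Image.resize or OpenCV’s cv2.resize in the same loop):

```python
def scale_kitti_line(line, sx, sy):
    """Scale the 2D bbox of one KITTI label line.

    KITTI columns: type, truncated, occluded, alpha,
    x1, y1, x2, y2 (the 2D bbox), then the 3D fields.
    sx = new_width / old_width, sy = new_height / old_height.
    """
    f = line.split()
    for i in (4, 6):  # x1, x2 scale with width
        f[i] = f"{float(f[i]) * sx:.2f}"
    for i in (5, 7):  # y1, y2 scale with height
        f[i] = f"{float(f[i]) * sy:.2f}"
    return " ".join(f)

def scale_kitti_file(src_path, dst_path, sx, sy):
    """Rewrite a whole KITTI label file with scaled 2D boxes."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(scale_kitti_line(line, sx, sy) + "\n")
```

For the 2208x1242 → 480x320 case here, sx = 480/2208 and sy = 320/1242; note the aspect ratio changes slightly, which the per-axis scaling handles.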
So @Morganh… I thought the FPS would increase when the input resolution to the model is reduced. But what I have seen is that when I integrated my old model, trained on 1472 x 960, with DeepStream, it gave 30 fps, and now when I integrate the new model trained on 480 x 320, it still gives the same 30 fps. Is there something I can change in the config file to improve the FPS for the low-resolution model?
Can you run trtexec against the TensorRT engine to check whether there is a difference?
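For reference, trtexec ships with TensorRT (typically under /usr/src/tensorrt/bin) and can benchmark a prebuilt engine directly; the engine filename below is a placeholder:

```
/usr/src/tensorrt/bin/trtexec --loadEngine=yolo_480x320.engine
```

It reports latency and throughput, so running it once per engine shows whether the 480x320 model is actually faster than the 1472x960 one, independent of the DeepStream pipeline.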
@jazeel.jk @Morganh I was facing a similar issue with FPS results. It turns out you need to set sync=0 instead of sync=1 in your config file. When sync is set to 1, the pipeline will, if possible, sync to the FPS of the input video, hence the 30 FPS.
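For context, sync lives in the sink group of the deepstream-app config; a fragment like the following (sink type shown here is just an example) lets the pipeline run as fast as the model allows:

```
[sink0]
enable=1
type=2   # 2 = EglSink (on-screen render); other sink types also honor sync
sync=0   # 0 = render as fast as possible; 1 = sync to the source FPS
```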
@ricky.medrano160 Great… That worked… Thanks Ricky…
I’m not sure what you would change in the config file to limit FPS to a specific amount. Instead, you could always reconfigure your video/stream to the desired FPS and then set sync=1.