Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu, x86, RTX3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I’m using TAO to retrain my custom model based on detectnet_v2 (resnet18).
Context 1:
The original images in my private dataset vary in aspect ratio and resolution. I resized them all to 800x608 with an image resize tool so they meet the TAO training requirement.
Question 1:
The resize tool rescales rather than crops, so an image (and the objects in it) can be distorted. Does this impact later inference, or am I misunderstanding something?
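To make Question 1 concrete, this is roughly what I mean by the two resize approaches (a minimal sketch with OpenCV; the actual tool I used is a separate program, and 800x608 is just my training input size):

```
import cv2
import numpy as np

TARGET_W, TARGET_H = 800, 608  # detectnet_v2 input size used for training

def stretch_resize(img):
    # Scales directly to 800x608; the aspect ratio changes, so objects get distorted.
    return cv2.resize(img, (TARGET_W, TARGET_H))

def letterbox_resize(img):
    # Keeps the aspect ratio and pads the remainder with black, so objects keep their shape.
    h, w = img.shape[:2]
    scale = min(TARGET_W / w, TARGET_H / h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(img, (new_w, new_h))
    canvas = np.zeros((TARGET_H, TARGET_W, 3), dtype=img.dtype)
    canvas[:new_h, :new_w] = resized  # padding ends up at the bottom/right
    return canvas, scale
```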
Context 2:
After exporting the model to start inference in DeepStream 6, I prepared a local 1920x1080 video file. I also noticed the parameter input-dims (channel; height; width; input-order, all integers ≥ 0) in the pgie config file. With the value 3;1080;1920;0 the accuracy is visibly poor: many false-positive bounding boxes appear (boxes reporting the target object in an actually empty area). If I change the value to 3;608;800;0, the accuracy is much better.
Question 2:
What value should I set for input-dims, and when should I change it, given that the inference source resolution can vary (different cameras)?
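For reference, the relevant part of my pgie config file looks roughly like this (paths and the model key are placeholders, other keys omitted; input-dims is the line I keep changing, and as far as I understand newer DeepStream releases use infer-dims=3;608;800 for the same purpose):

```
[property]
gpu-id=0
net-scale-factor=0.00392156862745098
tlt-model-key=<my_key>
tlt-encoded-model=<path_to>/resnet18_detector.etlt
labelfile-path=<path_to>/labels.txt
uff-input-blob-name=input_1
# c;h;w;input-order -- 3;608;800;0 gives much better results for me than 3;1080;1920;0
input-dims=3;608;800;0
batch-size=1
network-mode=2
num-detected-classes=3
```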
Question 3:
I also noticed that, for the same inference source (e.g., an RTSP stream), keeping the ratio but entering a differently scaled width and height in input-dims can still cause a huge difference in detection accuracy.
The original images of varying resolution were all resized (keeping the ratio) to 800x608 first and then put into image_2 and label_2 (with the bounding boxes scaled to match; a rough sketch of that is after the mAP numbers), and the training validation runs against these resized images as well, correct? I can see the mAP is good in both the training and retraining (pruned) stages; below is the training mAP:
Validation cost: 0.000132
Mean average_precision (in %): 85.1225
class name average precision (in %)
door_warning_sign 80.8844
electric_bicycle 80.9867
people 93.4964
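For completeness, this is roughly how I scale the KITTI label boxes when resizing the images (a simple sketch assuming uniform horizontal/vertical scale factors and the standard KITTI column order, where columns 4–7 are left, top, right, bottom):

```
def scale_kitti_labels(label_text, sx, sy):
    # sx, sy: scale factors applied to the image,
    # e.g. sx = 800 / orig_width, sy = 608 / orig_height.
    out_lines = []
    for line in label_text.strip().splitlines():
        cols = line.split()
        cols[4] = f"{float(cols[4]) * sx:.2f}"  # left
        cols[5] = f"{float(cols[5]) * sy:.2f}"  # top
        cols[6] = f"{float(cols[6]) * sx:.2f}"  # right
        cols[7] = f"{float(cols[7]) * sy:.2f}"  # bottom
        out_lines.append(" ".join(cols))
    return "\n".join(out_lines)
```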
Since the training and validation are all based on ratio-resized images, does this mean the model may have learned the distorted objects, correct?
For my scenario, the video sources at inference time may have different resolutions; the camera at hand is 1280x960. What are the recommended input-dims values?
Since my inference camera's resolution is a fixed value (currently 1280x960), does this imply I should resize my whole training dataset to 1280x960 as well, and could that help improve inference detection accuracy?