Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu, x86, RTX3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I’m using TAO to retrain my custom model based on detectnet_v2 (resnet18). Context 1:
The original images in my private dataset vary in aspect ratio and resolution, so I resized them all to 800x608 with an image resize tool to meet the TAO training requirement. Question 1:
The resize tool scales the whole image rather than cropping, so an image (and the objects in it) can be distorted when the aspect ratio changes. Does this affect later inference, or am I misunderstanding something?
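For what it’s worth, a common alternative to a distorting resize is letterboxing: scale by the smaller ratio and pad the remainder instead of stretching. A minimal sketch of just the geometry (pure Python, no image library; the function name is my own):

```python
def letterbox_geometry(src_w, src_h, dst_w=800, dst_h=608):
    """Compute the scaled size and padding needed to fit a src_w x src_h
    image into dst_w x dst_h while preserving aspect ratio.
    The leftover area is padded (e.g. with black), not stretched."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # left/right padding
    pad_y = (dst_h - new_h) // 2   # top/bottom padding
    return new_w, new_h, pad_x, pad_y

# e.g. a 1920x1080 frame fitted into 800x608:
print(letterbox_geometry(1920, 1080))  # -> (800, 450, 0, 79)
```

The same offsets can then be applied to the bounding-box labels so they stay aligned with the padded image.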
After exporting the model and starting inference in DeepStream 6, I prepared a local 1920x1080 video file. I also noticed the parameter input-dims (channel; height; width; input-order, all integers ≥ 0) in the pgie config file. With the value 3;1080;1920;0 the accuracy is visibly bad, with many false-positive bounding boxes (boxes reporting the target object in an actually empty area); but if I change the value to 3;608;800;0, the accuracy is much better. Question 2:
What value should I set for input-dims, and when should I change it, given that the inference source resolution can vary (different cameras)?
I even noticed that for the same inference source (e.g., an RTSP stream), keeping the aspect ratio but entering a differently scaled width and height in input-dims can also cause a huge difference in detection accuracy.
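My current understanding (an assumption on my part, happy to be corrected): input-dims describes the model’s input tensor, which was fixed by the training/export resolution (800x608 here), not by the camera resolution — DeepStream scales each incoming frame to that size internally. A tiny helper I use to build the value string (the function name is mine):

```python
def make_input_dims(channels, height, width, order=0):
    """Format the nvinfer input-dims value: channel;height;width;input-order.
    These should match the network the model was trained/exported with,
    not the resolution of the camera or video source."""
    dims = (channels, height, width, order)
    if any(d < 0 for d in dims):
        raise ValueError("all input-dims fields must be >= 0")
    return ";".join(str(d) for d in dims)

# The model was trained at 800x608, so regardless of source resolution:
print(make_input_dims(3, 608, 800))  # -> 3;608;800;0
```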
The original images of varying resolution are all resized (keeping ratio) to 800x608 first and then put into image_2 and label_2, and training validation runs against these resized images as well, correct? I can see the mAP is good in both the training and retraining (pruned) stages; below is the training mAP:
Validation cost: 0.000132
Mean average_precision (in %): 85.1225
class name average precision (in %)
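Side note in case it helps anyone reproducing this: when the image_2 images are resized, the bbox fields in the label_2 KITTI files have to be scaled by the same factors. A minimal sketch (pure Python; the field layout is standard KITTI with the bbox at fields 4–7, everything else is my own naming):

```python
def scale_kitti_line(line, sx, sy):
    """Scale the bounding box of one KITTI label line by (sx, sy).
    KITTI fields: type trunc occ alpha xmin ymin xmax ymax ... (15 total);
    the 2D bbox occupies fields 4..7."""
    f = line.split()
    f[4] = f"{float(f[4]) * sx:.2f}"  # xmin
    f[5] = f"{float(f[5]) * sy:.2f}"  # ymin
    f[6] = f"{float(f[6]) * sx:.2f}"  # xmax
    f[7] = f"{float(f[7]) * sy:.2f}"  # ymax
    return " ".join(f)

# e.g. a box from a 1600x1216 image resized to 800x608 (factor 0.5):
src = "car 0.0 0 0.0 100.0 200.0 300.0 400.0 0 0 0 0 0 0 0"
print(scale_kitti_line(src, 800 / 1600, 608 / 1216))
```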
Since my inference camera video source resolution is fixed (currently 1280x960), does this imply I could resize my whole training dataset to 1280x960 instead, and would that help improve inference detection accuracy?
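If I go that route, I believe detectnet_v2 requires the input width and height to be multiples of 16 (please correct me if that constraint has changed); a quick check I run before picking a training resolution (the helper name and the default multiple are my assumptions):

```python
def valid_detectnet_v2_dims(width, height, multiple=16):
    """Check the multiple-of-16 constraint I believe detectnet_v2
    imposes on the training input width/height."""
    return width % multiple == 0 and height % multiple == 0

print(valid_detectnet_v2_dims(1280, 960))   # -> True  (both divisible by 16)
print(valid_detectnet_v2_dims(1920, 1080))  # -> False (1080 is not)
```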