We have been using the Darknet framework for a while and are thinking of migrating to TLT 3.0 (now TAO). We are doing an accuracy comparison with our previous YOLOv4 models trained on our person dataset.
In order to compare trainings, frameworks, etc., we have a Python script which computes the PASCAL 2010 mAP. This script has been well tested and gives exactly the same mAP as the official Darknet repo (the darknet detector map command) with the same weights.
We wanted to compute TLT accuracy using our script as well (to make a fair comparison with our previous results, we need to use the same tool).
To do so, we first run tlt inference, then take the bounding-box coordinates generated in the labels folder (KITTI format), convert them to the Darknet format, and finally compute the PASCAL 2010 mAP using our script.
In addition, we have also computed the mAP using TRT: we converted the TLT file into a TRT engine, ran the inferences, scaled and converted the bounding boxes, and finally computed the mAP using our script.
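For reference, the KITTI-to-Darknet conversion is roughly the following (a minimal sketch, not our exact company code; `kitti_to_darknet` and the class-id mapping are illustrative names, and the image size has to come from the corresponding image file):

```python
def kitti_to_darknet(kitti_line, img_w, img_h, class_ids):
    """Convert one KITTI label line to one Darknet label line.

    KITTI labels store pixel corners (left, top, right, bottom) in
    fields 4-7; Darknet expects a class id plus a box center and size
    normalized by the image dimensions.
    """
    fields = kitti_line.split()
    cls = class_ids[fields[0]]                  # e.g. {"person": 0}
    left, top, right, bottom = map(float, fields[4:8])
    x_c = (left + right) / 2.0 / img_w          # normalized box center
    y_c = (top + bottom) / 2.0 / img_h
    w = (right - left) / img_w                  # normalized box size
    h = (bottom - top) / img_h
    return f"{cls} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```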
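The box-scaling step on the TRT path is essentially the following (a sketch under the assumption of plain-resize preprocessing; with letterbox padding the offset would have to be removed first, and `scale_boxes` is an illustrative name):

```python
def scale_boxes(boxes, net_size, img_size):
    """Map (left, top, right, bottom) boxes from network-input
    coordinates back to original-image coordinates.

    Assumes plain resize preprocessing (no letterbox padding).
    """
    net_w, net_h = net_size
    img_w, img_h = img_size
    sx, sy = img_w / net_w, img_h / net_h     # per-axis scale factors
    return [(l * sx, t * sy, r * sx, b * sy) for (l, t, r, b) in boxes]
```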
We also get the same score as tlt evaluate.
To summarise, the problem is that we are unable to get the same mAP from tlt evaluate or trt inference + script vs. tlt inference + script.
Results:
tlt evaluate => mAP@0.5 = 84.5%
trt inference + script => mAP@0.5 = 84.5%
tlt inference + script => mAP@0.5 = 77.8%
To rule out possible errors, we have so far confirmed:
the mAP computed by our script is the same as the official Darknet repo's
the predicted bounding-box conversion from KITTI to Darknet format is correct (we have checked it visually)
tlt evaluate vs. trt inference + script gives equal accuracy
Do you have any insights on why we would get such a difference? Is there any post-processing done in tlt inference that is not present in tlt evaluate and trt inference, or vice versa?
You mean tao evaluate, I guess? Yes, I did run tao evaluate, and it gives the same results as tlt evaluate → mAP@0.5 = 84.5%.
But I don't really understand the point of trying that. As I mentioned in the post, we can confirm tao evaluate gives us the same accuracy as trt inference + mAP script, but not as tlt inference, which is our concern.
Ok I have run: tao yolo_v4 evaluate -e /workspace/tlt-experiments/trainings/conf/yolov4_config.yml -m /workspace/tlt-experiments/trainings/yolov4_cspdarknet53_fp32.engine -k key
And I get mAP@0.5 = 84.3%, pretty close to the tao evaluate result with the .tlt model (84.5%).
Not really, because it is company code. But the script has been tested against the official Darknet repo and it gives the same mAP on the same test set…
I have also re-run the tests on just 10 images with 339 labelled boxes, to rule out some possible errors:
tao evaluate .tlt → mAP@0.5 = 66.3%
tao evaluate .engine → mAP@0.5 = 59.2%
trt inference + script → mAP@0.5 = 58.1%
tao inference + script → mAP@0.5 = 54.8%
The two TensorRT accuracies are pretty close, but there is still a big difference between tao evaluate and tao inference + script.
Can you try changing matching_iou_threshold and executing the experiment below again?
tlt evaluate
tlt inference + script
I don’t really understand the point of doing that. If I change matching_iou_threshold to 0.7, for example, I’m not computing mAP@0.5 anymore but mAP@0.7, so there will still be a difference between tlt evaluate and tlt inference + script. What is your thinking behind that?
Also, is tao evaluate directly calling the same inference function as tao inference? If so, and given we are using the same config for both, is there any processing done in tao evaluate which is not applied in tao inference?
Thank you
I just want to know how much the difference between tlt evaluate and tlt inference + script will be at different thresholds.
tao evaluate and tao inference are different applications. When they run inference against a .tlt model, there are some differences in image preprocessing and predicted-label postprocessing.
We have trained on our custom person dataset, which is private (we can’t share it).
Yes, I can share the training spec file. Do you still want the tlt file even though I won’t share the test set?
Thanks
Ok, I did what you asked. At first it didn’t seem to me that this would be useful, because it is a drawing parameter which should only affect the output images, not the predictions saved in the text files. But here are the results:
draw_conf_threshold = 0.3
tao inference + our script → mAP@0.5 = 77.8%
draw_conf_threshold = 0.2
tao inference + our script → mAP@0.5 = 79.1%
draw_conf_threshold = 0.1
tao inference + our script → mAP@0.5 = 80.7%
draw_conf_threshold = 0.05
tao inference + our script → mAP@0.5 = 81.8%
draw_conf_threshold = 0.01
tao inference + our script → mAP@0.5 = 83.5%
draw_conf_threshold = 0.001
tao inference + our script → mAP@0.5 = 83.9%
It is definitely increasing, and I was really surprised! With a low threshold we are getting close to tao evaluate’s mAP@0.5 = 84.5%.
I’ve also compared the number of predictions in the output .txt files (labels folder). For draw_conf_threshold = 0.3 we get 27,712 predictions, and for draw_conf_threshold = 0.001 we get 174,423 predictions.
In the end, the increase in accuracy makes sense: when we compute mAP from the precision/recall curve, every prediction we have is used as a point; there is no confidence threshold involved. And PASCAL 2010 mAP in particular uses every point on the curve, so if we give it ~6-7 times fewer predictions (points), the curve is definitely different.
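The count was done with a small helper along these lines (illustrative sketch; it assumes one predicted box per line in each label file, as in the KITTI labels folder):

```python
from pathlib import Path

def count_predictions(labels_dir):
    """Total number of predicted boxes across all label .txt files in a
    folder, counting one box per non-empty line."""
    return sum(len(f.read_text().splitlines())
               for f in sorted(Path(labels_dir).glob("*.txt")))
```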
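To illustrate the effect, here is a minimal sketch of the VOC 2010 "all points" AP for one class (the `tp_flags` input is a hypothetical simplification: predictions already sorted by descending confidence and flagged TP/FP at IoU 0.5). Cutting off the low-confidence tail caps the maximum recall the curve can reach, so any true positives in that tail are lost and the AP drops:

```python
import numpy as np

def average_precision(tp_flags, n_gt):
    """All-points (PASCAL VOC 2010+) AP for one class.

    tp_flags: 1/0 per prediction (sorted by descending confidence),
              1 if it matched a ground-truth box at IoU >= 0.5.
    n_gt:     number of ground-truth boxes.
    """
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / n_gt
    precision = tp / (tp + fp)
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # precision envelope
    idx = np.where(r[1:] != r[:-1])[0]         # recall step points
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Dropping the low-confidence tail (here the last two predictions,
# one of which was a true positive) lowers the AP:
full = average_precision([1, 1, 0, 1], n_gt=4)   # -> 0.6875
cut = average_precision([1, 1], n_gt=4)          # -> 0.5
```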
Questions:
Why is a supposedly rendering parameter actually affecting the number of saved predictions? It doesn’t make sense to me at all. It should be a confidence threshold, not a drawing threshold.
Was this threshold added to tao during the change from tlt?
What is the confidence threshold used by tao evaluate? (This one can’t be configured in the config file.)
Yes, the “-t” flag is the confidence threshold to draw/label a bbox. It will affect the number of predictions.
No, it is not newly added by tao.
In tao evaluate, it is defined in your training spec file; it is 0.01 by default. See the YOLOv4 — TAO Toolkit 3.22.05 documentation.
We now see that we get the same accuracies with both tao evaluate and tao inference + our script, so the problem is solved. I still find it pretty confusing to call a confidence threshold draw_conf_thres, but anyway.
Thanks for your help, Morgan!