- Hardware Platform (Jetson / GPU): T4 GPU
- DeepStream Version: 5.0
- TensorRT Version: 7.0
- NVIDIA GPU Driver Version (valid for GPU only): 10.2
- TLT Version: 2.0
I used TLT to train a people-detection model (transfer-learned from the PeopleNet V2.0 DetectNet_v2 ResNet18 model).
After training, pruning, and retraining with QAT, the model achieved satisfactory performance within the TLT environment, so I exported it and generated a TensorRT engine file. I ran some inferences with this engine file, and the detections were great.
Next, I deployed this engine file in DeepStream, and surprisingly the model's performance was considerably lower than what I had seen in TLT! The attached image shows a comparison: in the top part, I ran inference on a video in DeepStream with a pre-cluster-threshold of 0.5 and then 0.4 (relatively very low thresholds to use), and in the bottom part, I extracted a frame from that video and ran inference on it in TLT (using tlt-infer) with a confidence threshold of 0.9 (a much more sensible threshold). With a threshold of 0.9, the model run in TLT detected almost all the people visible in the frame. However, with a pre-cluster-threshold of 0.5, the model run in DeepStream didn't detect a single person object! Only after lowering the pre-cluster-threshold to 0.4 did it detect 3 people.
I also printed obj_meta->confidence for the detected people (with pre-cluster-threshold set to 0.5), and most of the confidences were around 0.5, which is strange for people with such clear dimensions and shapes.
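To make the sensitivity concrete, here is a toy sketch in plain Python (not DeepStream code) with hypothetical confidence values clustered just below 0.5, consistent with the counts I observed:

```python
# Hypothetical obj_meta->confidence values, most of them around 0.5
# (illustrative only; these are not the exact printed values).
confidences = [0.48, 0.46, 0.43, 0.38, 0.35]

def surviving_detections(conf_list, threshold):
    """Keep only objects whose confidence meets the threshold."""
    return [c for c in conf_list if c >= threshold]

print(len(surviving_detections(confidences, 0.5)))  # 0 people detected
print(len(surviving_detections(confidences, 0.4)))  # 3 people detected
```

With confidences hovering just under 0.5, a tiny change in the threshold flips the detection count from zero to several, which matches what I see in DeepStream.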
Here is the configuration file used in DeepStream for the people detection engine file:
config_infer_primary_peoplenet_detection.txt (1.3 KB)
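For reference, the clustering-related section of such a config typically looks like the fragment below (the values here are illustrative, not the exact contents of the attached file):

```
[class-attrs-all]
## Raw detection-confidence cutoff applied BEFORE clustering
pre-cluster-threshold=0.5
## Clustering parameters that also affect which boxes survive
eps=0.7
minBoxes=1
```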
What is the reason for this difference? Do TLT and DeepStream interpret and process confidence thresholds differently? If so, what is the relationship between them? Are there other parameters that could be affecting this behavior?