Models trained using TLT perform considerably worse when deployed on DeepStream

  • Hardware Platform (Jetson / GPU) T4 GPU
  • DeepStream Version 5.0
  • TensorRT Version 7.0
  • NVIDIA GPU Driver Version (valid for GPU only) 10.2
  • TLT V2.0: tlt-streamanalytics:v2.0_py3

I used TLT to train a people detection model (transfer learnt from PeopleNetV2.0 DetectNetV2 ResNet18).
After training, pruning and retraining with QAT, the model achieved satisfactory performance within the TLT environment, and I then proceeded to export the model and generate a TRT engine file. I performed some inferences using this engine file, and detections were great.

Next, I deployed this engine file in DeepStream, and surprisingly the model performed considerably worse than it had in TLT. The attached image shows a comparison. In the top part, I ran inference on a video in DeepStream with pre-cluster-threshold set to 0.5 and to 0.4 (quite low values for a threshold). In the bottom part, I extracted the same video frame and ran inference on it in TLT with a confidence threshold of 0.9 (a much more reasonable value). With a threshold of 0.9, the model run in TLT (using tlt-infer) detected almost all people visible in the frame. With a pre-cluster-threshold of 0.5, however, the model run in DeepStream did not detect a single person, and after lowering the pre-cluster-threshold to 0.4 it detected 3 people.
I also printed the obj_meta->confidence of the people (with pre-cluster-threshold set to 0.5), and most of the confidences had values ~0.5 which is strange for people with clear dimensions and shapes.

Here is the configuration file used in DeepStream for the people detection engine file:
config_infer_primary_peoplenet_detection.txt (1.3 KB)

What is the reason for this difference? Do TLT and DeepStream interpret and apply confidence thresholds differently? If so, how do the two relate? Are there other parameters that could be affecting this behavior?

They are different. For TLT, see the DetectNet_v2 page of the Transfer Learning Toolkit 3.0 documentation:

confidence_model: Algorithm to compute the final confidence of the clustered bboxes. In aggregate_cov mode, the final confidence of a detection is the sum of the confidences of all the candidate bboxes in a cluster. In mean_cov mode, the final confidence is the mean confidence of all the bboxes in the cluster.
confidence_threshold: The threshold applied to the final aggregate confidence values to render the bboxes. In aggregate_cov: may be tuned to any float value > 0.0. In mean_cov: 0.0 - 1.0.
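
To make the difference between the two modes concrete, here is a minimal Python sketch. The function name cluster_confidence and the coverage values are made up for illustration and are not part of the TLT API; the code only mirrors the sum and mean definitions quoted above.

```python
from statistics import mean

def cluster_confidence(coverages, confidence_model):
    """Combine the per-candidate confidences of one cluster into a final confidence."""
    if confidence_model == "aggregate_cov":
        # Sum of all candidate confidences: can exceed 1.0.
        return sum(coverages)
    if confidence_model == "mean_cov":
        # Mean of all candidate confidences: stays within 0.0 - 1.0.
        return mean(coverages)
    raise ValueError(f"unknown confidence_model: {confidence_model}")

# Hypothetical cluster: four candidate boxes on the same person.
coverages = [0.5, 0.5, 0.5, 0.5]
aggregate = cluster_confidence(coverages, "aggregate_cov")
average = cluster_confidence(coverages, "mean_cov")
print(aggregate, average)  # prints "2.0 0.5"
```

With these made-up values, a confidence_threshold of 0.9 is cleared in aggregate_cov mode (2.0) but not in mean_cov mode (0.5), even though no single candidate scores above 0.5.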

For DeepStream, see the Gst-nvinfer page of the DeepStream 5.1 Release documentation:

pre-cluster-threshold: Detection threshold to be applied prior to clustering operation

I suggest setting a lower pre-cluster-threshold in DeepStream.

Thank you for the elaborate reply!

There are two things that are still unclear, though:

  1. What confidence_model does DeepStream use? I.e., how will the confidence of a cluster be computed?
  2. What pre-cluster-threshold does TLT use? E.g., does it consider all predictions? (I.e., implicitly setting pre-cluster-threshold = 0?)

What I'm trying to understand is how to reproduce the TLT results exactly in DeepStream.

Can you share your tlt-infer spec file?
You mentioned that you are using TLT 2.0, so please refer to the Creating an Experiment Spec File page of the Transfer Learning Toolkit 2.0 documentation. There is another threshold in the tlt-infer spec.
Can you modify coverage_threshold and run tlt-infer again?
Also, please set confidence_model to mean_cov instead of aggregate_cov. In mean_cov mode, confidence_threshold ranges from 0.0 to 1.0, whereas in aggregate_cov mode it may be larger than 1.

Thanks again for the help and clarifications!
Here is the tlt-infer spec file that I’m currently using:

inferencer_config{
  target_classes: "person"
  # Inference dimensions.
  image_width: 852
  image_height: 480
  image_channels: 3
  batch_size: 64
  gpu_index: 0
  # model handler config
  tensorrt_config{
    trt_engine: "output_model/resnet18_detector_qat.trt.int8"
  }
}
bbox_handler_config{
  kitti_dump: true
  disable_overlay: false
  overlay_linewidth: 2
  classwise_bbox_handler_config{
    key:"person"
    value: {
      confidence_model: "aggregate_cov"
      output_map: "person"
      confidence_threshold: 0.9
      bbox_color{
        R: 0
        G: 255
        B: 0
      }
      clustering_config{
        coverage_threshold: 0.00
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  classwise_bbox_handler_config{
    key:"default"
    value: {
      confidence_model: "aggregate_cov"
      confidence_threshold: 0.9
      bbox_color{
        R: 255
        G: 0
        B: 0
      }
      clustering_config{
        coverage_threshold: 0.00
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}

So coverage_threshold would correspond to the pre-cluster-threshold used in DeepStream?
Also, the question about which confidence model DeepStream uses was not answered directly. Since confidences in DeepStream are between 0.0 and 1.0, does that mean DeepStream uses mean_cov?

In TLT, the post-processor module generates renderable bounding boxes from the raw detection output. The process includes:

  1. Filtering out valid detections by thresholding objects using the confidence value in the coverage tensor.
  2. Clustering the raw filtered predictions using DBSCAN to produce the final rendered bounding boxes.
  3. Filtering out weaker clusters based on the final confidence threshold derived from the candidate boxes that get grouped into a cluster.
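
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the TLT implementation: the function names, the candidate boxes, and the min_iou parameter are hypothetical, and a simple greedy IoU grouping stands in for DBSCAN to keep the example short.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def postprocess(candidates, coverage_threshold, confidence_threshold,
                confidence_model="aggregate_cov", min_iou=0.5):
    # Step 1: drop candidates whose coverage is below coverage_threshold.
    kept = [c for c in candidates if c[1] >= coverage_threshold]

    # Step 2: group overlapping boxes. Greedy IoU grouping against the first
    # box of each cluster is a simplified stand-in for DBSCAN (assumption).
    clusters = []
    for box, cov in kept:
        for cluster in clusters:
            if iou(box, cluster[0][0]) >= min_iou:
                cluster.append((box, cov))
                break
        else:
            clusters.append([(box, cov)])

    # Step 3: compute the final confidence per cluster, drop weak clusters.
    results = []
    for cluster in clusters:
        covs = [cov for _, cov in cluster]
        if confidence_model == "aggregate_cov":
            conf = sum(covs)
        else:  # treated as mean_cov
            conf = sum(covs) / len(covs)
        if conf >= confidence_threshold:
            # Reporting the highest-coverage box is an illustrative choice.
            best_box = max(cluster, key=lambda item: item[1])[0]
            results.append((best_box, conf))
    return results

# Hypothetical raw output: three candidates on one person, one on another.
candidates = [
    ((10, 10, 50, 100), 0.5),
    ((12, 11, 51, 99), 0.5),
    ((11, 9, 49, 101), 0.25),
    ((200, 20, 240, 110), 0.5),
]
aggregate_result = postprocess(candidates, 0.0, 0.9, "aggregate_cov")
mean_result = postprocess(candidates, 0.0, 0.9, "mean_cov")
print(len(aggregate_result), len(mean_result))
```

On this toy input, the same confidence_threshold of 0.9 keeps one cluster in aggregate_cov mode and none in mean_cov mode, which is why thresholds from the two modes cannot be compared directly.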

First, please set confidence_model to mean_cov in the tlt-infer spec and check whether you get results similar to DeepStream.
Then try different coverage_threshold values and check whether the results become similar to DeepStream.

DeepStream does not have a confidence_model parameter. See the Gst-nvinfer page of the DeepStream 5.1 Release documentation.