Excessive Detections (False Positives) with TAO Model (BatchedNMS) after DeepStream 6.4 to 7.0 Migration

Please provide complete information as applicable to your setup.

• Hardware Platform: Jetson Orin NX
• DeepStream Version: 7.0
• JetPack Version: 6.0

• Issue Type: bug

• Using a yolo.onnx model that worked normally on 6.4 and building an engine from it
• Requirement details: yolov4 tiny custom model

I’m encountering an issue with excessive false-positive detections after migrating my object detection pipeline from DeepStream 6.4 to DeepStream 7.0.

Background:

  • I have an object detection model yolov4 tiny trained and exported using the NVIDIA TAO Toolkit.
  • The exported ONNX model includes the BatchedNMS node, meaning Non-Maximum Suppression is performed within the model graph itself.
  • This setup worked well on DeepStream 6.4, producing an expected number of detections.

Migration and Problem:

  • I recently migrated my system to DeepStream 7.0 (and its corresponding TensorRT, CUDA versions).
  • I am using the same .onnx model file as before.
  • Since migrating to DS 7.0, the pipeline produces significantly more detections, including many clear false positives, even though the input video/images are the same.

Config file:
[property]
gpu-id=0
net-scale-factor=0.0039215686 #1 does not change anything

model-color-format=0 #0 RGB 1 BGR
labelfile-path=filter_classes.txt
model-engine-file=filter_recognition_epoch_80.onnx_b1_gpu0_fp32.engine
onnx-file=filter_recognition_epoch_80.onnx

infer-dims=3;416;416
maintain-aspect-ratio=1
uff-input-order=0
uff-input-blob-name=Input
batch-size=1
network-mode=0
num-detected-classes=2
interval=0
gie-unique-id=1
is-classifier=0
cluster-mode=3

output-blob-names=BatchedNMS
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT

#custom-lib-path=libnvds_infercustomparser_tao.so
custom-lib-path=libnvds_infercustomparser.so

[class-attrs-all]

#pre-cluster-threshold=0.5 #(is it already in the onnx? commenting it out does not change anything)
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

Troubleshooting Steps Taken:

  1. Custom Parser: I have successfully rebuilt the TAO custom parser library (libnvds_infercustomparser_tao.so or equivalent) using the DeepStream 7.0 SDK. I verified using nm -D that the required NvDsInferParseCustomBatchedNMSTLT function exists in the compiled library being used.
  2. pre-cluster-threshold: I tested setting the pre-cluster-threshold value in the [class-attrs-all] section of the config file (e.g., to 0.5). Commenting out or changing this value has no apparent effect on the number of output detections. This strongly suggests the effective confidence threshold is determined internally by the BatchedNMS node in the model.
  3. net-scale-factor: I tested both net-scale-factor=1 and net-scale-factor=0.0039215686 (after verifying my model’s expected input). While ensuring the correct value is important, changing this did not resolve the excessive detection issue. (I also deleted the .engine file after changing this).
  4. Engine Regeneration: I have deleted the TensorRT .engine file multiple times to ensure it is regenerated cleanly by DeepStream 7.0’s version of TensorRT from the original .onnx file. This did not change the outcome.
  5. Parser libraries: I rebuilt both libnvds_infercustomparser.so and libnvds_infercustomparser_tao.so, without success.
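
For reference on step 3: the Gst-nvinfer documentation describes its per-pixel normalization as y = net-scale-factor * (x - mean), where the mean comes from the `offsets` property or a mean file. The following is a minimal Python sketch of what the two tested values do to the input range. The function name is illustrative and not part of DeepStream, and no offsets are assumed because the config above sets none.

```python
def preprocess_pixel(x, net_scale_factor=1.0, offset=0.0):
    """nvinfer-style normalization: y = net_scale_factor * (x - offset)."""
    return net_scale_factor * (x - offset)


# The two settings tried in step 3, applied to the same 8-bit pixel values.
pixels = [0, 128, 255]
scaled_01 = [preprocess_pixel(p, net_scale_factor=0.0039215686) for p in pixels]
raw_0_255 = [preprocess_pixel(p, net_scale_factor=1.0) for p in pixels]

print("net-scale-factor=0.0039215686 ->", scaled_01)  # values in roughly [0, 1]
print("net-scale-factor=1            ->", raw_0_255)  # values in [0, 255]
```

Only one of these ranges matches what the model saw during training, so the value (and any `offsets`) should follow the preprocessing used by the TAO export rather than being tuned by trial.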

Hypothesis & Questions:

I think that differences or behavior changes in the BatchedNMS implementation in the newer TensorRT version (used by DS 7.0), compared to the older version (used by DS 6.4), are causing more detections to pass the fixed confidence threshold embedded in the ONNX model’s BatchedNMS node. Could that be right?

  1. Controlling Internal Threshold: How can I effectively control or increase the confidence threshold that is applied internally by the BatchedNMS node when exporting the model from TAO Toolkit? What specific parameters should I look for in the tao export (or possibly tao prune / tao deploy) command or spec file?
  2. TensorRT Behavior Changes: Are there any known changes in TensorRT versions (relevant to DS 6.4 vs. DS 7.0) regarding the BatchedNMS operation (or general layer precision/numerics) that might explain why a previously embedded threshold now results in more detections passing?
  3. Other DS 7.0 Factors: Are there any other DeepStream 7.0 nvinfer configuration parameters or parser behaviors I might be overlooking that could influence the final detections when using a model with internal NMS?

Do I need to retrain and re-export the YOLOv4 ONNX model for DeepStream 7.0?

Any guidance on how to adjust the internal NMS threshold during TAO export or other potential solutions would be greatly appreciated!

Thanks!

I notice that cluster-mode is set to 3, but there are no parameters set for cluster-mode=3. Please refer to /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.yml and see the explanation of cluster-mode in the documentation.

cluster-mode = 0 | GroupRectangles
cluster-mode = 1 | DBSCAN
cluster-mode = 2 | NMS
cluster-mode = 3 | Hybrid
cluster-mode = 4 | No clustering

So cluster-mode 3 is Hybrid. I tried the different cluster modes without success.
The engine created in DS 7.0 finds 4-5 objects in the frame with confidence=1, so any clustering seems useless. Unless pre-cluster-threshold is >1, it still reports these objects.

Again, the same ONNX model works perfectly in DS 6.4 when converted to an engine.

So the question remains: do I have to train and export the ONNX model again, or is it a postprocessing problem? If it is postprocessing, what could cause it?
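
The observation above can be made concrete. A model exported with BatchedNMS typically exposes four outputs (a keep count, boxes, scores and class ids), and a parser is expected to drop entries below the configured threshold. The sketch below is a simplified Python stand-in for that filtering step, not the actual NvDsInferParseCustomBatchedNMSTLT code; the function name and the sample values are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    class_id: int
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_batched_nms(keep_count, boxes, scores, classes, threshold) -> List[Detection]:
    """Keep the first `keep_count` entries whose score reaches `threshold`."""
    kept = []
    for i in range(keep_count):
        if scores[i] < threshold:
            continue  # below the confidence threshold: dropped
        kept.append(Detection(int(classes[i]), float(scores[i]), tuple(boxes[i])))
    return kept


# Hypothetical output resembling the DS 7.0 symptom: every score saturated at 1.0.
boxes = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.7, 0.7), (0.3, 0.6, 0.4, 0.9)]
scores = [1.0, 1.0, 1.0]
classes = [0, 1, 0]

for threshold in (0.001, 0.5, 1.0):
    survivors = filter_batched_nms(3, boxes, scores, classes, threshold)
    print(f"threshold={threshold}: {len(survivors)} detections kept")
```

When every score is saturated at 1.0, no threshold up to 1.0 removes anything, which would be consistent with pre-cluster-threshold having no visible effect. That would point at the scores themselves (preprocessing or engine output) rather than at clustering.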

Duplicate of "Unexpected Detection Behavior After Migrating YOLOv4-Tiny Model from DeepStream 6.4 to 7.0".

According to the compatibility table, DS 6.4 and DS 7.0 use the same TensorRT version. nvinfer supports dumping the input and output tensors if you add the following settings to the nvinfer config:

  dump-input-tensor: 1
  dump-output-tensor: 1

If the input tensors are the same on both versions, the issue should be related to inference or postprocessing. If the output tensors are also the same, the issue should be related to postprocessing.
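
A minimal sketch of how two such dumps could be compared, assuming they are raw binary files of native-endian float32 values. The on-disk format and the file names are assumptions, so check what nvinfer actually writes on your setup before relying on this.

```python
import array


def load_f32(path):
    """Read a raw dump as a flat array of float32 values (assumed layout)."""
    values = array.array("f")
    with open(path, "rb") as fh:
        values.frombytes(fh.read())
    return values


def max_abs_diff(path_a, path_b):
    """Largest element-wise absolute difference between two dumps."""
    a, b = load_f32(path_a), load_f32(path_b)
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)} elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Usage with hypothetical file names, one dump per DeepStream version:
#   print(max_abs_diff("ds64_input.bin", "ds70_input.bin"))
#   print(max_abs_diff("ds64_output.bin", "ds70_output.bin"))
```

A difference near zero on the inputs combined with a large difference on the outputs would point at the engine; identical outputs would point at postprocessing.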

I currently cannot compare against the tensors of the older engine. I trained and evaluated the model in the TAO container, and the exported ONNX file has been tested multiple times, so it should be intact.

It was trained with this config in the TAO container:

"
random_seed: 42

yolov4_config {
big_anchor_shape: "[(105, 100), (82, 82), (70, 70)]"
mid_anchor_shape: "[(104,99), (80, 80), (68, 66)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet_tiny"
loss_loc_weight: 5.0
loss_neg_obj_weights: 0.5
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
force_relu: false
}

training_config {
batch_size_per_gpu: 8
num_epochs: 120
enable_qat: false
checkpoint_interval: 5
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
}

eval_config {
average_precision_mode: SAMPLE
batch_size: 1
matching_iou_threshold: 0.5
}

nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
force_on_cpu: true
top_k: 200
}

augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0
horizontal_flip: 0.5
jitter: 0.1
output_width: 416
output_height: 416
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.0
mosaic_min_ratio: 0.2
}

dataset_config {
data_sources: {
label_directory_path: "/workspace/416x416_pictures/train/label" # Path to your annotation files
image_directory_path: "/workspace/416x416_pictures/train/image" # Path to your image files
}

include_difficult_in_training: true

target_class_mapping {
key: "lim"
value: "lim"
}
target_class_mapping {
key: "rfx"
value: "rfx"
}

validation_data_sources: {
label_directory_path: "/workspace/416x416_pictures/validate/label" # Path to your validation annotations
image_directory_path: "/workspace/416x416_pictures/validate/image" # Path to your validation images
}
}
"

Yes, the two releases may use the same TensorRT version, but when I try to use the engine that was created on DS 6.4, it fails to load and a new one is built, and the new one misbehaves.

Could there be a bug in engine creation in DS 7.0? :)

Could you share the complete run log? It may contain some hints. Thanks!

So, the interesting situation is this:

I train yolov4 and yolov4tiny in the same container (tao-toolkit:5.0.0-tf1.15.5).

I use the training config files suggested on the NVIDIA site (included below).

yolov4 works; yolov4tiny does not! At first it found too many objects; now it recognizes none.
It seems that DS 7.0 is struggling with yolov4 tiny.

YOLOv4tiny Training config:

random_seed: 42
yolov4_config {
big_anchor_shape: "[(283.04, 188.81), (197.35, 125.71), (178.12, 113.24)]"
mid_anchor_shape: "[(158.12, 101.34), (140.54, 91.66), (114.74, 78.73)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet_tiny"
loss_loc_weight: 5.0
loss_neg_obj_weights: 0.5
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
force_relu: false
}
training_config {
batch_size_per_gpu: 1
num_epochs: 120
enable_qat: false
checkpoint_interval: 5
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 1
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
force_on_cpu: true
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0
horizontal_flip: 0.5
jitter: 0.1
output_width: 416
output_height: 416
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.0
mosaic_min_ratio: 0.2
}
dataset_config {
data_sources: {
label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/label" # Path to your annotation files
image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/image" # Path to your image files
}
include_difficult_in_training: true
target_class_mapping {
key: "fil"
value: "fil"
}
target_class_mapping {
key: "nonfil"
value: "nonfil"
}
validation_data_sources: {
label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/label" # Path to your validation annotations
image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/image" # Path to your validation images
}
}

YOLOv4 training config (ResNet arch, not the CSP tiny framework):

random_seed: 42
yolov4_config {
big_anchor_shape: "[(105, 100), (82, 82), (70, 70)]"
mid_anchor_shape: "[(104,99), (80, 80), (68, 66)]"
small_anchor_shape: "[(101, 97), (78,77), (65, 65)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "resnet"
nlayers: 18
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.1
small_grid_xy_extend: 0.2
freeze_bn: false
freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 1
num_epochs: 120
enable_qat: false
checkpoint_interval: 5
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 1
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure:1.5
vertical_flip:0
horizontal_flip: 0.5
jitter: 0.3
output_width: 1248
output_height: 384
output_channel: 3
randomize_input_shape_period: 100
mosaic_prob: 0.5
mosaic_min_ratio:0.2
image_mean {
key: 'b'
value: 103.9
}
image_mean {
key: 'g'
value: 116.8
}
image_mean {
key: 'r'
value: 123.7
}
}
dataset_config {
data_sources: {
label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/label" # Path to your annotation files
image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/image" # Path to your image files
}
include_difficult_in_training: true
target_class_mapping {
key: "fil"
value: "fil"
}
target_class_mapping {
key: "nonfil"
value: "nonfil"
}
validation_data_sources: {
label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/label" # Path to your validation annotations
image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/image" # Path to your validation images
}
}

I just noticed that "output_width: 1248, output_height: 384" in the augmentation config should be 416x416, but I suppose the output size is not the issue, since this is the model that still works. :)

Above I attached the two training config files. I am not attaching a log, as it shows no errors: it converts the ONNX to an engine seemingly fine, and everything in the log looks normal.

Do you see any other errors? Why is the tiny model not working? Any suggestions?

PS: I run them both with the same config on DS 7.0.

  1. Do you mean that on DS 6.4, testing yolov4tiny, the app detects many correct objects, while on DS 7.0, testing the same model, the app can’t detect any correct objects?
  2. When testing yolov4tiny on DS 7.0, is the engine regenerated, or are you using the engine created on DS 6.4?
  3. Could you try this NV ready-made yolov4 tiny sample for tao5.3_ds7.0ga? The configurations are almost the same, so if this sample works, the issue should be related to the model.

Is this still a DeepStream issue that needs support? As you know, DeepStream uses TensorRT for inference. To rule out DeepStream, could you test TensorRT inference separately by referring to this yolo inference sample? Thanks!

The yolov4 TINY model on DS6.3 works well

The yolov4 tiny model on DS7.0 does NOT work.
- it builds the engine, but it recognizes too many objects

The yolov4 model (not tiny) builds engine and works on DS7.0 well.

I have trained the Yolov4 and Yolov4tiny models on the same docker.

Yolov4 training config

random_seed: 42

yolov4_config {
  big_anchor_shape: "[(105, 100), (82, 82), (70, 70)]"
  mid_anchor_shape: "[(104,99), (80, 80), (68, 66)]"
  small_anchor_shape: "[(101, 97), (78,77), (65, 65)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "resnet"
  nlayers: 18
  loss_loc_weight: 1.0
  loss_neg_obj_weights: 1.0
  loss_class_weights: 1.0
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  freeze_blocks: 0
  force_relu: false
}

training_config {
  batch_size_per_gpu: 1
  num_epochs: 120
  enable_qat: false
  checkpoint_interval: 3
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
}

eval_config {
  average_precision_mode: SAMPLE
  batch_size: 1
  matching_iou_threshold: 0.5
}

nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
}

augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure:1.5
  vertical_flip:0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 416
  output_height: 416
  output_channel: 3
  randomize_input_shape_period: 100
  mosaic_prob: 0.5
  mosaic_min_ratio:0.2
  image_mean {
    key: 'b'
    value: 103.9
  }
  image_mean {
    key: 'g'
    value: 116.8
  }
  image_mean {
    key: 'r'
    value: 123.7
  }
}

dataset_config {
  data_sources: {
    label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/label"  # Path to your annotation files
    image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/image"  # Path to your image files
  }

  include_difficult_in_training: true

  target_class_mapping {
    key: "fil"
    value: "fil"
  }
  target_class_mapping {
    key: "nonfil"
    value: "nonfil"
  }


  validation_data_sources: {
    label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/label"  # Path to your validation annotations
    image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/image"  # Path to your validation images
  }
}

YoloV4Tiny training config (works on DS6 not DS7)

random_seed: 42

yolov4_config {
  big_anchor_shape: "[(283.04, 188.81), (197.35, 125.71), (178.12, 113.24)]"
  mid_anchor_shape: "[(158.12, 101.34), (140.54, 91.66), (114.74, 78.73)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "cspdarknet_tiny"
  loss_loc_weight: 5.0
  loss_neg_obj_weights: 0.5
  loss_class_weights: 1.0
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.05
  freeze_bn: false
  # freeze_blocks: 0
  force_relu: false
}

training_config {
  batch_size_per_gpu: 1
  num_epochs: 120
  enable_qat: false
  checkpoint_interval: 5
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
}

eval_config {
  average_precision_mode: SAMPLE
  batch_size: 1
  matching_iou_threshold: 0.5
}

nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  force_on_cpu: true
  top_k: 200
}

augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.1
  output_width: 416
  output_height: 416
  output_channel: 3
  randomize_input_shape_period: 10
  mosaic_prob: 0.0
  mosaic_min_ratio: 0.2
}

dataset_config {
  data_sources: {
    label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/label"  # Path to your annotation files
    image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/train/image"  # Path to your image files
  }

  include_difficult_in_training: true

  target_class_mapping {
    key: "fil"
    value: "fil"
  }
  target_class_mapping {
    key: "nonfil"
    value: "nonfil"
  }


  validation_data_sources: {
    label_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/label"  # Path to your validation annotations
    image_directory_path: "/workspace/416x416_pictures/train&validate folders for fil non fil/validate/image"  # Path to your validation images
  }
}

Config used to run on DS 7.0, for both yolov4tiny and yolov4 (only the ONNX file path changes). yolov4 works; the yolov4tiny ONNX builds an engine but does NOT recognize objects correctly:

[property]
scaling-compute-hw = 0
gpu-id=0
net-scale-factor=1

model-color-format=0  #0 RGB 1 BGR
labelfile-path=filter_classes.txt
#model-engine-file=filter_recognition_epoch_80.onnx_b1_gpu0_fp32.engine
#onnx-file=filter_recognition_epoch_80.onnx


onnx-file=yolov4tiny_epoch45_12_04_25.onnx


output-tensor-meta=1

infer-dims=3;416;416
maintain-aspect-ratio=1
uff-input-order=0
uff-input-blob-name=Input
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=2
interval=0
gie-unique-id=1
is-classifier=0
#network-type=0
cluster-mode=0
#output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
output-blob-names=BatchedNMS
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT

custom-lib-path=libnvds_infercustomparser_tao.so
#custom-lib-path=libnvds_infercustomparser.so

dump-input-tensor= 1
dump-output-tensor= 1
[class-attrs-all]

pre-cluster-threshold=0.5
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

Is this config compatible with yolov4?

  1. Do you mean that when testing yolov4-tiny on DS 7.0, it can detect some objects, but fewer than when testing yolov4-tiny on DS 6.3? Please make sure the nvinfer configuration file is the same on the two DS versions.
  2. According to the compatibility table mentioned above, the TensorRT version differs between DS 6.3 and DS 7.0. To narrow down this issue, please test TensorRT inference separately by referring to the yolo TensorRT inference sample mentioned above. It is written for yolov3, so the model dimensions and postprocessing may differ.

No, the number of recognized objects with confidence 1.0 is around 30 in DS 7.0.

I copied the nvinfer configuration that worked on DS 6.3 to DS 7.0, so they are the same. The same infer config also runs the yolov4 model well on DS 7.0. So currently my app on DS 7.0 runs yolov4 but cannot run yolov4tiny.

Is yolov4tiny (cspdarknet arch) compatible with DS7.0?

Did you try other cluster-mode values? Could you share the yolov4-tiny model? You can use a forum private message: click your forum avatar -> personal messages -> new message. This NV ready-made yolov4 tiny sample works well on DS 7.0, and the model is public.

I've tried all 4 cluster modes without success.

Testing the model you shared on DS 6.3 and DS 7.0 with the native sample_720p.h264, neither can detect any object. Here are the cfg and logs: dstest1_pgie_config.txt (2.9 KB), 6.3.txt (3.8 KB), 7.0.txt (9.3 KB), yolov4_labels.txt (29 Bytes). What is the model used to detect? Could you share the test video and filter_classes.txt?

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks

We tried all cluster modes without any improvement.

Best Regards

Thanks for sharing! Please refer to my last comment. As you said, the model works well on DS6.3 and DS7.0, but your model doesn’t work on my DS6.3. Could you check my configuration file and provide the test video, to help reproduce this issue?
Or, to narrow down this issue, you can first test your model with TensorRT directly by referring to the yolov3_onnx sample above.