Phantom detections in cluster modes != 0

• Hardware Platform (Jetson / GPU)
Jetson Nano

I’m noticing phantom detections in all cluster modes other than 0, even at a high pre-cluster-threshold of 0.5.

I took a 50-second clip from this video (Morning Meeting and Warm up - Sysco Eastern WI - Capstone Logistics - YouTube) and ran inference on it using the sample

Here is the 50 s clip for download

Leave the sample itself unchanged, but apply these changes to dstest3_pgie_config.txt. IMHO these are all “normal” and legitimate changes:

diff --git a/apps/deepstream-test3/dstest3_pgie_config.txt b/apps/deepstream-test3/dstest3_pgie_config.txt
index a6a797f..16e3e65 100644
--- a/apps/deepstream-test3/dstest3_pgie_config.txt
+++ b/apps/deepstream-test3/dstest3_pgie_config.txt
@@ -59,23 +59,24 @@

With this setup you will notice a single-shot person detection at frame 785 after about 10 seconds of no detection. With the same pre-cluster-threshold but cluster-mode=0 there is no such phantom.

You might say, who gives a f… about such a single event. I would like to point out that my use case is to detect possible collisions beforehand. As you can see, there is a long drive along an aisle in a warehouse. Out of nowhere, the algorithm produces a detection rectangle whose size corresponds to a distance of roughly 1 m to the “person” ahead (a ghost in this case). This would definitely have to result in a full-brake event. For nothing.
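As an aside on the single-event problem described above: one common way to keep a one-frame detection from triggering a reaction is to require that it persists over several frames. The following is only a minimal sketch in plain Python, not part of DeepStream; the class name, the `window` and `min_hits` parameters and their defaults are assumptions chosen for illustration.

```python
from collections import deque


class PersistenceFilter:
    """Report a detection only if it was seen in at least `min_hits`
    of the last `window` frames (hypothetical helper, not a DeepStream API)."""

    def __init__(self, window=5, min_hits=3):
        if not 1 <= min_hits <= window:
            raise ValueError("need 1 <= min_hits <= window")
        self.min_hits = min_hits
        self.history = deque(maxlen=window)  # oldest entries drop out automatically

    def update(self, detected):
        """Feed the raw result of one frame and return the debounced result."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.min_hits


def debounce(raw_flags, window=5, min_hits=3):
    """Apply the filter to a whole sequence of per-frame flags."""
    flt = PersistenceFilter(window, min_hits)
    return [flt.update(flag) for flag in raw_flags]


if __name__ == "__main__":
    # A single detection surrounded by empty frames never passes the filter.
    frames = [False] * 10 + [True] + [False] * 10
    print(any(debounce(frames)))
```

The price is latency: with `min_hits=3` a real obstacle is reported two frames later than the raw detection, which has to be weighed against the braking distance.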

I need the “confidence” value; otherwise I would be fine with cluster-mode=0.

Hey, if you set cluster-mode=3, that means the DBSCAN algorithm, but eps and group-threshold are for cluster-mode=0. Could you check the Gst-nvinfer page in the DeepStream 5.0 documentation?

I said I have this fake detection phenomenon in ALL cluster modes != 0. The eps and group-threshold values are just something taken from one of your templates…

The real problem is that I have no clue what exactly all these cluster modes mean. And it is not documented.

Other than that, I would expect entries that are irrelevant for a cluster mode to be gracefully ignored. BTW: at least eps seems to be relevant for 3 too.

Did you try my setup?

Yes, it’s documented. Could you check the Gst-nvinfer page in the DeepStream 5.1 Release documentation for the config items that correspond to each cluster-mode?

Please share the detailed repro steps if the issue is still not resolved, and I will debug locally.

Well, yes and no. The doc provides some buzz words w/o any context. So what exactly is behind this:

Integer
0: OpenCV groupRectangles()
1: DBSCAN
2: Non Maximum Suppression
3: DBSCAN + NMS Hybrid
4: No clustering

What is DBSCAN, NMS, Hybrid?!

Please share the detailed repro steps if the issue is still not resolved, and I will debug locally.

All you need is in my initial post. What else do you need?

DBSCAN (density-based spatial clustering of applications with noise), NMS (non-maximum suppression) and hybrid are different types of clustering algorithms. The source code is available in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp with comments. The hybrid option makes use of both the DBSCAN and NMS algorithms in a two-step approach.
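To give the NMS term some context, here is a minimal sketch of textbook greedy non-maximum suppression in plain Python. It is not the nvdsinfer implementation referenced above; the function names, the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this generic form an isolated candidate box is never suppressed, because nothing overlaps it. OpenCV's groupRectangles(), by contrast, can discard groups that contain too few rectangles when its group threshold is above zero, so the two approaches treat a lone candidate differently.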

We will check internally to see if we can provide more info in the doc.

Ok, I see.

Thanks for the extra explanation. I hope you will be able to reproduce these “phantom detections”.

I can repro your issue. Actually, even with cluster-mode=0 there are still some false detections.
For your input video, I think you can try our PeopleNet. You only need to detect people, right?

If yes, please try our PeopleNet; I tried it locally and it works well. The config file is under /opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models/
labels_peoplenet.txt and you can get the model via wget -O resnet34_peoplenet_pruned.etlt

Cool. Will give it a try. Thanks. I suppose you currently don’t have an explanation for the glitches?

I think it is caused by the model itself. The resnet10.caffemodel is used to detect Car, Bicycle, Person and Roadsign, so it’s more suitable for a traffic use case. I’m not sure if the false detection bbox is from a wrong person or maybe just a wrong Roadsign, you can print the classid to confirm it.
But in my opinion, if you only want to detect people, it’s more suitable to use PeopleNet, which can detect people, faces and bags. You can filter out the other classes (bag, face) by installing a probe on a plugin downstream of nvinfer.
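The filtering idea can be sketched independently of the pipeline. The following plain-Python example only shows the logic a probe would apply to each frame's objects; the dict-based detection records, the function name and the class ID 0 for “Person” are assumptions for illustration and should be checked against the actual labels file. It does not use any DeepStream API.

```python
PERSON_CLASS_ID = 0  # assumption: "Person" is the first entry in the labels file


def keep_classes(detections, wanted_class_ids=(PERSON_CLASS_ID,), min_confidence=0.0):
    """Return only the detections of wanted classes with enough confidence.

    `detections` is a hypothetical list of dicts with the keys "class_id"
    and "confidence"; it is not a DeepStream data structure.
    """
    wanted = set(wanted_class_ids)
    return [
        det for det in detections
        if det["class_id"] in wanted and det["confidence"] >= min_confidence
    ]


if __name__ == "__main__":
    frame = [
        {"class_id": 0, "confidence": 0.91},  # person
        {"class_id": 1, "confidence": 0.75},  # another class, dropped
        {"class_id": 0, "confidence": 0.32},  # person below the threshold
    ]
    for det in keep_classes(frame, min_confidence=0.5):
        print(det["class_id"], det["confidence"])
```

Printing the class ID of each remaining object, as in the usage part, is also the quickest way to confirm which class a false detection belongs to.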

I’m not sure if the false detection bbox is from a wrong person or maybe just a wrong Roadsign, you can print the classid to confirm it.

It was a person. I double checked that.

I checked your suggestion. Does not work for me.

  1. I downloaded the model from the location you provided

  2. I copied the labels.txt and the config and stitched together this configuration:


It generally works, also with three cams, but

a) I need to set workspace-size=1000 because otherwise I get INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.

b) The initial creation of the model file takes about a minute. Correct?

c) With the given input-dims config (which is deprecated; infer-dims is suggested instead, but it seems to follow a different format) I can only achieve an inference rate of about 2 fps, even when using just one camera. I usually use 3 USB cams and achieve a 30 fps inference rate per camera with the resnet10 model.

d) With a setting like input-dims=3;244;244;0 I achieve about 10 fps. Not enough for me.

e) I finally thought I had figured out what infer-dims is, and set it to 3;640;480 because my input is 640x480. But the inference rate is still very, very low and the latency is exorbitantly high compared to resnet10.

f) Most of the items around me are detected as “bag”, even parts of my clothes. That doesn’t improve my situation much.

=> Not that good
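On the meaning of the dims setting in items c) to e): as far as the Gst-nvinfer documentation describes it, the infer-dims key is ordered channel;height;width, so 3;640;480 would be read as 640 pixels high and 480 pixels wide. The sketch below parses such a value and compares pixel counts between two settings. The helper names are hypothetical, and the idea that inference cost grows roughly with the number of input pixels is a rule of thumb assumed here, not a measurement.

```python
def parse_infer_dims(value):
    """Parse a 'channel;height;width' string into a dict (hypothetical helper)."""
    parts = [int(p) for p in value.strip().split(";") if p != ""]
    if len(parts) != 3:
        raise ValueError("expected channel;height;width, got %r" % value)
    channels, height, width = parts
    return {"channels": channels, "height": height, "width": width}


def pixel_ratio(dims_a, dims_b):
    """How many times more input pixels dims_a has than dims_b."""
    a, b = parse_infer_dims(dims_a), parse_infer_dims(dims_b)
    return (a["height"] * a["width"]) / (b["height"] * b["width"])


if __name__ == "__main__":
    print(parse_infer_dims("3;640;480"))
    # Rough indication of how much more work the larger input implies.
    print(round(pixel_ratio("3;640;480", "3;244;244"), 2))
```

Under that rule of thumb, a 640x480 network input carries about five times as many pixels as a 244x244 one, which would be in line with the large difference in frame rate reported above.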

TO BE ADDED: Results are pretty good (not superior) with


Any explanations for that?

I’m now at 24 fps for all three cams with the 244,244 setting above, but I’m not sure what this setting means.

The algorithm also has a lot of phantom detections, mostly bags. :/