Very small Bounding Boxes with custom sgie model

monita.ramirezb · August 26, 2021, 9:57pm

Please provide complete information as applicable to your setup.
• Hardware Platform: Jetson Xavier NX
• DeepStream Version: 5.1
• Language: Python

Hi,
I’ve trained a model using TAO (formerly TLT) to detect whether people are correctly wearing a helmet, safety glasses, a face mask and safety boots.
In my TAO tests the model behaved correctly and produced the right predictions and labels.
Now I’m trying to add it to my DeepStream project as an sgie, using the “Person” class from the pgie (TrafficCamNet) as input.
However, the model is not predicting correctly. I based my project on python_apps/test_1 (to add the tracker and sgie), test_3 (to read from a file or RTSP) and multistream.py (to save images to file), so my final pipeline is:

streammux → pgie (People) → tracker → sgie (Custom TAO model) → nvvidconv and filter (to save img to file) → tiler → nvosd → transform → sink

The custom model was trained on 2,200 images covering 4 classes (missing_helmet, missing_glasses, missing_mask, missing_boots), all of size 272x480.

I’ve tried running my app with MUXER_OUTPUT_WIDTH/HEIGHT and TILED_OUTPUT_WIDTH/HEIGHT varying from 420x272 (same as training size) to 649x480 without success.
These are examples of images saved by my app. I’m drawing people with a blue BB and the sgie detections with a red BB but, as you can see, in many cases the red BB is very tiny and doesn’t match the detection:
[Images: frame_421, frame_430, frame_509, frame_592, frame_1067]

I think I’m doing something wrong with the size of the input video because, like I said, the tests using tlt-evaluate and images taken from the same video show good results.

Can anyone help me or guide me toward the correct configuration for my model?
FYI, this is my sgie config file: sgie_config_epp.txt (3.6 KB)

mchi · August 28, 2021, 7:32am

Hi,
How did you draw the BB of the sgie?

Where do you set the resolution?

Could you share the pgie config as well?

Also, I think you can refer to deepstream-test2; the difference is that your sgie is a detector.

monita.ramirezb · August 29, 2021, 1:11pm

Sure,

To draw BB I did a small change to the draw_bounding_boxes function of imagedata_multistream.py:

import cv2

def draw_bounding_boxes(image, obj_meta):
    # Draw one box and its label; for sgie objects, the parent (Person) box is drawn first.
    rect_params = obj_meta.rect_params
    top = int(rect_params.top)
    left = int(rect_params.left)
    width = int(rect_params.width)
    height = int(rect_params.height)
    obj_name = 'Persona ' + str(obj_meta.object_id)
    color = (255, 0, 0, 0)

    if obj_meta.parent is not None:  # sgie object: draw the BB of its parent (Person) first
        image = draw_bounding_boxes(image, obj_meta.parent)  # pass the object itself, not a list
        obj_name = sgie_classes_str[obj_meta.class_id]  # sgie_classes_str: label list defined elsewhere in the app
        color = (0, 0, 255, 0)

    image = cv2.rectangle(image, (left, top), (left + width, top + height), color, 1)
    image = cv2.putText(image, obj_name, (left - 10, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    return image

I set these variables in my code:

MUXER_OUTPUT_WIDTH=420#Original:1280
MUXER_OUTPUT_HEIGHT=272#Original:720
TILED_OUTPUT_WIDTH=420#Original:1280
TILED_OUTPUT_HEIGHT=272#Original:720

Is there another place where this should be configured?

This is my pgie config (dstest3_pgie_config.txt, 3.5 KB). As you can see in the images above, it’s working fine on its own.

mchi · August 30, 2021, 6:26am

So the tiny BB is drawn by this call, right? Did you print the top/left/width/height to check whether its size is really that small?

monita.ramirezb · August 31, 2021, 3:48am

Yes, as a matter of fact, adding this line to my code:
print('{}: left:{}, top:{}, width:{}, height:{}'.format(obj_name, left, top, width, height))
produces these lines in the output:

Persona 3: left:780, top:143, width:99, height:295
ojos_sin_proteccion: left:870, top:222, width:4, height:7

So for the person detection (pgie) the blue bounding box is accurate (99x295, width x height). But the sgie detection (“ojos_sin_proteccion”, i.e. eyes without protection) is wrong and its size is extremely small (4x7 pixels).

However, the detections made by tlt-evaluate on frames taken from the same video were accurate (in size and position). This is why I think the error is either in the training size or the input resolution, but I don’t know the appropriate values or where to configure each.
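A quick way to quantify how small the sgie box is relative to its parent is to compare the two rectangles directly. The sketch below uses the values from the log lines above; the `Box` class and the helper names are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in frame pixels (hypothetical helper type)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def is_inside(child: Box, parent: Box) -> bool:
    """True if the child box lies entirely within the parent box."""
    return (
        child.left >= parent.left
        and child.top >= parent.top
        and child.left + child.width <= parent.left + parent.width
        and child.top + child.height <= parent.top + parent.height
    )


def area_ratio(child: Box, parent: Box) -> float:
    """Fraction of the parent's area covered by the child box."""
    return child.area / parent.area


# Values copied from the printed output above.
person = Box(left=780, top=143, width=99, height=295)
eyes = Box(left=870, top=222, width=4, height=7)

print("inside parent:", is_inside(eyes, person))
print("area ratio: {:.4f}".format(area_ratio(eyes, person)))
```

The sgie box does fall inside the person box, but it covers only a tiny fraction of it and sits near the box edge, which is consistent with a size problem rather than a drawing problem.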

mchi · September 1, 2021, 10:56am

Hi @monita.ramirezb ,
sgie receives the object metadata from pgie, crops the object image based on the BBOX info in that metadata, and then scales the crop to the input resolution of the sgie network. You can’t configure the input resolution for sgie.
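As a rough illustration of the crop-and-scale step, the sketch below maps a box from sgie network-input coordinates back to frame coordinates. It assumes the crop is stretched to the network size without preserving aspect ratio and that the mapping back is a plain linear rescale. The function name `net_to_frame`, the 480x272 network size and the example detection are hypothetical, and the real nvinfer implementation may differ.

```python
def net_to_frame(det, parent, net_w, net_h):
    """Map a box from network-input coordinates to frame coordinates.

    det and parent are (left, top, width, height) tuples. Assumes the
    parent crop was stretched to net_w x net_h (no aspect-ratio padding).
    """
    p_left, p_top, p_w, p_h = parent
    d_left, d_top, d_w, d_h = det
    sx = p_w / net_w  # horizontal scale: network pixels -> frame pixels
    sy = p_h / net_h  # vertical scale: network pixels -> frame pixels
    return (p_left + d_left * sx, p_top + d_top * sy, d_w * sx, d_h * sy)


# Parent box taken from the log in this thread; the detection is made up.
parent_box = (780, 143, 99, 295)
detection_in_net_space = (200, 40, 80, 30)

frame_box = net_to_frame(detection_in_net_space, parent_box, net_w=480, net_h=272)
print(frame_box)
```

Under these assumptions, a box that is 80 pixels wide in network space shrinks to 16.5 pixels in the frame when the person crop is only 99 pixels wide, so small sgie boxes are expected for small parents. Whether that fully explains a 4x7 box is a separate question.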

Also, I think your case is similar to the sample NVIDIA-AI-IOT/deepstream_lpr_app on GitHub (sample app code for LPR deployment on DeepStream): it does car detection (car detection model), then car license plate detection (LPD model), and then car license plate recognition (LPR model). In your case, you don’t have the LPR model.

You can refer to deepstream_lpr_app/deepstream_lpr_app.c (master branch of NVIDIA-AI-IOT/deepstream_lpr_app on GitHub) to add a probe on the sink pad of the OSD plugin, and leave it to the OSD to draw the BBOX.

monita.ramirezb · September 6, 2021, 12:23am

A couple of concerns:

I don’t understand which attribute of the pgie config file I should modify. According to the documentation and the LPR example, the only bbox attributes are input-object-min-width/height and input-object-max-width/height in the sgie config file. I’ve played a little with them and got either no change in the detections (tiny BB) or no detections at all, even when a person is clearly in the frame. The pgie is detecting people just fine and passing them to the sgie, but it’s the sgie that’s detecting incorrectly.
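For reference, the object-size filters sit alongside the other secondary-mode keys in the `[property]` group of the sgie config file. The sketch below parses a config string with Python's `configparser` and flags missing secondary-mode settings. The config values shown are hypothetical (they are not taken from the attached sgie_config_epp.txt), class id 2 is only assumed to be the person class of the pgie, and `secondary_mode_issues` is an illustrative helper, not part of DeepStream.

```python
import configparser

# Hypothetical sgie config text; the values are illustrative only.
SGIE_CONFIG = """
[property]
gie-unique-id=2
process-mode=2
operate-on-gie-id=1
operate-on-class-ids=2
input-object-min-width=32
input-object-min-height=32
"""


def secondary_mode_issues(config_text):
    """Return a list of likely problems for a GIE meant to run on pgie objects."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prop = parser["property"]
    issues = []
    if prop.get("process-mode") != "2":
        issues.append("process-mode should be 2 for a secondary GIE")
    for key in ("operate-on-gie-id", "operate-on-class-ids"):
        if key not in prop:
            issues.append(key + " is not set")
    return issues


print(secondary_mode_issues(SGIE_CONFIG) or "no obvious issues")
```

A check like this only confirms that the sgie is wired to the pgie objects; it says nothing about whether the detector output itself is parsed correctly.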

So, if it’s not the resolution of the input image, what can it be? Why does the model’s detection work fine on images but fail so badly on streamed video?
Thanks in advance!

mchi · September 6, 2021, 2:59pm

How did you reach this conclusion?

monita.ramirezb · September 6, 2021, 3:32pm

It’s my guess, because when I trained the model using TLT and ran tlt-infer on my test images (which were actually still frames from the test video), the labels and labeled images had almost perfect detections!
But when I deployed it to DeepStream and ran the full sample video, the tiny bounding boxes came up.
What else could be going wrong?

mchi · September 7, 2021, 2:02am

From this piece of code, it’s hard to say what the root cause is.
Could you apply the change below to dump the raw output of the sgie, and then parse the output offline to check its BBOX size?

diff --git a/libs/nvdsinfer/nvdsinfer_context_impl.cpp b/libs/nvdsinfer/nvdsinfer_context_impl.cpp
index 73485e0..827001c 100644
--- a/libs/nvdsinfer/nvdsinfer_context_impl.cpp
+++ b/libs/nvdsinfer/nvdsinfer_context_impl.cpp
@@ -23,6 +23,7 @@
 #include <NvInferPlugin.h>
 #include <NvUffParser.h>
 #include <NvOnnxParser.h>
+#include <opencv2/imgcodecs.hpp>

 #include "nvdsinfer_context_impl.h"
 #include "nvdsinfer_conversion.h"
@@ -528,6 +529,17 @@ InferPostprocessor::copyBuffersToHostMemory(NvDsInferBatch& batch, CudaStream& m
                         batch.m_BatchSize,
                     cudaMemcpyDeviceToHost, mainStream),
                 "postprocessing cudaMemcpyAsync for output buffers failed");
+           {
+               cudaStreamSynchronize(mainStream);
+
+               std::string filename =
+                       "gie-" + std::to_string(m_UniqueID) +
+                       "output-layer-index-" + std::to_string(i);
+               std::ofstream dump_file(filename, std::ios::binary);
+               dump_file.write((char *)batch.m_HostBuffers[info.bindingIndex]->ptr(),
+                       getElementSize(info.dataType) * info.inferDims.numElements *
+                       batch.m_BatchSize);
+           }
         }
         else if (needInputCopy())
         {

monita.ramirezb · September 7, 2021, 3:11am

Sorry, not a good C++ user. My code is in Python.
But, maybe you can describe what the code does to see if I can replicate it on Python?
Thanks in advance!

mchi · September 7, 2021, 3:21am

This code goes into the nvinfer library, which is C++ only, so there is nothing to port to Python.

How to apply the code:

# cd /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/
==> apply above change to nvdsinfer_context_impl.cpp 
# export CUDA_VER=10.2
# make
# cp /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so   ~/libnvds_infer.so.bak
# cp libnvds_infer.so /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so

Then run your app. It will dump the raw output of the TRT inference for your models into binary files, which you can parse offline to check whether the BBOX is really tiny.
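A minimal sketch of the offline parsing step, assuming the output layers are FP32 so the dump is a flat run of 4-byte floats in the machine's native byte order. The helper names are hypothetical, and the demo writes its own small file because the layer shapes and the decoding into boxes depend on the model, which is not shown here. Since the patch reopens the file on every call, each dump most likely holds only the most recent batch.

```python
import array
import os
import tempfile


def read_float32_dump(path):
    """Read a raw dump as a flat array of float32 values (assumes FP32 output)."""
    values = array.array("f")
    size = os.path.getsize(path)
    if size % values.itemsize:
        raise ValueError("file size is not a multiple of the float size")
    with open(path, "rb") as f:
        values.fromfile(f, size // values.itemsize)
    return values


def summarize(values):
    """Basic statistics to sanity-check the magnitude of the raw outputs."""
    return {"count": len(values), "min": min(values), "max": max(values)}


# Demo: write a tiny stand-in dump, then read it back.
# The file name only mimics the pattern used by the patch.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "gie-2output-layer-index-0")
    with open(demo_path, "wb") as f:
        array.array("f", [0.5, 1.25, -2.0, 8.0]).tofile(f)
    demo_values = read_float32_dump(demo_path)

print(summarize(demo_values))
```

On a real dump, the flat values would still need to be reshaped to the layer dimensions and decoded with the model's own bounding-box logic before their sizes can be compared with what DeepStream reports.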

monita.ramirezb · September 8, 2021, 4:02am

Thanks for the step by step explanation! :)
I made the changes you mentioned in nvdsinfer_context_impl.cpp and my app ran without errors. But I can’t seem to find where the binary file you mention is located.

It’s not in the same folder as my app, nor in the nvdsinfer folder. (Even find -name gie-* returns no results.)
Am I misunderstanding something?

mchi · September 9, 2021, 2:15pm

Not sure what the problem is …

I verified the steps again; they generate gie-1output-layer-index-* in the folder where you run the command.

monita.ramirezb · September 16, 2021, 4:46am

Ok, so this approach is not working for me, but I did this other test:
I took deepstream_test_2.py and only swapped the first sgie config file for mine. I tested it with a sample .h264 video and got similar results (this image is taken directly from the video playback because, in this test, I’m not saving the image or drawing the BB myself).

Even so, I added a little bit of code:

if obj_meta.unique_component_id == 2:  # object produced by the sgie
    obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 1.0)  # R, G, B, alpha; each component is in the 0.0-1.0 range
    print('Detecto:', obj_meta.class_id, '(', obj_meta.rect_params.width, obj_meta.rect_params.height, ')')

And got results like:

Detecto: 3 ( 5.684528827667236 26.32435417175293 )
Detecto: 0 ( 1.1010053157806396 6.656162738800049 )
Detecto: 1 ( 3.4143595695495605 6.315043926239014 )

So the BBs really are being returned tiny by the model. It’s not a matter of how the file is saved, because I’m not doing that in this case. Any other idea what it could be?

monita.ramirezb · September 24, 2021, 5:47am

Hi,
Any update on this subject?

mchi · September 26, 2021, 2:52pm

Sorry for the late response! I didn’t see a tiny BBOX in your screenshot. Could you point it out?

Thanks!

monita.ramirezb · September 26, 2021, 6:05pm

You’re right, they’re so small you can barely see them!
Here I made them white and thick, and removed the labels. You can see them at the left border of the person’s bounding box most of the time:

mchi · September 27, 2021, 12:48am

Sorry! I still don’t see them. Could you draw circles around the tiny BBOXes to highlight them?

monita.ramirezb · September 27, 2021, 1:39am

Sure, here you can see they are really tiny WHITE BB:
