Very small Bounding Boxes with custom sgie model

Please provide complete information as applicable to your setup.
• Hardware Platform: Jetson Xavier NX
• DeepStream Version: 5.1
• Language: Python

Hi,
I’ve trained a model using TAO (formerly TLT) to detect whether people are correctly wearing a helmet, safety glasses, a face mask and safety boots.
In my TAO tests the model behaved correctly, made prediction and corresponding labels just fine.
Now I’m trying to add it to my DeepStream project to work as an sgie, using the “Person” class from the pgie (TrafficCamNet) as input.
However, the model is not predicting correctly. I based my project on python_apps/test_1 (to add the tracker and sgie), test_3 (to read from a file or RTSP) and multistream.py (to save images to file), so my final pipeline is:

streammux → pgie (People) → tracker → sgie (Custom TAO model) → nvvidconv and filter (to save img to file) → tiler → nvosd → transform → sink

The custom model was trained on 2,200 images covering 4 different classes (missing_helmet, missing_glasses, missing_mask, missing_boots), all of size 272x480.

I’ve tried running my app with MUXER_OUTPUT_WIDTH/HEIGHT and TILED_OUTPUT_WIDTH/HEIGHT varying from 420x272 (same as training size) to 649x480 without success.
These are examples of images saved by my app. I’m drawing people with a blue BB and the sgie detections with a red BB but, as you can see, in many cases the red BB is tiny and doesn’t match the detected object:
[Images: frame_421, frame_430, frame_509, frame_592, frame_1067]

I think I’m doing something wrong with the size of the input video because, like I said, the tests using tlt-evaluate and images taken from the same video show good results.

Can anyone help me or guide me on the correct configuration of my model?
FYI, this is my sgie config file: sgie_config_epp.txt (3.6 KB)

Hi,
How did you draw the BB of the sgie?

Where do you set the resolution?

Could you share the pgie config as well?

Also, I think you can refer to deepstream-test2; the difference is that your sgie is a detector.

Sure,

  1. To draw the BB I made a small change to the draw_bounding_boxes function of imagedata_multistream.py:
import cv2

# sgie_classes_str (list of sgie label strings) is defined elsewhere in the app.
def draw_bounding_boxes(image, obj_meta):
    rect_params = obj_meta.rect_params
    top = int(rect_params.top)
    left = int(rect_params.left)
    width = int(rect_params.width)
    height = int(rect_params.height)
    obj_name = 'Persona ' + str(obj_meta.object_id)
    color = (255, 0, 0, 0)

    if obj_meta.parent is not None:  # sgie object: draw the BB of the parent (person) first
        image = draw_bounding_boxes(image, obj_meta.parent)  # pass the object itself, not a list
        obj_name = sgie_classes_str[obj_meta.class_id]
        color = (0, 0, 255, 0)

    image = cv2.rectangle(image, (left, top), (left + width, top + height), color, 1)
    image = cv2.putText(image, obj_name, (left - 10, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    return image
  2. I set these variables in my code:
MUXER_OUTPUT_WIDTH=420  # Original: 1280
MUXER_OUTPUT_HEIGHT=272  # Original: 720
TILED_OUTPUT_WIDTH=420  # Original: 1280
TILED_OUTPUT_HEIGHT=272  # Original: 720

Is there another place where this should be configured?

  3. This is my pgie config (dstest3_pgie_config.txt (3.5 KB)). As you can see in the images above, it’s working fine on its own.

So, the tiny BB is drawn by this call, right? Did you print the top/left/width/height to check whether its size is really small?

Yes, as a matter of fact, adding this line to my code:
print('{}: left:{}, top:{}, width:{}, height:{}'.format(obj_name, left, top, width, height))
produces these lines in the output:

Persona 3: left:780, top:143, width:99, height:295
ojos_sin_proteccion: left:870, top:222, width:4, height:7

So for the person detection (pgie) the blue bounding box is accurate (99x295 pixels, width x height). But the sgie detection (“ojos_sin_proteccion”) is wrong and its size is extremely small (4x7 pixels).
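To put those numbers in perspective, here is a minimal sketch (plain Python, no DeepStream dependency) that compares the sgie box against its parent person box. The helper names `bbox_area_ratio` and `is_inside` are hypothetical and only for illustration; the box values are the ones printed above.

```python
def bbox_area_ratio(parent, child):
    """Return child area divided by parent area.

    Both boxes are (left, top, width, height) tuples in frame pixels.
    """
    _, _, parent_w, parent_h = parent
    _, _, child_w, child_h = child
    return (child_w * child_h) / float(parent_w * parent_h)


def is_inside(parent, child):
    """Return True if the child box lies completely inside the parent box."""
    pl, pt, pw, ph = parent
    cl, ct, cw, ch = child
    return (pl <= cl and pt <= ct
            and cl + cw <= pl + pw
            and ct + ch <= pt + ph)


# Values taken from the printed output above.
person = (780, 143, 99, 295)   # "Persona 3" (pgie)
eyes = (870, 222, 4, 7)        # "ojos_sin_proteccion" (sgie)

ratio = bbox_area_ratio(person, eyes)
inside = is_inside(person, eyes)
print('inside parent: {}, area ratio: {:.5f}'.format(inside, ratio))
```

The sgie box does fall inside the person box, but it covers less than 0.1% of the person's area, which is far too small for a pair of glasses.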

However, the detections made by tlt-evaluate on frames taken from the same video were accurate (in size and position). This is why I think the error is either in the training size or in the input resolution, but I don’t know the appropriate values or where to configure each of them.

Hi @monita.ramirezb ,
The sgie receives the object metadata from the pgie, crops the object image based on the BBOX info in that metadata, and then scales the crop to the input resolution of the sgie network. You can’t configure the input resolution for the sgie.
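The crop-and-scale step described above can be sketched in plain Python. This is only an illustration of the coordinate arithmetic, not nvinfer's actual implementation: the helper name `crop_to_frame` is hypothetical, the 480x272 network input size is an assumption based on the training size mentioned earlier, the detection values are made up, and the sketch assumes the crop is stretched to the full network input with no aspect-ratio padding.

```python
def crop_to_frame(parent, net_size, det):
    """Map a box from sgie network-input coordinates back to frame coordinates.

    parent   -- (left, top, width, height) of the pgie object in the frame
    net_size -- (net_width, net_height) of the sgie network input (assumed)
    det      -- (left, top, width, height) of a detection in network-input pixels
    Assumes the crop is stretched to the whole input (no aspect-ratio padding).
    """
    parent_left, parent_top, parent_w, parent_h = parent
    net_w, net_h = net_size
    scale_x = parent_w / float(net_w)   # frame pixels per network pixel, horizontal
    scale_y = parent_h / float(net_h)   # frame pixels per network pixel, vertical
    det_left, det_top, det_w, det_h = det
    return (parent_left + det_left * scale_x,
            parent_top + det_top * scale_y,
            det_w * scale_x,
            det_h * scale_y)


person = (780, 143, 99, 295)   # pgie box from the thread
net_size = (480, 272)          # assumed sgie input size (width, height)
det = (240, 68, 48, 27)        # hypothetical detection in network coordinates

frame_box = crop_to_frame(person, net_size, det)
print(frame_box)
```

Under these assumptions a detection spanning a tenth of the network input width ends up only about 10 pixels wide in the frame, because the person crop itself is only 99 pixels wide.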

Also, I think your case is similar to the sample NVIDIA-AI-IOT/deepstream_lpr_app on GitHub (sample app code for LPR deployment on DeepStream). It does car detection (car detection model), then license plate detection (LPD model), and then license plate recognition (LPR model). In your case, you don’t have the LPR model.

You can refer to deepstream_lpr_app/deepstream_lpr_app.c (master branch of NVIDIA-AI-IOT/deepstream_lpr_app on GitHub) to add a probe on the sink pad of the OSD plugin, and leave the OSD to draw the BBOX.

A couple of concerns:

I don’t understand which attribute of the pgie config file I should modify. According to the documentation and the LPR example, the only bbox attributes are input-object-min-width/height and input-object-max-width/height in the sgie config file. I’ve played a little with them and got either no change in the detections (tiny BB) or no detections at all (even when a person is clearly in the frame). The pgie is detecting people just fine and passing them to the sgie, but it’s the sgie that’s detecting incorrectly.

So, if it’s not the resolution of the input image, what can it be? Why does the model’s detection work fine on images but fail so badly on streamed video?
Thanks in advance!

How did you get this conclusion?

It’s my guess, because when I trained the model using TLT and ran tlt-infer on my test images (which were actually still frames from the test video), the labels and labeled images had almost perfect detections!
But when I deployed it to DeepStream and ran the full sample video, the tiny bounding boxes came up.
What else could be going wrong?

From that piece of code, it’s hard to say what the root cause is.
Could you apply the change below to dump the raw output of the sgie, and then parse the output offline to check its BBOX size?

diff --git a/libs/nvdsinfer/nvdsinfer_context_impl.cpp b/libs/nvdsinfer/nvdsinfer_context_impl.cpp
index 73485e0..827001c 100644
--- a/libs/nvdsinfer/nvdsinfer_context_impl.cpp
+++ b/libs/nvdsinfer/nvdsinfer_context_impl.cpp
@@ -23,6 +23,7 @@
 #include <NvInferPlugin.h>
 #include <NvUffParser.h>
 #include <NvOnnxParser.h>
+#include <opencv2/imgcodecs.hpp>

 #include "nvdsinfer_context_impl.h"
 #include "nvdsinfer_conversion.h"
@@ -528,6 +529,17 @@ InferPostprocessor::copyBuffersToHostMemory(NvDsInferBatch& batch, CudaStream& m
                         batch.m_BatchSize,
                     cudaMemcpyDeviceToHost, mainStream),
                 "postprocessing cudaMemcpyAsync for output buffers failed");
+           {
+               cudaStreamSynchronize(mainStream);
+
+               std::string filename =
+                       "gie-" + std::to_string(m_UniqueID) +
+                       "output-layer-index-" + std::to_string(i);
+               std::ofstream dump_file(filename, std::ios::binary);
+               dump_file.write((char *)batch.m_HostBuffers[info.bindingIndex]->ptr(),
+                       getElementSize(info.dataType) * info.inferDims.numElements *
+                       batch.m_BatchSize);
+           }
         }
         else if (needInputCopy())
         {

Sorry, not a good C++ user. My code is in Python.
But, maybe you can describe what the code does to see if I can replicate it on Python?
Thanks in advance!

This code is to be added in nvinfer libs, nvinfer only supports C++, so no need to change to python.

HOW to apply the code:

# cd /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/
==> apply above change to nvdsinfer_context_impl.cpp 
# export CUDA_VER=10.2
# make
# cp /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so   ~/libnvds_infer.so.bak
# cp libnvds_infer.so /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so

Then run your app. It will dump the raw output of the TRT inference for your models into binary files, which you can parse offline to check whether the BBOX is really tiny.
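For the offline parsing step, a minimal Python sketch could look like the following. It only reads the dump as a flat list of numbers and prints basic statistics. It assumes the output layer holds native-endian float32 values, which is not confirmed anywhere in this thread; the layer shape and how to decode the values into boxes depend on the model and are not covered here. The helper names are hypothetical.

```python
import array
import os


def load_raw_floats(path):
    """Read a raw dump file as a flat array of float32 values.

    Assumes native-endian float32 data (an assumption, check the layer's data type).
    """
    values = array.array('f')
    count = os.path.getsize(path) // values.itemsize
    with open(path, 'rb') as dump_file:
        values.fromfile(dump_file, count)
    return values


def summarize(values):
    """Return the element count and value range of the dumped tensor."""
    if len(values) == 0:
        return {'count': 0, 'min': None, 'max': None}
    return {'count': len(values), 'min': min(values), 'max': max(values)}


# Example usage. The file name follows the pattern produced by the patch above
# ("gie-<unique id>output-layer-index-<i>"); the id and index here are guesses:
# print(summarize(load_raw_floats('gie-2output-layer-index-0')))
```

The element count can be compared against the expected number of elements in the output layer (times the batch size) as a first check that the data type assumption holds.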

Thanks for the step by step explanation! :)
I did the changes you mention on nvdsinfer_context_impl.cpp and my app ran without errors. But I can’t seem to find where the binary file you mention is located.

It’s not in the same folder as my app, nor in the nvdsinfer folder. (Even so, find -name gie-* returns no results)
Am I understanding wrong?

Not sure what the problem is …

I verified the steps again, it can generate gie-1output-layer-index-* in the folder you run the command.

Ok, so that approach is not working for me, but I tried this other test:
I took deepstream_test_2.py and only replaced the first sgie config file with mine. I tested it with a sample .h264 video and got similar results (this image is taken directly from the video playback because, in this test, I’m not saving the image or drawing the BB myself).

Even so, I added a little bit of code:

if obj_meta.unique_component_id == 2:  # sgie
    # R, G, B, alpha; each component is expected in the range 0.0 to 1.0
    obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 1.0)
    print('Detecto:', obj_meta.class_id, '(', obj_meta.rect_params.width, obj_meta.rect_params.height, ')')

And got results like:

Detecto: 3 ( 5.684528827667236 26.32435417175293 )
Detecto: 0 ( 1.1010053157806396 6.656162738800049 )
Detecto: 1 ( 3.4143595695495605 6.315043926239014 )

So the BBs are being returned really tiny by the model. It’s not caused by the process of saving the file, because I’m not doing that in this case. Any other idea what it could be?