Very small Bounding Boxes with custom sgie model

Yes, as a matter of fact, adding this line to my code:
print('{}: left:{}, top:{}, width:{}, height:{}'.format(obj_name, left, top, width, height))
will produce these lines in the output:

Persona 3: left:780, top:143, width:99, height:295
ojos_sin_proteccion: left:870, top:222, width:4, height:7

So for the person detection (pgie) the blue bounding box is accurate (295x99). But for the sgie detection (“ojos_sin_proteccion”) the box is wrong and extremely small (7x4 pixels).

Although the detections made by tlt-evaluate, using frames taken from the same video, were accurate (in size and position). This is why I think the error is either in the training size or the input resolution, but I don’t know the appropriate values or where to configure each.

Hi @monita.ramirezb ,
The sgie receives the object metadata from the pgie, crops the object image based on the BBOX info in that metadata, and then scales the crop to the input resolution of the sgie network. You can’t configure the input resolution for the sgie.
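To illustrate that with the numbers from your log (only a rough sketch of the behaviour; the sgie input resolution of 960x544 below is an assumed placeholder, the real value comes from the model itself):

person_w, person_h = 99, 295        # pgie "Persona" box from the log above
net_w, net_h = 960, 544             # assumed sgie network input resolution (placeholder)

# The 99x295 crop is resized to the network input before it is fed to the sgie
# (ignoring any maintain-aspect-ratio setting):
print("scale factors:", net_w / person_w, net_h / person_h)   # ~9.7x wider, ~1.8x taller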

And I think your case is similar to this sample: GitHub - NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream. It does car detection (car detection model), then car license plate detection (LPD model), and then car license plate recognition (LPR model). In your case, you don’t have the LPR model.

You can refer to deepstream_lpr_app/deepstream_lpr_app.c at master · NVIDIA-AI-IOT/deepstream_lpr_app · GitHub to add a probe on the sink pad of the OSD plugin, and let the OSD draw the BBOX.
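In the Python bindings the same idea looks roughly like this (a minimal sketch based on the deepstream_python_apps samples; the element name nvosd and the probe placement are assumptions about the pipeline):

import pyds
from gi.repository import Gst

def osd_sink_pad_buffer_probe(pad, info, u_data):
    # Walk batch -> frame -> object metadata and print every BBOX that nvinfer attached
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            r = obj_meta.rect_params
            # unique_component_id tells you whether the box came from the pgie or the sgie
            print(obj_meta.unique_component_id, obj_meta.class_id,
                  r.left, r.top, r.width, r.height)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attach the probe to the OSD sink pad ("nvosd" is an assumed element name) and let the OSD draw the boxes:
osdsinkpad = nvosd.get_static_pad("sink")
osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)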

A couple of concerns:

I don’t understand which attribute from the pgie config-file I should modify. According to the documentation and the LPR example, the only BBOX attributes are input-object-min-width/height and input-object-max-width/height, in the sgie config-file. I’ve played a little with them and got either no change in the detections (still tiny BBs) or no detections at all! (even when a person is clearly in the frame). The pgie is detecting people just fine and passing them to the sgie, but it’s the sgie that’s detecting incorrectly.
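For reference, those attributes go in a [property] group like this (a hypothetical snippet with placeholder values, just to show what I mean):

[property]
# secondary mode: operate on objects detected by the pgie, not on full frames
process-mode=2
operate-on-gie-id=1
# skip pgie objects smaller than this (placeholder values)
input-object-min-width=32
input-object-min-height=32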

So, if it’s not the resolution of the input image, what can it be? Why is the detection of the model working fine on images, but failing so badly on the video stream?
Thanks in advance!

How did you get this conclusion?

It’s my guess… because when I trained the model using TLT and ran tlt-infer on my test images (which were, actually, still frames from the test video), the labels and labeled images had almost perfect detections!
But when I deploy it to DeepStream and run the full sample video, the tiny bounding boxes come up.
What else could be going wrong?

From the piece of code, it’s hard to say what the root cause is.
Could you apply the change below to dump the raw output of the sgie, and parse that output offline to check its BBOX size?

diff --git a/libs/nvdsinfer/nvdsinfer_context_impl.cpp b/libs/nvdsinfer/nvdsinfer_context_impl.cpp
index 73485e0..827001c 100644
--- a/libs/nvdsinfer/nvdsinfer_context_impl.cpp
+++ b/libs/nvdsinfer/nvdsinfer_context_impl.cpp
@@ -23,6 +23,7 @@
 #include <NvInferPlugin.h>
 #include <NvUffParser.h>
 #include <NvOnnxParser.h>
+#include <fstream>

 #include "nvdsinfer_context_impl.h"
 #include "nvdsinfer_conversion.h"
@@ -528,6 +529,17 @@ InferPostprocessor::copyBuffersToHostMemory(NvDsInferBatch& batch, CudaStream& m
                         batch.m_BatchSize,
                     cudaMemcpyDeviceToHost, mainStream),
                 "postprocessing cudaMemcpyAsync for output buffers failed");
+           {
+               cudaStreamSynchronize(mainStream);
+
+               std::string filename =
+                       "gie-" + std::to_string(m_UniqueID) +
+                       "output-layer-index-" + std::to_string(i);
+               std::ofstream dump_file(filename, std::ios::binary);
+               dump_file.write((char *)batch.m_HostBuffers[info.bindingIndex]->ptr(),
+                       getElementSize(info.dataType) * info.inferDims.numElements *
+                       batch.m_BatchSize);
+           }
         }
         else if (needInputCopy())
         {

Sorry, I’m not a good C++ user. My code is in Python.
But maybe you can describe what the code does, so I can see if I can replicate it in Python?
Thanks in advance!

This code is to be added to the nvinfer libs, and nvinfer only supports C++, so there is no need to port it to Python.

HOW to apply the code:

# cd /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/
==> apply above change to nvdsinfer_context_impl.cpp 
# export CUDA_VER=10.2
# make
# cp /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so   ~/libnvds_infer.so.bak
# cp libnvds_infer.so /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so

Then run your app; it will dump the raw output of the TensorRT inference for your models into a binary file, and you can then parse the file offline to check whether the BBOX is really tiny.
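For example, a minimal offline check of one dump could look like this (a sketch only: the file name is whatever the patched library actually writes, and float32 is an assumption; the real dtype and layout come from the output layer dimensions nvinfer prints at startup):

import numpy as np

# hypothetical dump file name -- use whichever gie-*output-layer-index-* file appears on disk
raw = np.fromfile("gie-2output-layer-index-0", dtype=np.float32)

print("elements:", raw.size)
print("min/max:", raw.min(), raw.max())
# For a DetectNet_v2-style bbox layer you would then reshape according to the layer
# dims reported at startup, e.g. raw.reshape(batch, channels, grid_h, grid_w),
# and check whether the coordinates are already tiny at this stage.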

Thanks for the step by step explanation! :)
I did the changes you mentioned in nvdsinfer_context_impl.cpp and my app ran without errors, but I can’t seem to find where the binary file you mention is located.

It’s not in the same folder as my app, nor in the nvdsinfer folder. (Even so, find -name gie-* returns no results.)
Am I misunderstanding something?

Not sure what the problem is …

I verified the steps again; it generates gie-1output-layer-index-* files in the folder where you run the command.

OK, so getting the dump file this way is not working for me, but I did this other test:
I took deepstream_test_2.py and only changed the first sgie config file to mine. I tested it with a sample .h264 video and got similar results. (This image is taken directly from the playback video, because in this test I’m not even saving the image or generating the BB myself.)

Even so, I added a little bit of code:

if obj_meta.unique_component_id == 2:  # sgie
    obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 1.0)  # R, G, B, alpha (normalized 0-1)
    print('Detecto:', obj_meta.class_id, '(', obj_meta.rect_params.width, obj_meta.rect_params.height, ')')

And got results like:

Detecto: 3 ( 5.684528827667236 26.32435417175293 )
Detecto: 0 ( 1.1010053157806396 6.656162738800049 )
Detecto: 1 ( 3.4143595695495605 6.315043926239014 )

So the BBs are being returned really tiny by the model. It’s not an issue with the process of saving the file, because I’m not doing that in this case. Any other idea of what it could be?

Hi,
Any update on this subject?

Sorry for the late response! I didn’t see a tiny BBOX in your screenshot; could you point out the tiny BBOX?

Thanks!

You’re right, they’re so small you can barely see them!
Here I made them white and thick, and removed the labels. You can see them at the left border of the bounding box most of the time:


Sorry! I still don’t see them. Could you draw circles around the tiny BBOXes to highlight them?

Sure, here you can see they are really tiny WHITE BBs:


Update:
I trained another model, using images of size 960x540 (and changed my spec files as well).
This model also generates good results when running tlt-infer on my test images, but in this case it detects nothing in the video once converted and deployed on the Jetson. (I have tried different videos, with the same result.)

So everything keeps pointing to the size of the training images, or to the network, but I still can’t find what it is.
Are there relationships or limits I should know about?

Hi,
Any update on this subject?

Sorry for the long delay!

Is it possible to share a repo?

Sure, all my code is here:

Please note that the full app contains other functionality, like saving images to AWS and saving to a database, but all the tests and suggestions you’ve given me have been run separately, to rule out that the error is related to them.
Also, executing the .py script as it is will lead to errors, since the AWS credentials have been removed from the repository.
If you need any further explanation on any step, feel free to ask away!