Retrieving bbox coordinates outside of the buffer probe in deepstream_faciallandmark_app.cpp

Hello,

I want to feed the second detector (sgie) with a stream that is a concatenation (using concat) of the cropped (using videobox) and upscaled (using videoscale) bounding boxes, so the pipeline would look like the first image attached. I found the bbox coordinates in "obj_meta->rect_params" inside pgie_pad_buffer_probe. The problem is: how can I retrieve them outside of the probe to apply the videobox + videoscale transformations, and then put these videobox branches into the pipeline?
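For reference, this is roughly how I read the coordinates inside the probe (a simplified sketch of the standard NvDsBatchMeta / NvDsObjectMeta iteration, not the exact code from the app):

#include <gst/gst.h>
#include "gstnvdsmeta.h"

static GstPadProbeReturn
pgie_pad_buffer_probe (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
       l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;

    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj != NULL;
         l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;

      /* Face bbox in muxer output coordinates. */
      gfloat left   = obj_meta->rect_params.left;
      gfloat top    = obj_meta->rect_params.top;
      gfloat width  = obj_meta->rect_params.width;
      gfloat height = obj_meta->rect_params.height;

      /* The open question: how to hand these values to videobox/videoscale
       * branches built outside of this probe? */
      (void) left; (void) top; (void) width; (void) height;
    }
  }
  return GST_PAD_PROBE_OK;
}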

• Hardware (T4/V100/Xavier/Nano/etc) : Jetson AGX Orin
• Network Type : FPEnet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Thank you,

Matéo

Can you share why you want to do this?

  1. All the deepstream plugins work on GPU or VIC. So this means that if you want to use videobox/videoscale, you need to copy the memory from GPU to CPU.
  2. Why do you need to crop the objects from the video frame upstream of the sgie? The sgie can run inference on the detected objects directly (see the sketch below).
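For example, the secondary gst-nvinfer instance can be pointed at the objects produced by the primary detector (a sketch; the "sgie" variable and the config file name are placeholders, and the same settings can also be put in the sgie config file):

/* Illustrative: run the secondary nvinfer on the detected objects instead of
 * on the full frame. "sgie" and the config path are placeholders. */
g_object_set (G_OBJECT (sgie),
              "config-file-path", "sgie_config.txt",
              "process-mode", 2,        /* 2 = secondary: operate on objects */
              "infer-on-gie-id", 1,     /* only objects from the pgie (unique-id 1) */
              NULL);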

What kind of improvements do you want to make to this sample?

Hi,

We want a solution that detects whether the people filmed by the camera are speaking or not. When testing the facial landmarks app, we observed that when faces were at a distance of 2.50 meters or more, the facial landmarks around the mouth were not moving while people were talking. But when people were near the camera (less than 2.50 meters), the detection was totally fine.
So the idea is to add, after the bbox detection, a stream containing only the cropped and upscaled bounding boxes as the input for the landmark model, in order to better detect the points around the mouth when people are more than 2.50 meters away from the camera.

Extracting and scaling the object may not achieve what you need.

You can try to scale the whole video frame using nvvideoconvert after decoding (use the src-crop and dest-crop properties).

Improving the accuracy of the model may be a better way.

Okay, so it is not the cropped objects that must be scaled but the whole frame. Just to confirm, the nvvideoconvert element you are talking about, and that I should modify, is this one: GstElement *nvvidconv; ? And would scaling the whole frame give the same result as changing MUXER_OUTPUT_WIDTH/HEIGHT and tiler_rows/tiler_columns?

Thank you

By the way, in the case of dynamic cropping and upscaling of the bounding boxes before feeding them to the sgie, can I use nvvidconv or should I use nvdspreprocess to perform it successfully?

Thank you

There are many objects in the video frame, and it is not possible to scale a smaller object alone.
So, I think scaling the whole frame before inference might solve your problem.

You can do the scaling directly after decoding, adding nvvideoconvert before nvstreammux instead of the existing one in your code.
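If your app does not already have an nvvideoconvert at that point, the placement would look roughly like this (a sketch; "pipeline", "decoder_src_pad" and "streammux" stand for whatever your app already has, and error checking is omitted):

static void
add_pre_mux_convert (GstElement *pipeline, GstPad *decoder_src_pad,
                     GstElement *streammux)
{
  /* nvvideoconvert that crops/scales the decoded frame before nvstreammux. */
  GstElement *pre_conv =
      gst_element_factory_make ("nvvideoconvert", "pre-mux-convert");
  GstPad *conv_sink, *conv_src, *mux_sink;

  g_object_set (G_OBJECT (pre_conv),
                "src-crop",  "0:0:1920:1080",    /* region of the decoded frame (placeholder) */
                "dest-crop", "0:0:3840:2160",    /* where it is placed in the output (placeholder) */
                NULL);

  gst_bin_add (GST_BIN (pipeline), pre_conv);

  conv_sink = gst_element_get_static_pad (pre_conv, "sink");
  conv_src  = gst_element_get_static_pad (pre_conv, "src");
  mux_sink  = gst_element_get_request_pad (streammux, "sink_0");

  gst_pad_link (decoder_src_pad, conv_sink);  /* decoder -> nvvideoconvert */
  gst_pad_link (conv_src, mux_sink);          /* nvvideoconvert -> nvstreammux */

  gst_object_unref (conv_sink);
  gst_object_unref (conv_src);
  gst_object_unref (mux_sink);
}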

MUXER_OUTPUT_WIDTH and MUXER_OUTPUT_HEIGHT have nothing to do with the above; they are parameters of nvstreammux.

Using nvdspreprocess to scale may require you to modify the code of the nvdspreprocess plugin.

Okay, so I should scale right before inference. This nvvidconv element is just after decoding and just before streammux. Therefore I added these lines:
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "src-crop", "0:0:1920:1080", NULL);
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "dest-crop", "0:0:3840:2160", NULL);
When I change those parameters I see some changes on the output video, but I am not sure I understand how this would solve our problem. Could you give examples of parameter values that might help?

Thank you

If I understood correctly, with nvdspreprocess dynamic cropping is already supported, but I would have to implement the scaling part myself?

Thank you

Stretch the ROI to the entire image, and the bbox of the object will become larger.
As you described, if the bbox of the face is too small, the accuracy will decrease.

Sorry for the late reply. I added these lines:
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "src-crop", "0:0:3840:2160", NULL);
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "dest-crop", "0:0:3840:2160", NULL);
(the stream is in 4K) and the points around the mouth were still not moving within a distance of 4 meters. The problem persists. The crop is applied, because when I change the parameters the output changes as well.

But when testing the app with faces 3 to 7 meters away, we noticed that when we manually zoomed the camera onto a face, the landmarks were detected, and by watching the points move we could tell whether the person was talking or not. What do you recommend?

Thank you

I mean stretching the ROI to the entire frame. For example, the following code will stretch the ROI to 4k.

g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "src-crop", "960:540:1920:1080", NULL);
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "dest-crop", "0:0:3840:2160", NULL);
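The crop string format is left:top:width:height, so "960:540:1920:1080" is the center 1920x1080 region of the 3840x2160 frame, and dest-crop then stretches it to the full 4K output.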

Without cropping, this method will not be able to produce the expected results.

This may focus more on one zone, but since nvvidconv is placed before the bbox detection (so the crop is fixed), wouldn't this method only work if we already know where the faces are located in the frame?

Thank you

If you care about the entire 4k area and not just the ROI, it may not be possible to zoom in further due to hardware limitations.

What value did you set for the width/height of nvstreammux?
nvstreammux scales the frames when forming a batch. You can set the nvstreammux width/height properties to 3840x2160;
this will also affect the bbox size.
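For example (illustrative; "streammux" is the nvstreammux element in your app):

/* Keep the batch at 4K so faces are not downscaled when the batch is formed. */
g_object_set (G_OBJECT (streammux),
              "width", 3840,
              "height", 2160,
              NULL);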

Yes, because in our case the faces may be anywhere in the image. I already set the nvstreammux parameters to 3840x2160.

This may require a bit of trickery: divide one frame into multiple images to form a batch,
just like the following pipeline.

                     | --> nvvideoconvert (src-crop/dest-crop) --> |
uridecodebin --> tee | --> nvvideoconvert (src-crop/dest-crop) --> | nvstreammux (form batch) --> ...
                     | --> nvvideoconvert (src-crop/dest-crop) --> |
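A rough, untested sketch of how the branches could be wired up (standard GStreamer request-pad handling; the crop regions are placeholders, and in a real pipeline a queue would normally be added after each tee branch):

/* Split one decoded 4K frame into several cropped copies and batch them
 * with nvstreammux. The crop regions below are placeholders (the four quadrants). */
static const gchar *crops[] = {
  "0:0:1920:1080",
  "1920:0:1920:1080",
  "0:1080:1920:1080",
  "1920:1080:1920:1080"
};

static void
link_tee_branches (GstBin *pipeline, GstElement *tee, GstElement *streammux)
{
  for (guint i = 0; i < G_N_ELEMENTS (crops); i++) {
    GstElement *conv = gst_element_factory_make ("nvvideoconvert", NULL);
    gchar *sink_name = g_strdup_printf ("sink_%u", i);
    GstPad *tee_src, *conv_sink, *conv_src, *mux_sink;

    /* Each branch stretches its own region of the frame to the full output. */
    g_object_set (G_OBJECT (conv),
                  "src-crop", crops[i],
                  "dest-crop", "0:0:3840:2160",
                  NULL);
    gst_bin_add (pipeline, conv);

    tee_src   = gst_element_get_request_pad (tee, "src_%u");
    conv_sink = gst_element_get_static_pad (conv, "sink");
    conv_src  = gst_element_get_static_pad (conv, "src");
    mux_sink  = gst_element_get_request_pad (streammux, sink_name);

    gst_pad_link (tee_src, conv_sink);   /* tee -> nvvideoconvert */
    gst_pad_link (conv_src, mux_sink);   /* nvvideoconvert -> nvstreammux */

    gst_object_unref (tee_src);
    gst_object_unref (conv_sink);
    gst_object_unref (conv_src);
    gst_object_unref (mux_sink);
    g_free (sink_name);
  }
}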

This is a similar topic.

Okay, thank you, I will look into this.

By the way, I cannot reach the page https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/apps/tao_others/deepstream-faciallandmark-app/ anymore. It seems NVIDIA removed the facial landmarks and gaze apps from the repository?

DS-7.1 is based on TensorRT 10. Some TAO models lack support, so they have been removed from the DS-7.1 branch.

Please use the DS-7.0 branch.