I want to feed the second detector (sgie) in the pipeline with a stream that is a concatenation (using concat) of the bounding boxes, cropped (using videobox) and upscaled (using videoscale), so the pipeline would look like the first image attached. I found the coordinates of the bboxes in "obj_meta->rect_params" inside pgie_pad_buffer_probe. But the problem is: how can I retrieve them to apply the videobox + videoscale transformations, and then insert these videobox elements into the pipeline?
• Hardware (T4/V100/Xavier/Nano/etc) : Jetson AGX Orin
• Network Type : FPEnet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
We want a solution that detects whether the people filmed by the camera are speaking or not. But when testing the faciallandmarks app, we observed that when faces were at a distance of 2.50 meters or more, the facial landmarks around the mouth were not moving while people were talking. When people were near the camera (less than 2.50 meters), the detection was totally fine.
So the idea is to put a stream after the bbox detection with only the cropped and upscaled bounding boxes as an input for the landmarks, in order to have a better detection of the points around the mouth when people are more than 2.50 meters away from the camera.
Okay, so it is not the cropped objects that must be scaled but the whole frame. Just to confirm, the nvvideoconvert element you are talking about, the one I should modify, is this one? GstElement *nvvidconv; And would this scaling of the whole frame give the same result as changing MUXER_OUTPUT_WIDTH, MUXER_OUTPUT_HEIGHT and tiler_rows, tiler_columns?
By the way, in the case of dynamically cropping and upscaling the bounding boxes before feeding them to the sgie, can I use nvvidconv for that, or should I use nvdspreprocess to do it successfully?
There are many objects in the video frame, and it is not possible to scale a smaller object alone.
So, I think scaling the whole frame before inference might solve your problem.
You can do the scaling directly after decoding, by adding an nvvideoconvert before nvstreammux instead of the existing one in your code.
MUXER_OUTPUT_WIDTH and MUXER_OUTPUT_HEIGHT have nothing to do with the above; they are parameters of nvstreammux.
Okay, so I should scale right before inference. This nvvidconv element is just after decoding and just before streammux. Therefore I added the lines
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "src-crop", "0:0:1920:1080", NULL);
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "dest-crop", "0:0:3840:2160", NULL);
and when I change those parameters I see changes in the output video. But I am not sure I understand how this would solve our problem; could you give examples of parameter values that might help?
Stretch the ROI to the entire image, and the bbox of the object will become larger.
As you described, if the bbox of the face is too small, the accuracy will decrease.
Sorry for the late reply. I added the lines here
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "src-crop", "0:0:3840:2160", NULL);
g_object_set(G_OBJECT(ds_source_struct->nvvidconv), "dest-crop", "0:0:3840:2160", NULL);
(the stream is in 4K) and the points around the mouth were still not moving at a distance of 4 meters. The problem persists. The crop is definitely applied, because when I change the parameters, the output changes too.
But when testing the app with faces from 3 to 7 meters away, we noticed that when we performed a manual camera zoom on the face, the landmarks were detected, and by watching the points move we could identify whether the person was talking or not. What do you recommend?
This may focus more on one zone, but since nvvidconv is placed before the bbox detection (so the crop is fixed), wouldn't this method only work if we already know where the faces are located in the frame?
If you care about the entire 4k area and not just the ROI, it may not be possible to zoom in further due to hardware limitations.
What value did you set for the width/height of nvstreammux? nvstreammux will scale the frame when forming a batch. You can set the nvstreammux width/height properties to 3840x2160.
This will also affect the bbox size.