What is an efficient way to detect people with faces?

Hi All,

I have an app that uses the PeopleNet model to detect people and generate some metadata about those objects.
The model is also capable of detecting faces.
Now I want the app to generate metadata only for person objects whose face was also detected (so no metadata would be generated for a person seen from the back).

What is the best way of doing this, other than iterating over all detected objects in the frame and checking whether a face's detection box lies inside a person's box?

Are you using DeepStream to test? How many models are in your pipeline? Could you share your media pipeline?

Yes, the app uses DeepStream and is based on deepstream-app, so the actual pipeline is hidden behind a call like this:

/* Create one pipeline per application instance; the callback passed here
 * (process_detected_objects) is where detected-object metadata is processed. */
for (i = 0; i < num_instances; i++) {
    if (!create_pipeline(appCtx[i], process_detected_objects,
        nullptr, perf_cb, overlay_graphics)) {
            NVGSTDS_ERR_MSG_V("Failed to create pipeline");
            return_value = -1;
            should_goto_done = true;
            break;
    }
}

It is a simple configuration as far as I can tell: converter -> muxer -> nvinfer -> tracker. However, I'm new to DeepStream and don't know how to generate a pipeline description in this case; could you advise me on that?
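One thought I have not verified: GStreamer can dump a GraphViz .dot description of a running pipeline via its standard debug macro, which might work for deepstream-app too. This is only a sketch, assuming appCtx[i]->pipeline.pipeline is the top-level pipeline element created by create_pipeline and that GST_DEBUG_DUMP_DOT_DIR points at a writable directory:

#include <gst/gst.h>

/* Sketch: call this after create_pipeline() has succeeded. It writes
 * <GST_DEBUG_DUMP_DOT_DIR>/...deepstream-app-pipeline.dot describing the
 * constructed pipeline; the macro silently does nothing if the environment
 * variable GST_DEBUG_DUMP_DOT_DIR is not set. The pipeline member name is an
 * assumption about the deepstream-app AppCtx layout. */
GST_DEBUG_BIN_TO_DOT_FILE (GST_BIN (appCtx[i]->pipeline.pipeline),
    GST_DEBUG_GRAPH_SHOW_ALL, "deepstream-app-pipeline");

The resulting .dot file can then be rendered with Graphviz (dot -Tpng).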
Here is the relevant part of the config file:

[source0]
enable=1
type=2
uri=file://../samples/videoplayback.mp4
camera-id=0

[streammux]
batch-size=1
live-source=0
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0

[primary-gie]
enable=1
config-file=infer_primary_peoplenet.cfg
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1

[tracker]
enable=1
ll-lib-file=../lib/libnvds_nvmultiobjecttracker.so
ll-config-file=tracker_NvDCF_perf.yml
tracker-width=640
tracker-height=384
enable-batch-process=1
enable-past-frame=0
display-tracking-id=1

[nvds-analytics]
enable=0
config-file=analytics.cfg

[sink0]
enable=1
type=2
sync=0

And here’s the PGIE config:

[property]
gie-unique-id=1
model-engine-file=peoplenet/resnet34_peoplenet_pruned_int8.etlt_b1_gpu0_int8.engine
int8-calib-file=peoplenet/resnet34_peoplenet_pruned_int8_dla.txt ## only for INT8
tlt-encoded-model=peoplenet/resnet34_peoplenet_pruned_int8.etlt
tlt-model-key=tlt_encode
uff-input-blob-name=input_1
infer-dims=3;544;960
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
num-detected-classes=3
cluster-mode=1
net-scale-factor=0.0039215697906911373
network-mode=1 ## 0=FP32, 1=INT8, 2=FP16 mode
model-color-format=0
labelfile-path=peoplenet/labels.txt

[class-attrs-all]
pre-cluster-threshold=0.7
eps=0.7
minBoxes=1

[class-attrs-0]
pre-cluster-threshold=0.7
eps=0.5
group-threshold=3

[class-attrs-1]
pre-cluster-threshold=1.1
eps=0.5

[class-attrs-2]
pre-cluster-threshold=0.7
eps=0.5
group-threshold=3

As you said, please loop over all object meta, then remove the meta that does not include a face.

Hi, I noticed you mentioned “removing the meta”. How can you do that? Some time ago I tried to make the SGIE work only on newly detected objects after the PGIE by returning GST_PAD_PROBE_DROP instead of GST_PAD_PROBE_OK, but that did not work as I expected. Could you share your approach?

Please refer to nvds_remove_obj_meta_from_frame in /opt/nvidia/deepstream/deepstream-6.0/sources/includes/nvdsmeta.h; it removes an object meta from the frame meta to which it is attached.
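A minimal sketch of that logic, assuming the usual PeopleNet class order (person = 0, bag = 1, face = 2), a probe attached where both person and face detections are already in the frame meta, and simple full containment as the matching rule; meta locking and error handling are omitted:

#include "gstnvdsmeta.h"

#define PGIE_CLASS_ID_PERSON 0   /* assumed PeopleNet label order: person, bag, face */
#define PGIE_CLASS_ID_FACE   2

/* Simple matching rule: the face box lies entirely inside the person box. */
static gboolean
face_inside_person (NvDsObjectMeta * face, NvDsObjectMeta * person)
{
  NvOSD_RectParams *f = &face->rect_params;
  NvOSD_RectParams *p = &person->rect_params;
  return f->left >= p->left && f->top >= p->top &&
      f->left + f->width <= p->left + p->width &&
      f->top + f->height <= p->top + p->height;
}

/* Buffer probe (e.g. on the tracker's src pad): remove person objects that do
 * not contain any detected face, so downstream elements never see them. */
static GstPadProbeReturn
filter_persons_probe (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  NvDsBatchMeta *batch_meta =
      gst_buffer_get_nvds_batch_meta (GST_BUFFER (info->data));
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    NvDsMetaList *l_obj = frame_meta->obj_meta_list;

    while (l_obj) {
      NvDsObjectMeta *obj = (NvDsObjectMeta *) l_obj->data;
      NvDsMetaList *l_next = l_obj->next;   /* save before possible removal */

      if (obj->class_id == PGIE_CLASS_ID_PERSON) {
        gboolean has_face = FALSE;
        for (NvDsMetaList * l = frame_meta->obj_meta_list; l; l = l->next) {
          NvDsObjectMeta *cand = (NvDsObjectMeta *) l->data;
          if (cand->class_id == PGIE_CLASS_ID_FACE &&
              face_inside_person (cand, obj)) {
            has_face = TRUE;
            break;
          }
        }
        if (!has_face)
          nvds_remove_obj_meta_from_frame (frame_meta, obj);
      }
      l_obj = l_next;
    }
  }
  return GST_PAD_PROBE_OK;
}

Saving l_obj->next before calling nvds_remove_obj_meta_from_frame matters, since removing the object invalidates the current list node.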

I implemented this approach by configuring the PGIE to filter out unnecessary objects and then matching a person's detection box with a face's box. It works.

Just a thought: if my model is able to detect both people and faces, would it be more efficient to configure the PGIE so it only detects persons, and then somehow run the model again so it works only on the person objects (not whole frames) to detect faces? And how could I achieve that?
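For what it's worth, a hedged sketch of how that might look in the deepstream-app config: a second nvinfer instance added as a secondary GIE that runs only on the person objects produced by the PGIE. The config file name is a placeholder, and the SGIE's own nvinfer config would need process-mode=2 so it runs on object crops rather than on full frames:

[secondary-gie0]
enable=1
gie-unique-id=2
# operate only on objects coming from the primary GIE (gie-unique-id=1) ...
operate-on-gie-id=1
# ... and only on class 0 (person, assuming the usual PeopleNet label order)
operate-on-class-ids=0
config-file=infer_secondary_face.cfg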

Actually, the approach I described is far from ideal, as it creates a lot of false positives like the ones in the three screenshots below.

[Three screenshots of false-positive detections were attached here; they are referred to as the first, second and third pictures in the posts that follow.]

Technically, a detected face being within a person's detector_bbox does not guarantee that they belong to the same person.

So far my attempts to use a secondary GIE as a classifier with the same model haven't brought any joy, and I'm still looking for a better solution.

No matter whether you use one model or two, you need to add this key logic: if no face can be found inside a person's rectangle, remove that person's meta.
For example, if you only get persons in a frame, remove those persons' meta; if you get both person and face meta in a frame, look for a face inside each person's rectangle and, if none is found, remove that person's meta.

Well, that’s exactly what I do.
The problem is that approach is prone to many false positives.
If you look at the first picture, the model detected a lady in a white top as a person. It also detected the face of a guy in sunglasses. As the face's bbox is inside the person's bbox, my code decides it's a valid person object, but it is not, because the face does not belong to that person: we don't see the lady's face and won't be able to identify her.
I believe it’s the same story with the 3rd picture as we have the lady’s face within the man’s bbox.
As for the 2nd picture, I think the detector did detect the man's face, BUT it's useless, as we cannot properly identify him from that picture alone because his eyes and other features are not visible, so ideally it should be ignored.
Hope it makes the issue clear.

About the first picture: you need to do more checking, for example, check whether the face box is in the upper-middle part of the body box.
About the third picture: please refer to the facial landmark model, Facial Landmarks Estimation | NVIDIA NGC. Here are some demos: deepstream_tao_apps/apps/tao_others/deepstream-emotion-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
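A rough sketch of that kind of positional check; the thresholds (upper third of the person box, ±25% of its width) are arbitrary assumptions that would need tuning against real footage:

#include <math.h>
#include "nvdsmeta.h"

/* Sketch: accept a face for a person only if the face centre lies in the upper
 * third of the person box and is horizontally close to the person box centre.
 * Both thresholds are assumptions, not values from the thread. */
static gboolean
face_plausible_for_person (NvDsObjectMeta * face, NvDsObjectMeta * person)
{
  NvOSD_RectParams *f = &face->rect_params;
  NvOSD_RectParams *p = &person->rect_params;

  gfloat face_cx = f->left + f->width / 2.0f;
  gfloat face_cy = f->top + f->height / 2.0f;
  gfloat person_cx = p->left + p->width / 2.0f;

  gboolean in_upper_third = (face_cy - p->top) < p->height / 3.0f;
  gboolean centred = fabsf (face_cx - person_cx) < 0.25f * p->width;

  return in_upper_third && centred;
}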

Thanks for your help.
I'm working on additional checks regarding the face's position inside the person's box.
I had a look at the facial landmark example (I think it's more suitable than emotion detection) and have some questions.
First of all, I actually do need some facial landmarks for further processing, but only basic points for the eyes, nose and mouth (lips) to adjust the face's orientation.
The model you referred me to is too detailed and unnecessarily slows down my application (I run it on a Jetson Xavier NX).
I was hoping to download the 68-point model, but despite the description mentioning that 68-, 80- and 104-point models are available, the only model in the archive is the deployable 80-point one (and no trainable ones). Do you know where I can get any of these?
Ideally I'd like to get something like this (projects 1 and 2), but I don't know whether (and how) it is possible to convert their models into something acceptable to TensorRT; could you advise me on this?

And one more thing about using PGIE and SGIE that I don’t yet understand.
My application is based on deepstream-transfer-learning and everything is basically hidden inside deepstream-app structures.
When I added a SGIE for facial landmarks detection, I had to attach a function that post-processes its results and generates some metadata:

/* Attach a buffer probe to the secondary GIE bin's src pad so the SGIE output
 * (facial landmarks) can be post-processed into metadata. */
if (appCtx[i]->pipeline.common_elements.secondary_gie_bin.bin) {
    osd_sink_pad = gst_element_get_static_pad (appCtx[i]->pipeline.common_elements.secondary_gie_bin.bin, "src");
    if (!osd_sink_pad)
        NVGSTDS_ERR_MSG_V ("Unable to get SGIE src pad\n");
    else {
        gst_pad_add_probe (osd_sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
            sgie_pad_buffer_probe, NULL, NULL);
        gst_object_unref (osd_sink_pad);
    }
}

I have a function that filters inference results; when I used only the PGIE, it was registered like this:

if (!create_pipeline(appCtx[i], process_batch,
    nullptr, perf_cb, overlay_graphics)) {
    NVGSTDS_ERR_MSG_V("Instance %i: failed to create pipeline", i);
    return_value = -1;
    should_goto_done = TRUE;
    break;
}

and it worked fine.

With the SGIE added, it no longer works as expected.
For example, during the filtering process I sometimes notice that frame_meta->num_obj_meta increases as new objects are added (and my app crashes because it calls nvds_obj_enc_process, which depends on the object indexes).
If I register my function like this:

create_pipeline(appCtx[i], nullptr,
    process_batch, perf_cb, overlay_graphics)

it no longer sees the input data it used to see.

It looks like some sort of synchronisation is needed to know when the SGIE is done and it is safe to post-process the frame's metadata, and I don't know how to achieve that.
I also don't quite understand why there are the two options, illustrated above, for where to add the callback function; could you give me an idea?

About “do you know where I can get any of these?”: there is only one FPENet model; you can modify configs/facial_tao/sample_faciallandmarks_config.txt to control the number of output points.

About “could you advise me on this?”: you need to check whether the model is supported; please refer to the Gst-nvinfer — DeepStream 6.1.1 Release documentation.

About “could you give me an idea?”: deepstream-test2 is easier to understand and modify.

Sorry, but deepstream-test2 is not based on deepstream-app and does not implement the create_pipeline function I'm referring to in my previous question.

gboolean create_pipeline (AppCtx * appCtx,
    bbox_generated_callback bbox_generated_post_analytics_cb,
    bbox_generated_callback all_bbox_generated_cb…

bbox_generated_post_analytics_cb is triggered after analytics (the PGIE, tracker, or last SGIE appearing in the pipeline); please check the function create_common_elements.
all_bbox_generated_cb is usually attached at the OSD, or at the sink if the OSD is disabled; it is an opportunity to modify the fully processed metadata or do analytics.
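In terms of the create_pipeline call shown earlier, that corresponds roughly to the sketch below; process_batch, perf_cb and overlay_graphics are the application's own functions from the posts above, while final_metadata_cb is a hypothetical name for a callback run once the frame's metadata is complete:

/* Sketch: the second argument (bbox_generated_post_analytics_cb) fires right
 * after the PGIE/tracker/last SGIE; the third (all_bbox_generated_cb) fires at
 * the OSD (or at the sink if the OSD is disabled). */
if (!create_pipeline (appCtx[i],
        process_batch,      /* bbox_generated_post_analytics_cb */
        final_metadata_cb,  /* all_bbox_generated_cb (hypothetical name) */
        perf_cb, overlay_graphics)) {
    NVGSTDS_ERR_MSG_V ("Failed to create pipeline");
}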

Thanks for the explanation, I looked into the code and now understand the difference.
Still, I don't understand why the number of objects in the frame changes inside my bbox_generated_post_analytics_cb; I need to investigate it further.

Is this still an issue that needs support? Thanks

I think I found a temporary solution: filtering out faces based on statistical data about their positions and sizes inside valid persons' bboxes.

Do you happen to know whether it's possible to train the existing model from the NGC catalog that I'm using (PeopleNet) so that it only detects people whose faces are visible?
I take it I need an unpruned model to do so, but I can only see the deployable_quantized one when I click Download. Where can I get the unpruned one?
And can I perform such training on a Jetson Xavier NX?

Do you have any remaining DeepStream issue? For model training, please open a new topic in the TAO forum: TAO Toolkit - NVIDIA Developer Forums

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks