What is an efficient way to detect people with faces?

Hi All,

I have an app that uses the PeopleNet model to detect people and generate some metadata about those objects.
The model is also capable of detecting faces.
Now I want the app to generate metadata only for objects that represent people IF their face was detected as well (meaning no metadata would be generated for a person’s back).

What is the best way of doing this, apart from going through all detected objects in the frame and checking whether a face's detection box lies inside a person's box?

Are you using DeepStream to test? How many models are in your pipeline? Could you share your media pipeline?

Yes, the app uses DeepStream and is based on deepstream-app, so the actual pipeline is hidden behind a call like this:

for (i = 0; i < num_instances; i++) {
    if (!create_pipeline(appCtx[i], process_detected_objects,
                         nullptr, perf_cb, overlay_graphics)) {
        NVGSTDS_ERR_MSG_V("Failed to create pipeline");
        return_value = -1;
        should_goto_done = true;
        break;
    }
}

It is a simple configuration as far as I can tell: converter->muxer->nvinfer->tracker. But I'm new to DS and don't know how to generate a pipeline description in this case; could you advise me on that?
Here is the relevant part of the config file:

[source0]
enable=1
type=2
uri=file://../samples/videoplayback.mp4
camera-id=0

[streammux]
batch-size=1
live-source=0
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0

[primary-gie]
enable=1
config-file=infer_primary_peoplenet.cfg
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1

[tracker]
enable=1
ll-lib-file=../lib/libnvds_nvmultiobjecttracker.so
ll-config-file=tracker_NvDCF_perf.yml
tracker-width=640
tracker-height=384
enable-batch-process=1
enable-past-frame=0
display-tracking-id=1

[nvds-analytics]
enable=0
config-file=analytics.cfg

[sink0]
enable=1
type=2
sync=0

And here’s the PGIE config:

[property]
gie-unique-id=1
model-engine-file=peoplenet/resnet34_peoplenet_pruned_int8.etlt_b1_gpu0_int8.engine
int8-calib-file=peoplenet/resnet34_peoplenet_pruned_int8_dla.txt ## only for INT8
tlt-encoded-model=peoplenet/resnet34_peoplenet_pruned_int8.etlt
tlt-model-key=tlt_encode
uff-input-blob-name=input_1
infer-dims=3;544;960
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
num-detected-classes=3
cluster-mode=1
net-scale-factor=0.0039215697906911373
network-mode=1 ## 0=FP32, 1=INT8, 2=FP16 mode
model-color-format=0
labelfile-path=peoplenet/labels.txt

[class-attrs-all]
pre-cluster-threshold=0.7
eps=0.7
minBoxes=1

[class-attrs-0]
pre-cluster-threshold=0.7
eps=0.5
group-threshold=3

[class-attrs-1]
pre-cluster-threshold=1.1
eps=0.5

[class-attrs-2]
pre-cluster-threshold=0.7
eps=0.5
group-threshold=3

As you said, please loop through all the object meta, then remove the meta that does not include a face.

Hi, I noticed you mentioned "removing the meta". How can you do that? Some time ago I tried to make the SGIE only work on newly detected objects after the PGIE by returning GST_PAD_PROBE_DROP instead of GST_PAD_PROBE_OK, but that did not work as I expected. Could you share how?

Please refer to nvds_remove_obj_meta_from_frame in /opt/nvidia/deepstream/deepstream-6.0/sources/includes/nvdsmeta.h; it removes an object meta from the frame meta to which it is attached.
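
To make that concrete, here is a minimal sketch (not code from the thread) of a buffer pad probe, e.g. on the tracker's src pad, that keeps only person objects with a face box inside them. The probe and helper names, and the class IDs (0 = person, 2 = face, assuming PeopleNet's usual labels.txt order of person, bag, face) are assumptions here:

#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "nvdsmeta.h"

#define PGIE_CLASS_PERSON 0   /* assumed labels.txt order: person, bag, face */
#define PGIE_CLASS_FACE   2

/* TRUE if the face box centre lies inside the person box */
static gboolean
face_inside_person (NvOSD_RectParams *face, NvOSD_RectParams *person)
{
  gfloat fcx = face->left + face->width / 2.0f;
  gfloat fcy = face->top + face->height / 2.0f;
  return fcx >= person->left && fcx <= person->left + person->width &&
      fcy >= person->top && fcy <= person->top + person->height;
}

static GstPadProbeReturn
keep_people_with_faces_probe (GstPad *pad, GstPadProbeInfo *info,
    gpointer user_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    GList *to_remove = NULL;

    /* First pass: find person objects that have no face box inside them */
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj;
        l_obj = l_obj->next) {
      NvDsObjectMeta *person = (NvDsObjectMeta *) l_obj->data;
      if (person->class_id != PGIE_CLASS_PERSON)
        continue;

      gboolean has_face = FALSE;
      for (NvDsMetaList *l_face = frame_meta->obj_meta_list; l_face;
          l_face = l_face->next) {
        NvDsObjectMeta *face = (NvDsObjectMeta *) l_face->data;
        if (face->class_id == PGIE_CLASS_FACE &&
            face_inside_person (&face->rect_params, &person->rect_params)) {
          has_face = TRUE;
          break;
        }
      }
      if (!has_face)
        to_remove = g_list_prepend (to_remove, person);
    }

    /* Second pass: remove after the iteration so obj_meta_list stays valid */
    for (GList *l = to_remove; l; l = l->next)
      nvds_remove_obj_meta_from_frame (frame_meta, (NvDsObjectMeta *) l->data);
    g_list_free (to_remove);
  }
  return GST_PAD_PROBE_OK;
}

Collecting the persons into a separate list and removing them after the loop avoids modifying frame_meta->obj_meta_list while it is being walked.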

I implemented this approach by configuring the PGIE to filter out unnecessary objects and then trying to match a person's detection box with a face's box. It works.

Just a thought - if my model is able to detect both people and faces, would it be more efficient to configure the PGIE so it only detects persons, and then somehow run the model again so it only works on the person objects (not whole frames) to detect faces? How could I achieve that?
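
In deepstream-app, a second pass over crops rather than whole frames is normally expressed as a secondary GIE that operates only on selected objects from the primary. A minimal sketch of the config group, assuming the PGIE keeps gie-unique-id=1, person is class 0, and infer_secondary_face.cfg is a hypothetical secondary nvinfer config (whether a detector, as opposed to a classifier, works well in secondary mode should be checked against your DeepStream version):

[secondary-gie0]
enable=1
gie-unique-id=2
## operate only on objects produced by the PGIE (gie-unique-id=1)
operate-on-gie-id=1
## and only on the person class (class-id 0)
operate-on-class-ids=0
batch-size=16
config-file=infer_secondary_face.cfg

The secondary config itself would set process-mode=2 and could use input-object-min-width / input-object-min-height to skip person boxes that are too small to contain a usable face.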

Actually, the approach I described is far from ideal as it creates a lot of false positives like this:

[picture 1]

or this:

[picture 2]

or this:

[picture 3]

Technically, a detected face is within a person's detector_bbox, but that does not guarantee they belong to the same person.

So far my attempts to use a secondary GIE as a classifier with the same model haven't brought any joy, and I'm still looking for a better solution.

No matter whether you use one model or two models, you need to add this key logic: if you can't find a face in a person's rectangle, remove that person's meta.
For example, if you only get persons in a frame, remove the persons' meta; if you get both persons' and faces' meta in a frame, look for a face in each person's rectangle, and if you can't find one, remove that person's meta.

Well, that’s exactly what I do.
The problem is that approach is prone to many false positives.
If you look at the first picture, the model detected a lady in a white top as a person. It also detected the face of a guy in sunglasses. As the face's bbox is inside the person's bbox, my code decides it's a valid person object, but it is not, because the face does not belong to that person: we don't see the lady's face and won't be able to identify her.
I believe it’s the same story with the 3rd picture as we have the lady’s face within the man’s bbox.
As for the 2nd picture, I think the detector did detect the man's face, BUT it's useless, as we cannot properly identify him from that picture alone because we cannot see his eyes or other features, so ideally it should be ignored.
Hope it makes the issue clear.

About the first picture, you need to do more checking; for example, check whether the face box sits in the middle and upper part of the body box (see the sketch below).
About the third picture, please refer to the facial landmarks model, Facial Landmarks Estimation | NVIDIA NGC; here are some demos: deepstream_tao_apps/apps/tao_others/deepstream-emotion-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
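
To make the "middle and upper" check for the first picture concrete, the plain containment test from the earlier sketch could be tightened to something like the following. This is only an illustration; the 0.4 and 0.25 fractions are arbitrary and would need tuning on real footage:

/* Illustration only: accept a face for a person when the face centre lies in
 * the upper part and the horizontal middle of the person box. */
static gboolean
face_plausibly_belongs_to_person (NvOSD_RectParams *face,
    NvOSD_RectParams *person)
{
  gfloat fcx = face->left + face->width / 2.0f;
  gfloat fcy = face->top + face->height / 2.0f;

  /* vertical: centre must fall in the upper ~40% of the person box */
  gboolean in_upper = fcy >= person->top &&
      fcy <= person->top + 0.4f * person->height;

  /* horizontal: centre must fall in the middle ~50% of the person box */
  gboolean in_middle = fcx >= person->left + 0.25f * person->width &&
      fcx <= person->left + 0.75f * person->width;

  return in_upper && in_middle;
}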