Is DeepStream suitable for face detection using MTCNN?

hotribao · June 19, 2021, 10:56am

Hi,

I am very new to DeepStream and am looking for advice. Could you kindly help?

I want to implement a face detection application using the MTCNN model with DeepStream on a Jetson Nano. It requires cascading three different models: PNet => RNet => ONet. That can be done by chaining the primary GIE with secondary GIEs, with the primary GIE running the PNet model.

However, according to the MTCNN algorithm, the input of the PNet model must be an image pyramid. That is, each frame of the stream is scaled down to several sizes, and each scaled image is fed to PNet. The resulting bounding boxes are converted to original-image coordinates and then filtered with NMS. This is repeated for every image in the pyramid, and the bounding boxes from all levels are gathered and passed through NMS one more time.
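The per-level NMS followed by a global NMS described above can be sketched as follows. This is a minimal illustration, not code taken from any MTCNN implementation: `Box`, `iou`, `nms` and `pyramidNms` are hypothetical names, boxes are assumed to be already converted to original-image coordinates, and the IoU uses continuous box widths rather than the "+1" pixel convention some implementations use.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Axis-aligned box with a confidence score (hypothetical type for this sketch).
struct Box {
    float x1, y1, x2, y2;
    float score;
};

// Intersection-over-union of two boxes (no "+1" pixel convention).
float iou(const Box& a, const Box& b) {
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    float uni = areaA + areaB - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: keep the highest-scoring box, suppress boxes overlapping it by more
// than `thresh`, then continue with the next surviving box.
std::vector<Box> nms(std::vector<Box> boxes, float thresh) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    std::vector<bool> suppressed(boxes.size(), false);
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (suppressed[i]) continue;
        kept.push_back(boxes[i]);
        for (std::size_t j = i + 1; j < boxes.size(); ++j)
            if (!suppressed[j] && iou(boxes[i], boxes[j]) > thresh) suppressed[j] = true;
    }
    return kept;
}

// NMS inside each pyramid level, then one more NMS over the union of all levels.
std::vector<Box> pyramidNms(const std::vector<std::vector<Box>>& perLevel,
                            float levelThresh, float globalThresh) {
    std::vector<Box> all;
    for (const auto& level : perLevel) {
        std::vector<Box> kept = nms(level, levelThresh);
        all.insert(all.end(), kept.begin(), kept.end());
    }
    return nms(all, globalThresh);
}
```

A duplicate face found on two different levels survives the per-level pass but is removed by the final global pass, which is why the second NMS is needed.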

How could I implement that with DeepStream for the primary GIE?

From what I know about DeepStream, the stream muxer passes the full frame to nvinfer at the PGIE. There seems to be no way to inject custom code to build the image pyramid. The only place where I can customize nvinfer is the bounding box parser function. But even assuming the pyramid can be built, inside the bounding box parser function I will not know the scale factor of the image being inferred, so I cannot convert the bounding boxes into original-image coordinates.

Thanks for your help

mchi · June 27, 2021, 7:53am

Hi @hotribao ,
Sorry for the long delay!

I found the topics below, which run MTCNN with TRT; neither of them seems to mention an image pyramid. Could you share more details about the image pyramid?

github.com/PKUZHOU/MTCNN_FaceDetection_TensorRT/blob/master/src/mtcnn.cpp

#include "mtcnn.h"
//#define LOG
mtcnn::mtcnn(int row, int col){
    //set NMS thresholds
    nms_threshold[0] = 0.7;
    nms_threshold[1] = 0.7;
    nms_threshold[2] = 0.7;
    //set minimal face size (width in pixels)
    int minsize = 60;
    /*config  the pyramids */
    float minl = row<col?row:col;
    int MIN_DET_SIZE = 12;
    float m = (float)MIN_DET_SIZE/minsize;
    minl *= m;
    float factor = 0.709;
    int factor_count = 0;
    while(minl>MIN_DET_SIZE){
        if(factor_count>0)m = m*factor;
        scales_.push_back(m);
        minl *= factor;

This file has been truncated.

Thanks!

hotribao · June 28, 2021, 6:41am

hi @mchi ,

Thanks a lot for your response. All responses are valuable to me, as I am a new DeepStream learner and have been scratching my head trying to fit my application into DeepStream's paradigm, still without finding a way out…

Before creating this topic, I searched this forum extensively for posts about DeepStream + MTCNN but couldn't find anyone claiming to have implemented this combination successfully. I saw the post you mentioned as well; however, it is a standalone TRT application. Instead of writing a standalone TRT application, I would like to use DeepStream so that I can use all of its accelerators, not just TRT.

The source code you quoted does deal with the pyramid: it calculates the list of scale factors to be applied to the input image and stores them in the vector scales_:

    /*config  the pyramids */
    float minl = row<col?row:col;
    int MIN_DET_SIZE = 12;
    float m = (float)MIN_DET_SIZE/minsize;
    minl *= m;
    float factor = 0.709;
    int factor_count = 0;
    while(minl>MIN_DET_SIZE){
        if(factor_count>0)m = m*factor;
        scales_.push_back(m);
        minl *= factor;
        factor_count++;
    }

With a 640x480 image (and minsize = 60), there are 7 scales.
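The scale loop quoted above can be pulled out into a standalone function to check that count. This is a sketch that reproduces the quoted arithmetic (minimum face size 60, PNet window 12, factor 0.709); the function name `pyramidScales` is hypothetical, and `!scales.empty()` plays the role of the original `factor_count > 0` check.

```cpp
#include <vector>

// Recomputes the pyramid scales the way the quoted mtcnn.cpp constructor does:
// the first scale maps `minsize` onto PNet's 12-pixel window, and each further
// scale shrinks by `factor` while the scaled short side stays above 12 pixels.
std::vector<float> pyramidScales(int rows, int cols, int minsize = 60,
                                 float factor = 0.709f) {
    const int kMinDetSize = 12;  // MIN_DET_SIZE in the quoted code
    std::vector<float> scales;
    float minl = static_cast<float>(rows < cols ? rows : cols);
    float m = static_cast<float>(kMinDetSize) / minsize;
    minl *= m;
    while (minl > kMinDetSize) {
        if (!scales.empty()) m *= factor;  // same as factor_count > 0
        scales.push_back(m);
        minl *= factor;
    }
    return scales;
}
```

For example, `pyramidScales(480, 640)` returns 7 scales, starting at 0.2 and shrinking by 0.709 each step.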

Further down, for each scale it generates a PNet TRT engine dedicated to that input shape. This is needed when running with TRT, but the step is not part of the original algorithm.

https://github.com/PKUZHOU/MTCNN_FaceDetection_TensorRT/blob/dfad60565216a68413f434b500168c456fdd2587/src/mtcnn.cpp#L41

    //generate pnet models
    pnet_engine = new Pnet_engine[scales_.size()];
    simpleFace_ = (Pnet**)malloc(sizeof(Pnet*)*scales_.size());
    for (size_t i = 0; i < scales_.size(); i++) {
        int changedH = (int)ceil(row*scales_.at(i));
        int changedW = (int)ceil(col*scales_.at(i));
        pnet_engine[i].init(changedH,changedW);
        simpleFace_[i] =  new Pnet(changedH,changedW,pnet_engine[i]);
    }
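To make the shape issue concrete, the sketch below computes the per-level input size that the quoted loop passes to `pnet_engine[i].init(changedH, changedW)`, namely ceil(rows * scale) by ceil(cols * scale). `pnetInputShapes` is a hypothetical helper for illustration, and the scales in the usage line are round example values, not the real pyramid.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// For each pyramid scale, return the (height, width) of the resized image,
// mirroring changedH = ceil(row * scale) and changedW = ceil(col * scale).
// In the quoted design, every distinct shape gets its own PNet engine.
std::vector<std::pair<int, int>> pnetInputShapes(int rows, int cols,
                                                 const std::vector<double>& scales) {
    std::vector<std::pair<int, int>> shapes;
    shapes.reserve(scales.size());
    for (double s : scales) {
        shapes.emplace_back(static_cast<int>(std::ceil(rows * s)),
                            static_cast<int>(std::ceil(cols * s)));
    }
    return shapes;
}
```

For example, `pnetInputShapes(480, 640, {0.5, 0.25})` yields {240, 320} and {120, 160}: two different input shapes, hence two engines in the quoted approach.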

Next, in the function “findFace”, we can see that the input image is scaled and then fed to the PNet model:

https://github.com/PKUZHOU/MTCNN_FaceDetection_TensorRT/blob/master/src/mtcnn.cpp#L69

    for (size_t i = 0; i < scales_.size(); i++) {
        int changedH = (int)ceil(image.rows*scales_.at(i));
        int changedW = (int)ceil(image.cols*scales_.at(i));
        clock_t run_first_time = clock();
        resize(image, reImage, Size(changedW, changedH), 0, 0, cv::INTER_LINEAR);
        (*simpleFace_[i]).run(reImage, scales_.at(i),pnet_engine[i]);

Here comes my first obstacle: the primary GIE in DeepStream only accepts one image, but this algorithm requires the input image to be scaled down to multiple sizes.

In the post you mentioned, the author of the standalone TRT application takes another approach: he scales the input image to multiple smaller sizes, stacks all of them into one big image, and feeds that into the PNet network. Whichever approach is used, MTCNN always requires an image pre-processing step, and I couldn't find DeepStream support for that.
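The stacking approach can be sketched as a layout computation: each pyramid level gets a rectangle inside one larger canvas, and the table of offsets and scales is kept so detections can be mapped back afterwards. This is a minimal sketch under assumptions: levels are stacked vertically and left-aligned with a blank gap between them, which is not necessarily the layout used in the post mentioned above; `Tile`, `Canvas` and `stackPyramid` are hypothetical names; and producing the actual canvas pixels inside a DeepStream pipeline is not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One pyramid level placed inside a stacked canvas (hypothetical layout).
struct Tile {
    float scale;  // scale factor applied to the original frame
    int x, y;     // top-left corner of this level inside the canvas
    int w, h;     // size of this level in canvas pixels
};

struct Canvas {
    int width = 0;
    int height = 0;
    std::vector<Tile> tiles;
};

// Stacks the scaled levels top to bottom, left-aligned, leaving `gap` blank rows
// between consecutive levels so PNet windows are less likely to straddle two levels.
Canvas stackPyramid(int rows, int cols, const std::vector<float>& scales, int gap = 12) {
    Canvas canvas;
    int y = 0;
    for (float s : scales) {
        if (!canvas.tiles.empty()) y += gap;
        Tile t{s, 0, y, static_cast<int>(std::ceil(cols * s)),
               static_cast<int>(std::ceil(rows * s))};
        canvas.tiles.push_back(t);
        y += t.h;
        canvas.width = std::max(canvas.width, t.w);
    }
    canvas.height = y;
    return canvas;
}
```

For a 640x480 frame with scales {0.5, 0.25} and a 12-pixel gap, the canvas is 320x372 and the second level starts at y = 252. A single fixed canvas size means a single PNet engine shape, at the cost of some wasted area.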

In general, I know DeepStream supports a custom bounding box parser function, which covers the post-processing phase. What about image pre-processing, not only at the primary GIE but also at secondary GIEs? In a face recognition application, a detected face box needs to be aligned using 5 facial landmarks; that would be the required pre-processing for an SGIE.
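The 5-landmark alignment mentioned above is commonly done by fitting a similarity transform (rotation, uniform scale, translation) from the detected landmarks to a fixed template of landmark positions in the aligned crop, then warping the face with it. The sketch below shows only the closed-form least-squares fit; `Pt`, `Similarity` and `estimateSimilarity` are hypothetical names, the template coordinates are left to the caller, and the warp itself (for example with OpenCV's `cv::warpAffine`) is not shown. Running this inside a DeepStream SGIE is exactly the open question in this post.

```cpp
#include <cstddef>
#include <vector>

struct Pt {
    double x, y;
};

// 2D similarity transform: q = [a -b; b a] * p + [tx; ty]
struct Similarity {
    double a, b, tx, ty;
};

// Closed-form least-squares fit of a similarity transform mapping src onto dst.
// Assumes src and dst have the same length (e.g. 5 landmarks and 5 template
// points) and that the src points are not all identical.
Similarity estimateSimilarity(const std::vector<Pt>& src, const std::vector<Pt>& dst) {
    const double n = static_cast<double>(src.size());
    double sx = 0, sy = 0, dx = 0, dy = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        sx += src[i].x; sy += src[i].y;
        dx += dst[i].x; dy += dst[i].y;
    }
    sx /= n; sy /= n; dx /= n; dy /= n;  // centroids

    double numA = 0, numB = 0, den = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const double px = src[i].x - sx, py = src[i].y - sy;
        const double qx = dst[i].x - dx, qy = dst[i].y - dy;
        numA += px * qx + py * qy;
        numB += px * qy - py * qx;
        den += px * px + py * py;
    }
    const double a = numA / den;
    const double b = numB / den;
    // The translation maps the source centroid onto the destination centroid.
    return {a, b, dx - (a * sx - b * sy), dy - (b * sx + a * sy)};
}
```

The fitted parameters correspond to the 2x3 affine matrix [a, -b, tx; b, a, ty], which is the form a warp routine expects.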

In the last line of the source code quoted above ( (*simpleFace_[i]).run(.... ), if we go into method “run” of class Pnet, we see that it calls method “generateBbox”. That method transforms the bounding boxes found on the scaled image into the coordinate system of the original input image:

https://github.com/PKUZHOU/MTCNN_FaceDetection_TensorRT/blob/master/src/pnet_rt.cpp#L133

                bbox.x1 = round((stride * row + 1) / scale);
                bbox.y1 = round((stride * col + 1) / scale);
                bbox.x2 = round((stride * row + 1 + cellsize) / scale);
                bbox.y2 = round((stride * col + 1 + cellsize) / scale);
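The generateBbox arithmetic quoted above can be isolated into a small function for checking. This sketch assumes PNet's usual stride of 2 and 12-pixel cell size (neither value appears in the quoted lines), keeps the quoted pairing of `row` with x and `col` with y unchanged, and uses the hypothetical names `BBox` and `cellToOriginal`.

```cpp
#include <cmath>

struct BBox {
    float x1, y1, x2, y2;
};

// Mirrors the quoted generateBbox lines: a PNet output cell covers a
// cellsize x cellsize window at stride * index in the scaled image, and
// dividing by `scale` maps that window back to original-frame pixels.
// stride = 2 and cellsize = 12 are assumed defaults, not taken from the quote.
BBox cellToOriginal(int row, int col, float scale, int stride = 2, int cellsize = 12) {
    return {std::round((stride * row + 1) / scale),
            std::round((stride * col + 1) / scale),
            std::round((stride * row + 1 + cellsize) / scale),
            std::round((stride * col + 1 + cellsize) / scale)};
}
```

With scale 0.5, cell (0, 0) maps to (2, 2, 26, 26) in original-frame pixels. The division by `scale` is the piece of information a DeepStream bounding box parser would be missing.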

Here comes my second obstacle: even assuming the first obstacle is solved, the scale being used is unknown inside the custom bounding box parser function.
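One possible workaround, building on the stacked-image idea from the standalone TRT application mentioned earlier: if the pyramid were packed into one canvas with a layout fixed in advance (plausible when nvstreammux scales every frame to the same configured width and height), the parser would not need the scale passed in, because it could recover it from where a detection lands. The sketch below assumes a hypothetical vertically stacked layout described by `Tile` entries that are hard-coded or precomputed on the parser side; `canvasToOriginal` is a hypothetical helper, not part of the DeepStream parser API, and it needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <vector>

// Hypothetical placement of one pyramid level inside the stacked input canvas.
struct Tile {
    float scale;  // scale applied to the original frame for this level
    int x, y;     // top-left corner of the level inside the canvas
    int w, h;     // size of the level in canvas pixels
};

struct BBox {
    float x1, y1, x2, y2;
};

// Finds the level whose tile contains the box centre, removes that tile's
// offset, and divides by its scale to get original-frame coordinates.
std::optional<BBox> canvasToOriginal(const BBox& b, const std::vector<Tile>& tiles) {
    const float cx = 0.5f * (b.x1 + b.x2);
    const float cy = 0.5f * (b.y1 + b.y2);
    for (const Tile& t : tiles) {
        if (cx >= t.x && cx < t.x + t.w && cy >= t.y && cy < t.y + t.h) {
            return BBox{(b.x1 - t.x) / t.scale, (b.y1 - t.y) / t.scale,
                        (b.x2 - t.x) / t.scale, (b.y2 - t.y) / t.scale};
        }
    }
    return std::nullopt;  // centre fell in padding between tiles: drop the box
}
```

Boxes whose centre lands in the padding between levels are discarded, which is one reason a gap between tiles is useful in such a layout.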

mchi · June 28, 2021, 3:38pm

One quick question: is it possible for you to use the TLT FaceDetectIR model? (See the NVIDIA DeepStream SDK Developer Guide, DeepStream 6.1.1 Release documentation.)

hotribao · June 28, 2021, 4:02pm

Yes. While waiting for a response on this topic, I tried FaceDetect and it works. If MTCNN doesn't work with DeepStream, I will have to use it instead. It just doesn't return the 5 facial landmark points that MTCNN does, which are used for face alignment.

neo21995 · August 1, 2023, 10:53am

@hotribao were you able to run MTCNN or RetinaFace for facial landmarks?

hotribao · August 1, 2023, 1:44pm

No, I was not
