Facial Landmarks in Python

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) NVIDIA GeForce RTX 4070 Laptop GPU
• DeepStream Version 7.0
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.6.1.6-1+cuda12.0
• NVIDIA GPU Driver Version (valid for GPU only) 535.183.06
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I am implementing a facial landmarks app in Python based on the C DeepStream app. The pgie is the facenet model and the sgie is the faciallandmarks model (both models from deepstream_tao_apps).

The pipeline is as follows:
source.link(caps_v4l2src)
caps_v4l2src.link(vidconvsrc)
vidconvsrc.link(nvvidconvsrc)
nvvidconvsrc.link(caps_vidconvsrc)
sinkpad = streammux.request_pad_simple("sink_0")
if not sinkpad:
    sys.stderr.write(" Unable to get the sink pad of streammux \n")
srcpad = caps_vidconvsrc.get_static_pad("src")
if not srcpad:
    sys.stderr.write(" Unable to get source pad of caps_vidconvsrc \n")
srcpad.link(sinkpad)
streammux.link(pgie)
pgie.link(queue3)
queue3.link(sgie)
sgie.link(queue4)
queue4.link(tiler)
tiler.link(queue5)
queue5.link(nvvidconv)
nvvidconv.link(queue6)
queue6.link(nvosd)
nvosd.link(queue7)
queue7.link(sink)

In the "tile_sink_pad_buffer_probe" I can extract the landmark positions (x, y) from the softargmax layer. After that I draw the first 16 marks using "disp_meta.circle_params". Although the marks are inside the detected bounding box, the plotted marks are out of scale (the shape of the face is too small).
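For reference, here is a minimal sketch of how such a probe can read the sgie tensor output, following the pattern used in the deepstream_python_apps samples. The layer name "softargmax", the (x, y)-pair output layout, and output-tensor-meta=1 being set in the sgie config are assumptions for this setup, not confirmed details of the sample:

import ctypes
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def tile_sink_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # The sgie attaches its tensor output to each detected object,
            # so read the object-level user meta (one tensor per face).
            l_user = obj_meta.obj_user_meta_list
            while l_user is not None:
                user_meta = pyds.NvDsUserMeta.cast(l_user.data)
                if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                    for i in range(tensor_meta.num_output_layers):
                        layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
                        if layer.layerName == "softargmax":  # assumed layer name
                            ptr = ctypes.cast(pyds.get_ptr(layer.buffer),
                                              ctypes.POINTER(ctypes.c_float))
                            # (x, y) pairs in the sgie's 80x80 input space
                            marks = [(ptr[2 * j], ptr[2 * j + 1]) for j in range(16)]
                l_user = l_user.next
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK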

Question:

  1. Is it necessary to perform any preprocessing on the detected face box (pgie) before feeding it to the faciallandmarks model (sgie), since the input dimension of the sgie is 80x80?
  2. Is it necessary to apply any scaling to the mark values?

Thank you in advance for any help!

The C/C++ sample is fully open source. Please compare your implementation with the sample: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at release/tao5.3_ds7.0ga · NVIDIA-AI-IOT/deepstream_tao_apps

@Fiona.Chen, thank you for your reply. The issue with the landmarks being out of scale was solved by re-scaling the mark points as follows:

obj_width = obj_meta.rect_params.width
obj_height = obj_meta.rect_params.height
rate_w = obj_width / 80
rate_h = obj_height / 80
x_mark = x_mark * rate_w
y_mark = y_mark * rate_h
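For anyone following along: if the circles are drawn in full-frame coordinates, the object's left/top offset also needs to be added, since the sgie operates on the face crop. A minimal sketch combining the scaling above with the "disp_meta.circle_params" drawing mentioned earlier; j, x_mark, y_mark, obj_meta, batch_meta and frame_meta are assumed to come from the probe loop:

# Map a landmark from the sgie's 80x80 input space into frame coordinates.
rect = obj_meta.rect_params
x_frame = rect.left + x_mark * rect.width / 80.0
y_frame = rect.top + y_mark * rect.height / 80.0

# Draw it via display meta (a probe upstream of nvosd, as in this pipeline).
disp_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
disp_meta.num_circles = 1
circle = disp_meta.circle_params[0]
circle.xc = int(x_frame)
circle.yc = int(y_frame)
circle.radius = 2
circle.circle_color.set(0.0, 1.0, 0.0, 1.0)  # green
pyds.nvds_add_display_meta_to_frame(frame_meta, disp_meta)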

Another question:
Was the facial landmarks app (the original C app) designed to perform inference with just one face in the frame, or can it be used to infer multiple faces? I am asking because the marks are very good when there is just one face, but with more than one face the result is not OK:
Result with one face [ok]

Result with multiple faces [not ok]

I have tested multiple faces under many conditions and the result is always bad. Am I forgetting to set some configuration in the application?

The models used in the sample are just pre-trained models. You can re-train the models to make them more accurate.

Overview - NVIDIA Docs

Thank you for your reply.
What is the relationship between one face and multiple faces? Does this mean that the model was trained with just one face and needs to be retrained for multiple faces? All results with just one face are OK.

@Fiona.Chen, please try to run the application with a video with more than one face to reproduce the issue.

There is a face detection model before the faciallandmarks model.

The faciallandmarks model handles a single face only. The face detection model detects the faces in a multiple-face image, and each single-face crop becomes the input to the faciallandmarks model.
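In config terms, this pgie-to-sgie relationship looks roughly like the excerpt below. The key names are standard nvinfer properties, but the values are assumptions for this setup rather than the sample's exact config:

[property]
# secondary mode: infer on detected objects, not full frames
process-mode=2
gie-unique-id=2
# run only on the facenet pgie's detections (pgie gie-unique-id assumed to be 1)
operate-on-gie-id=1
# nvinfer scales each detected face crop to the sgie's 80x80 input
infer-dims=3;80;80
# expose the raw tensor output to the Python probe
output-tensor-meta=1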


I would like to give more context about the issue: I am using a camera. When just one person appears in the scene (for example, me), the landmarks are totally OK. But if someone else enters the camera's field of view (for example, me and my wife), the landmarks become bad (like random points). If I leave the field of view and just my wife remains in the scene, the landmarks become OK again. Because of that, I don't think the issue is related to model re-training.
Could you please try to reproduce the issue using a camera or a video file with two people in the scene?
I appreciate your help.