Problems with the quality of the detector components (facenet, faciallandmarks, gazenet, emotionnet)

Hello
Device: Jetson Xavier NX
Firmware: JP5.1

I’m using a pipeline that chains emotion and gaze detectors (based on the emotion detector sample application, with the gaze detector added to the pipeline after the emotion detector).

A week ago, a colleague and I decided to test how the detectors behave when two or more people are in front of the camera.
We found several problems:

  1. The facial landmarks do not accurately follow the oval of the face (when the background is a uniform light color, or when there are foreign objects in the background).
  2. The gaze direction vectors often do not match the actual gaze direction. The detector registers changes in iris position normally, but the direction and length of the gaze vectors behave erratically.
  3. The emotion detector’s output “bounces” between detected emotions.

In the attached video, a colleague and I (I am the one wearing glasses, with a beard and long hair, a stress test for the detectors) are standing in front of a webcam; a second colleague recorded the video from the monitor screen.

video

Time codes:
00:00…00:10 - head turns;
00:12…00:25 - motionless head, eye movements;
00:28…00:48 - head turns while keeping the gaze directed at the camera;
00:49…00:57 - free movements of the head and eyes.

Hi,

Please share the source you are using with us first.
Thanks.

What source do you need?
I attached the original video in the opening post.

Hello
The attached archive contains the source code, a test video, and a few comments in the readme file.
Thanks in advance.
detectors_test.zip (5.2 MB)

Hi,

These models can be found in our NGC website.

FaceDetect: FaceDetect | NVIDIA NGC

GazeNet: Gaze Estimation | NVIDIA NGC

Facial Landmark: Facial Landmarks Estimation | NVIDIA NGC

EmotionNet: EmotionNet | NVIDIA NGC

Please try the latest release.
If it doesn’t meet your requirement, you can fine-tune it with the TAO toolkit.

Thanks.

Hello!
Thank you for your reply and for recommending the TAO toolkit. We will definitely use it in the future to adapt the model to our data.
However, right now we would like to reproduce the face detection and gaze direction accuracy of the existing models, for example on NVIDIA’s sample videos or on our own simple examples.
Based on the results of this stage, we will plan further work.
So I would like to know how to tweak the behavior of the already trained models to reach an accuracy comparable to the demo videos, for example through the configuration files located in ‘/deepstream_tao_apps/configs’?

My specific questions:

  1. How can I improve the accuracy of the Faciallandmarks (SGIE) component in delineating the oval of the face? Right now the facial landmarks are not aligned with the face oval.
  2. How can I change the behavior of the GazeNet (SGIE) component so that it registers the gaze direction accurately? Currently, when a person turns their head, the detected gaze direction erroneously follows the head turn; and when a person moves only their eyes without turning the head, the detected gaze direction erroneously fails to change.
  3. In the list of FaceNet models, I found the deployable_v1.0 model, which has higher accuracy. Do you recommend trying it to improve the accuracy of FaceNet (PGIE)?
Screencast video

config_infer_primary_facenet.txt

#...

[class-attrs-all]
#pre-cluster-threshold=0.2
pre-cluster-threshold=0.6
group-threshold=1

## Set eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
#eps=0.2
##minBoxes=3

cluster-mode=1
eps=0.7
minBoxes=3

Thanks in advance.

Hello
Can you please help me with my questions in my post above?

Hi,

Sorry for the late update.
Since your question is specific to the TAO models, we will redirect your topic to the TAO team.

Thanks.

We suggest running the GazeNet and Facial Landmark networks separately against your test video.
For FaceNet, please use the best model you mentioned previously.
For Facial Landmark and GazeNet, please follow deepstream_tao_apps/apps/tao_others/README.md at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub to set up the environment and run them separately.

Start to run the facial landmark application

    cd deepstream-faciallandmark-app
    ./deepstream-faciallandmark-app [1:file sink|2:fakesink|3:display sink]  \
    <faciallandmark model config file> <input uri> ... <input uri> <out filename>
OR
    ./deepstream-faciallandmark-app <app YAML config file>


Start to run the Gazenet

    cd deepstream-gaze-app
    ./deepstream-gaze-app [1:file sink|2:fakesink|3:display sink]  \
    <faciallandmark model config> <input uri> ... <input uri> <out filename>
OR
    ./deepstream-gaze-app <app YAML config file>
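
For reference, a concrete invocation of the facial landmark app might look like the following. This is only a sketch: the input video path and output name are hypothetical placeholders, and it assumes the app is run from its own directory inside the deepstream_tao_apps checkout, using the stock sample_faciallandmarks_config.txt.

    cd deepstream-faciallandmark-app
    # 1 = file sink: write the annotated output to a file
    ./deepstream-faciallandmark-app 1 \
        ../../../configs/facial_tao/sample_faciallandmarks_config.txt \
        file:///home/user/videos/landmarks_test.mp4 ./landmarks_out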

Hello.
Thanks for the answer!
I will record separate test videos for each of the detectors (Faciallandmarks and GazeNet) in the near future.
Unfortunately, I will have to record them at home, where the wall in the background is covered with patterned wallpaper; I hope it will not interfere with the Faciallandmarks detector.
The videos I posted earlier in this thread were filmed at work, where the walls are painted a solid light color.

Hello!
I recorded two test videos (with and without glasses) for the Faciallandmarks and GazeNet detectors.
FaceNet_GazeNet_video.zip (52.6 MB)

Thanks. They look OK but could be improved. We suggest using more training images to fine-tune the trainable model.
For the Facial Landmarks network, per Facial Landmarks Estimation | NVIDIA NGC, known limitations include a relative increase in keypoint estimation error at extreme head poses (yaw > 60 degrees) and under occlusions.
You can collect more images, for example from https://ibug.doc.ic.ac.uk/resources/300-W/

For GazeNet, you can also collect more images, for example from:
https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/gaze-based-human-computer-interaction/appearance-based-gaze-estimation-in-the-wild

Hello!
Unfortunately, at the moment I do not have sufficient resources to fine-tune the model as you proposed.

Is it possible to customize an existing model via the configuration files located at https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/configs/facial_tao ?

If I understand you correctly, this model and/or the Faciallandmarks detector itself has the following limitations:

  • Yaw: head turns left-right/up-down of more than 60 degrees relative to the optical axis of the camera;
  • Occlusion: a face that is only partially within the camera’s field of view and/or is blocked by another person’s face or a foreign object.

Is that right?

I am also concerned about the gaze detector.
Does the detector assume that, in order to change gaze direction, a person must turn their head toward the point they want to look at? As you may have already noticed, gaze detection does not work well when only the eyes move and the head stays still.
This includes the case when a person is wearing glasses (for example, me).
Thanks in advance.

Yes, but you can only change batch-size, fp16/fp32/int8, the threshold, and model-engine-file, which will not change the network itself (see the illustrative snippet below).

Yes.

The model needs to be fine-tuned with additional images to change this behavior.
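
For illustration, the entries mentioned in the first answer are standard nvinfer keys in config_infer_primary_facenet.txt. The values and the engine file name below are only placeholders, not recommendations:

    [property]
    batch-size=1
    # network-mode: 0=FP32, 1=INT8, 2=FP16
    network-mode=0
    # hypothetical engine file name; use a distinct name per precision/batch size
    model-engine-file=facenet_bs1_fp32.engine

    [class-attrs-all]
    pre-cluster-threshold=0.6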

Hello!

Could you please specify in more detail which parameters can be edited, and in which files, to change the behavior of the models?

I am using an increased threshold value in the file “deepstream_tao_apps/configs/facial_tao/config_infer_primary_facenet.txt”, as you previously advised:

[class-attrs-all]
#pre-cluster-threshold=0.2
pre-cluster-threshold=0.6

An example of running with batch size 1 and FP32:

You can set batch-size=1 and network-mode=0, and set a different name in model-engine-file, in the files below. You can also set a different pre-cluster-threshold (in the facial landmark network config, the key is threshold):

path/apps/tao_others/deepstream-gaze-app# vim ../../../configs/facial_tao/config_infer_primary_facenet.txt

path/apps/tao_others/deepstream-gaze-app# vim ../../../configs/facial_tao/faciallandmark_sgie_config.txt

You can set batchSize=1 and networkMode=fp32, and set a different name in enginePath, in the file below:

path/apps/tao_others/deepstream-gaze-app# vim ../../../configs/gaze_tao/sample_gazenet_model_config.txt
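
For reference, the relevant entries in sample_gazenet_model_config.txt might then look like this (a sketch only; the engine file name is a hypothetical placeholder):

    batchSize=1
    networkMode=fp32
    enginePath=gazenet_bs1_fp32.engine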

The command:

path/apps/tao_others/deepstream-gaze-app# ./deepstream-gaze-app 1 ../../../configs/facial_tao/sample_faciallandmarks_config.txt file:///home/morganh/temp/gaze_video.mp4 ./gazenet

Hello!
I apologize for the delay in replying.
Please do not close the topic.
In the near future I will try to follow up on your recommendations more fully (including adjusting the model parameters).
P.S. As far as I know, topics on this forum are automatically closed after two weeks of inactivity.