I am using a pipeline that chains an emotion detector and a gaze detector (based on the emotion detector application, with the gaze detector added to the pipeline after the emotion detector).
A week ago a colleague and I decided to test how the detectors behave when two or more people are in front of the camera.
We found several problems:
The Faciallandmarks output does not accurately trace the oval of the face (when the background is a uniform light color or when there are foreign objects in the background).
The gaze direction vectors often do not match the actual gaze direction. The detector registers changes in iris position correctly, but the direction and length of the gaze vectors behave erratically.
The emotion detector’s output “bounces” between the detected emotions.
In the attached video file, a colleague and I are standing in front of a webcam (I am the one wearing glasses, with a beard and long hair - a stress test for the detectors); a second colleague recorded the video from the monitor screen.
video
Time codes:
00:00…00:10 - head turns;
00:12…00:25 - motionless head, eye movements;
00:28…00:48 - head turns while keeping the gaze directed at the camera;
00:49…00:57 - free movements of the head and eyes.
Hello!
Thank you for your reply and for recommending the TAO toolkit. We will definitely use it in the future to adapt the model to our data.
However, right now we would like to reproduce the face detection and gaze direction accuracy with the existing models, for example, on NVIDIA's sample videos or on our own simple examples.
Based on the results of this stage, we will plan further work.
So I would like to know how to tune the behavior of the already trained models to get accuracy comparable to the demo videos. For example, through the configuration files located in ‘/deepstream_tao_apps/configs’?
My specific questions:
How can I improve the accuracy of the Faciallandmarks component (SGIE) in delineating the oval of the face? Currently the facial landmarks are not aligned with the face oval.
How can I change the behavior of the GazeNet component (SGIE) so that it registers the gaze direction accurately? Currently, when a person turns his head, the detected gaze direction erroneously shifts along with the head turn; and when a person shifts his eyes without turning his head, the detected gaze direction erroneously does not change.
In the list of models for FaceNet, I found the deployable_v1.0 model, which has higher accuracy. Do you recommend trying it to improve the accuracy of FaceNet (PGIE)?
Screencast video
config_infer_primary_facenet.txt
#...
[class-attrs-all]
#pre-cluster-threshold=0.2
pre-cluster-threshold=0.6
group-threshold=1
## Set eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
#eps=0.2
##minBoxes=3
cluster-mode=1
eps=0.7
minBoxes=3
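For reference, cluster-mode=1 selects DBSCAN clustering in nvinfer: eps is the neighborhood radius used to group candidate boxes, minBoxes is the minimum number of boxes needed to form a cluster, and pre-cluster-threshold filters detections by confidence before clustering. If DBSCAN keeps merging or dropping faces when two people stand close together, one thing worth trying is NMS clustering instead; a minimal sketch, with values that are only guesses to experiment with (not taken from this thread):

[class-attrs-all]
# Illustrative alternative: NMS clustering instead of DBSCAN
pre-cluster-threshold=0.6   # confidence filter applied before clustering
cluster-mode=2              # 2 = NMS
nms-iou-threshold=0.5       # suppress boxes overlapping more than this IoU
topk=20                     # keep at most 20 boxes per class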
To run the facial landmark application:
cd deepstream-faciallandmark-app
./deepstream-faciallandmark-app [1:file sink|2:fakesink|3:display sink] \
<faciallandmark model config file> <input uri> ... <input uri> <out filename>
OR
./deepstream-faciallandmark-app <app YAML config file>
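For example, a hypothetical display-sink run might look like this (the config path and input URI below are placeholders I made up, assuming faciallandmark_sgie_config.txt from configs/facial_tao is the landmark model config):

./deepstream-faciallandmark-app 3 \
    ../../../configs/facial_tao/faciallandmark_sgie_config.txt \
    file:///home/user/videos/two_people_test.mp4 ./landmarks_out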
To run the GazeNet application:
cd deepstream-gaze-app
./deepstream-gaze-app [1:file sink|2:fakesink|3:display sink] \
<faciallandmark model config> <input uri> ... <input uri> <out filename>
OR
./deepstream-gaze-app <app YAML config file>
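Similarly, a hypothetical GazeNet run (the input URI is again a placeholder; per the usage above, the app takes the facial landmark model config):

./deepstream-gaze-app 3 \
    ../../../configs/facial_tao/faciallandmark_sgie_config.txt \
    file:///home/user/videos/two_people_test.mp4 ./gaze_out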
Hello.
Thanks for the answer!
In the near future I will make separate test recordings for each of the detectors, Faciallandmarks and GazeNet.
Unfortunately, I will have to record at home, where the wall in the background is covered with patterned wallpaper. I hope the patterns will not interfere with the operation of the Faciallandmarks detector.
The videos I posted earlier in this thread were filmed at work, where the walls are painted a solid light color.
Thanks. They look OK but still need improvement. I suggest using more training images to fine-tune the trainable model.
For the Facial Landmarks network, from Facial Landmarks Estimation | NVIDIA NGC: some known limitations include a relative increase in keypoint estimation error for extreme head poses (yaw > 60 degrees) and occlusions.
You can look for more images, for example, in the 300-W dataset: https://ibug.doc.ic.ac.uk/resources/300-W/
If I understand you correctly, the operation of this model and/or the Faciallandmarks detector itself has the following limitations:
Yaw: when the head is turned left-right/up-down by more than 60 degrees relative to the optical axis of the camera;
Occlusion: when a person's face is only partially in the camera's field of view and/or is blocked by another person's face or a foreign object.
Is my understanding correct?
I am also concerned about the gaze detector issue.
Does the detector assume that, to change the direction of gaze, a person must turn his head towards the point he wants to look at? As you may have already noticed, gaze detection does not work well when only the eyes move and the head stays motionless.
This includes the case when a person is wearing glasses (for example, me).
You can set batch-size=1 and network-mode=0, and set a different name in model-engine-file for the files below. You can also set a different pre-cluster-threshold (in the facial landmarks network, it is called threshold).
path/apps/tao_others/deepstream-gaze-app# vim ../../../configs/facial_tao/config_infer_primary_facenet.txt
path/apps/tao_others/deepstream-gaze-app# vim ../../../configs/facial_tao/faciallandmark_sgie_config.txt
You can set batchSize=1 and networkMode=fp32 and set a different name in enginePath for the file below.
path/apps/tao_others/deepstream-gaze-app# vim ../../../configs/gaze_tao/sample_gazenet_model_config.txt
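To illustrate, the suggested edits might look roughly like this (the engine file names below are invented for the example; only the keys come from the reply above, and the gaze config is assumed to use simple key=value entries):

In config_infer_primary_facenet.txt and faciallandmark_sgie_config.txt, [property] section:

batch-size=1
network-mode=0                             # 0 = FP32 precision
model-engine-file=facenet_b1_fp32.engine   # hypothetical name; use a different one per config

In sample_gazenet_model_config.txt:

batchSize=1
networkMode=fp32
enginePath=gazenet_b1_fp32.engine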
Hello!
I apologize for the delay in answering.
Please do not close this topic.
In the near future I will try to follow your recommendations more or less fully (adjusting the model parameters).
P.S. As far as I know, topics on this forum are automatically closed after two weeks of inactivity.