Background
The off-the-shelf FPEnet model gives poor results when the face is tilted to the left or right, in low lighting, under sun glare, etc.
(Facial Landmarks Estimation | NVIDIA NGC)
So we decided to fine-tune the FPEnet model on our custom dataset using only 16 points.
We ran training using the FPEnet IPython notebook:
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html
16 Point Labelling
- Points 1-6: eye on the left in the image
- Points 7-12: eye on the right in the image
- Point 13: nose
- Point 14: mouth corner on the left in the image
- Point 15: mouth corner on the right in the image
- Point 16: chin
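For reference, the same scheme as a Python mapping (the names are just shorthand for this post; the indices are what matter, and they line up with the "P<N>x"/"P<N>y" keys in the label JSON below):

```python
# 16-point labelling scheme; indices match the "P<N>x"/"P<N>y" keys in the
# label JSON. "left"/"right" are in image coordinates, not the subject's.
POINT_SEMANTICS = {
    **{i: "eye_left_in_image" for i in range(1, 7)},    # points 1-6
    **{i: "eye_right_in_image" for i in range(7, 13)},  # points 7-12
    13: "nose",
    14: "mouth_corner_left_in_image",
    15: "mouth_corner_right_in_image",
    16: "chin",
}
```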
On images where not all 16 points are visible, we marked the missing points as occluded, following this format:
https://docs.nvidia.com/tao/tao-toolkit/text/data_annotation_format.html#json-label-data-format
Face Bounding Box Labelling
A single rectangular face bounding box is labelled and added as ground truth to the labelling job.
I have set the outer and tight bounding box values to be the same.
Label Json File
afw.json (643.9 KB)
Example where some points are occluded:
"filename": "/workspace/tao-experiments/fpenet/afw/smartdvr-1424221015803-usb-Generic_Camera-RGB_200901010001-video-index0_20220928060534-20220928060625-45.png",
"class": "image",
"annotations": [
{
"class": "FaceBbox",
"tool-version": "1.0",
"Occlusion": 0,
"face_outer_bboxx": 548.0,
"face_outer_bboxy": 50.0,
"face_outer_bboxwidth": 251.0,
"face_outer_bboxheight": 425.0,
"face_tight_bboxx": 548.0,
"face_tight_bboxy": 50.0,
"face_tight_bboxwidth": 251.0,
"face_tight_bboxheight": 425.0
},
{
"tool-version": "1.0",
"version": "v1",
"class": "FiducialPoints",
"P1x": 0.0,
"P1y": 0.0,
"P1occluded": true,
"P2x": 0.0,
"P2y": 0.0,
"P2occluded": true,
"P3x": 0.0,
"P3y": 0.0,
"P3occluded": true,
"P4x": 0.0,
"P4y": 0.0,
"P4occluded": true,
"P5x": 0.0,
"P5y": 0.0,
"P5occluded": true,
"P6x": 0.0,
"P6y": 0.0,
"P6occluded": true,
"P7x": 0.0,
"P7y": 0.0,
"P7occluded": true,
"P8x": 0.0,
"P8y": 0.0,
"P8occluded": true,
"P9x": 0.0,
"P9y": 0.0,
"P9occluded": true,
"P10x": 0.0,
"P10y": 0.0,
"P10occluded": true,
"P11x": 0.0,
"P11y": 0.0,
"P11occluded": true,
"P12x": 0.0,
"P12y": 0.0,
"P12occluded": true,
"P13x": 792.0,
"P13y": 277.0,
"P14x": 723.0,
"P14y": 360.0,
"P15x": 0.0,
"P15y": 0.0,
"P15occluded": true,
"P16x": 696.0,
"P16y": 449.0
}
]
}
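As a sanity check on our labels, a small script like this (assuming afw.json is a JSON array of records shaped like the example above) flags any point that has neither coordinates nor an explicit occluded flag:

```python
import json

with open("afw.json") as f:
    records = json.load(f)

for rec in records:
    # Find the FiducialPoints annotation for this image.
    fid = next(a for a in rec["annotations"] if a.get("class") == "FiducialPoints")
    for i in range(1, 17):
        has_coords = f"P{i}x" in fid and f"P{i}y" in fid
        if not has_coords and not fid.get(f"P{i}occluded", False):
            print(f'{rec["filename"]}: point {i} has no coordinates and no occluded flag')
```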
Training spec file:
experiment_spec_16.yaml (2.2 KB)
Dataset Size: 338 images
Results:
The inference results, even on training-set images (especially when points are occluded), are completely wrong, so I am not sure whether the experiment config is correct or whether training is actually using the images with occlusions.
Occluded Image Results
Other Results
Questions:

1. Is the face bounding box set in the JSON actually being used, or is it recalculated?
When I look at the TensorBoard image examples, the crops are different and do not appear to use the bounding box we provided. Is the bounding box recalculated from the points? If so, how does that work when only some of the points are labelled, e.g. just the eye points?
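For illustration of what I mean (this is not the TAO implementation, just a sketch of what "recalculated from the points" could look like), a box recomputed from only the non-occluded landmarks would shrink to whatever region happens to be labelled:

```python
def bbox_from_visible_points(fid, num_points=16, pad=0.0):
    """Hypothetical face box recomputed from non-occluded landmarks only.

    Not the TAO implementation -- just to show why a recomputed box would be
    wrong when, e.g., only the eye points are labelled. `fid` is a
    FiducialPoints annotation dict as in the label JSON above.
    """
    xs, ys = [], []
    for i in range(1, num_points + 1):
        if f"P{i}x" in fid and not fid.get(f"P{i}occluded", False):
            xs.append(fid[f"P{i}x"])
            ys.append(fid[f"P{i}y"])
    x0, y0 = min(xs), min(ys)
    w, h = max(xs) - x0, max(ys) - y0
    # Optional symmetric padding around the tight box.
    return x0 - pad * w, y0 - pad * h, w * (1 + 2 * pad), h * (1 + 2 * pad)
```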
2. What is the preprocessing logic applied in the dataset_convert step?
I see that around 50 images are dropped when the tfrecord file is generated, but I could not find any documentation explaining the discrepancy. I am keen to understand which images are removed and by what criteria.
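A quick way to quantify the drop (same assumption that afw.json is a JSON array; the tfrecord output path is from our experiment layout and may differ in yours):

```python
import json

import tensorflow as tf

# Compare the number of labelled records in the JSON with the number of
# examples dataset_convert actually wrote to the tfrecords.
with open("afw.json") as f:
    n_labelled = len(json.load(f))

# Assumed output location of dataset_convert in our setup.
tfrecord_files = tf.io.gfile.glob("/workspace/tao-experiments/fpenet/data/tfrecords/*")
n_converted = sum(1 for _ in tf.data.TFRecordDataset(tfrecord_files))

print(f"labelled: {n_labelled}, converted: {n_converted}, dropped: {n_labelled - n_converted}")
```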
3. How are images with some occluded points handled? Are they used in training?
It feels like the images that have some occluded points are not being used in training. I have tried labelling occluded points in two ways: (a) no coordinates, only the occluded flag, e.g. "P9occluded": true; (b) placeholder coordinates plus the occluded flag, e.g. "P9x": 45, "P9y": 45, "P9occluded": true.
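To see whether the dropped images correlate with occlusions, one can count how many labelled records contain at least one occluded point (same assumptions about afw.json as above):

```python
import json

with open("afw.json") as f:
    records = json.load(f)

def has_occlusion(rec):
    # Find the FiducialPoints annotation and check all 16 occlusion flags.
    fid = next(a for a in rec["annotations"] if a.get("class") == "FiducialPoints")
    return any(fid.get(f"P{i}occluded", False) for i in range(1, 17))

n_occ = sum(has_occlusion(r) for r in records)
print(f"{n_occ} of {len(records)} images have at least one occluded point")
```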
4. How do I get a confidence score for the points when running tao inference in the notebook?
I want to check the confidence score for the output points, but I could not find any relevant flag for the tao inference command in the help docs.
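One workaround I am considering (not a tao inference flag): the FPEnet network itself produces a per-point confidence tensor alongside the keypoint coordinates via its soft-argmax layer, so if the model can be exported and run directly, the confidences should be readable from its outputs. Everything below (file name, input size, output ordering) is an assumption to be verified against the actual export:

```python
import numpy as np
import onnxruntime as ort

# Run the exported model directly and inspect all outputs; one of them should
# be a (1, num_points) confidence tensor. File name, 80x80 grayscale input,
# and output ordering are assumptions -- check sess.get_outputs() first.
sess = ort.InferenceSession("fpenet.onnx")
for o in sess.get_outputs():
    print(o.name, o.shape)

face = np.zeros((1, 1, 80, 80), dtype=np.float32)  # placeholder face crop
outputs = sess.run(None, {sess.get_inputs()[0].name: face})
for o, out in zip(sess.get_outputs(), outputs):
    print(o.name, np.asarray(out).shape)
```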
Can someone please help answer the above questions? We want to make sure we have the right setup before labelling more data and retraining.
Thanks