Label information required for GazeNet training

• Network Type (gazenet)
• TLT Version (v3)

Hello,
I am learning TLT for GazeNet. I want to convert my own dataset(json format) to tfrecords format.

Transfer Learning Toolkit V3.0
tlt_cv_samples_v1.1.0/gazenet/gazenet.ipynb
3. Generate tfrecords from labels in json format

Annotation is taking me a long time, so please tell me the minimum parameters required for this model. The guide below states that FaceBox and FiducialPoints are required.
https://docs.nvidia.com/tlt/tlt-user-guide/text/data_annotation_format.html#json-label-data-format

However, FiducialPoints has many parameters, so it is difficult to annotate/label the data.
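For reference, my current understanding of the label format is roughly the sketch below (written in Python so the JSON can be generated for many frames). The field names such as face_tight_bboxx and P<n>x/P<n>y are my guesses from the guide above, so please correct me if they are wrong:

```python
import json

# Rough sketch of one per-image label entry in the json label format.
# Key names (face_tight_bbox*, P<n>x/P<n>y, tool-version) are assumptions
# based on the data annotation guide and should be verified for your TLT version.
def make_label(filename, face_box, landmarks):
    """face_box = (x, y, width, height); landmarks = [(x1, y1), (x2, y2), ...]."""
    fiducials = {"class": "FiducialPoints", "tool-version": "1.0"}
    for i, (px, py) in enumerate(landmarks, start=1):
        fiducials["P%dx" % i] = px
        fiducials["P%dy" % i] = py
    return {
        "filename": filename,
        "class": "image",
        "annotations": [
            {
                "class": "FaceBbox",
                "tool-version": "1.0",
                "face_tight_bboxx": face_box[0],
                "face_tight_bboxy": face_box[1],
                "face_tight_bboxwidth": face_box[2],
                "face_tight_bboxheight": face_box[3],
            },
            fiducials,
        ],
    }

# Only two landmark points shown for brevity.
labels = [make_label("frame_0001.png", (210, 140, 180, 180), [(250, 200), (300, 198)])]
with open("frame_0001.json", "w") as f:
    json.dump(labels, f, indent=2)
```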

Question 1:
What label information do I need to retrain this model?

Question 2:
In the notebook procedure, the MPII dataset is converted by the Python program into the following:
• Data
• Labels (json)
• Config

  1. Prepare dataset and pre-trained model
    B. Convert datasets and labels to required format

Can I use this config file for my own dataset?
I don't understand the parameters in the config files.

  1. Please see NVIDIA NGC

The training dataset is created by labeling ground-truth bounding-boxes and landmarks by human labelers. The face bounding box and fiducial landmarks are used to prepare inputs (face crop image, left eye crop image, right eye crop image, and facegrid) to the gaze model. For Face bounding boxes labeling, please refer to the FaceNet model card. For Facial landmarks labeling, please refer to the FPENet model card.
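As a rough illustration of how those four inputs can be derived from a face bounding box and eye landmarks (the crop margins, the 25x25 facegrid size, and the landmark handling here are assumptions for this example, not the actual TLT implementation):

```python
import numpy as np

FACEGRID_SIZE = 25  # assumed grid resolution; the real pipeline may differ

def prepare_inputs(image, face_box, left_eye_pts, right_eye_pts):
    """Derive face crop, eye crops, and a facegrid from a bbox and eye landmarks.

    image: HxWx3 array; face_box: (x, y, w, h) in pixels;
    *_eye_pts: iterable of (x, y) eye-corner landmarks.
    """
    img_h, img_w = image.shape[:2]
    x, y, w, h = face_box
    face_crop = image[y:y + h, x:x + w]

    def eye_crop(pts, margin=15):
        pts = np.asarray(pts, dtype=float)
        x0, y0 = np.maximum(pts.min(axis=0) - margin, 0).astype(int)
        x1, y1 = (pts.max(axis=0) + margin).astype(int)
        return image[y0:y1, x0:x1]

    left_crop = eye_crop(left_eye_pts)
    right_crop = eye_crop(right_eye_pts)

    # Facegrid: a coarse binary mask marking where the face box sits in the frame.
    grid = np.zeros((FACEGRID_SIZE, FACEGRID_SIZE), dtype=np.uint8)
    gx0, gy0 = int(x / img_w * FACEGRID_SIZE), int(y / img_h * FACEGRID_SIZE)
    gx1, gy1 = int((x + w) / img_w * FACEGRID_SIZE), int((y + h) / img_h * FACEGRID_SIZE)
    grid[gy0:gy1, gx0:gx1] = 1
    return face_crop, left_crop, right_crop, grid
```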

==> NVIDIA NGC

Training Data Ground-truth Labeling Guidelines

The ground truth dataset is created by labeling ground-truth facial keypoints by human labellers.

If you are looking to re-train with your own dataset, please follow the guideline below.

  • Label the keypoints in the correct order as accurately as possible. The human labeler would be able to zoom in to a face region to correctly localize the keypoint.
  • For keypoints that are not easily distinguishable such as chin or nose, the best estimate should be made by the human labeler. Some keypoints are easily distinguishable such as mouth corners or eye corners.
  • Label a keypoint as “occluded” if the keypoint is not visible due to an external object or due to extreme head pose angles. A keypoint is considered occluded when the keypoint is in the image but not visible.
  • To reduce discrepancy in labeling between multiple human labelers, the same keypoint ordering and instructions should be used across labelers. An independent human labeler may be used to test the quality of the annotated landmarks and potential corrections.
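A small illustrative consistency check over labeled keypoints, following the guidelines above (the per-point layout with an "occluded" flag is just an assumption made for this example, not a TLT format):

```python
# Verify every labeled frame carries the same number of keypoints in the same
# order, and report how many are marked occluded per frame.
def check_keypoints(frames, expected_count):
    for name, points in frames.items():
        assert len(points) == expected_count, (
            "%s has %d keypoints, expected %d" % (name, len(points), expected_count))
        occluded = sum(1 for p in points if p["occluded"])
        print("%s: %d/%d keypoints marked occluded" % (name, occluded, expected_count))

check_keypoints(
    {"frame_0001.png": [{"x": 250, "y": 200, "occluded": False}] * 80},
    expected_count=80,  # assumed count; use whatever your labeling scheme defines
)
```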

Face bounding boxes labeling:

  • Face bounding boxes should be as tight as possible.
  • Label each face bounding box with an occlusion level ranging from 0 to 9. 0 means the face is fully visible and 9 means the face is 90% or more occluded. For training, only faces with occlusion level 0-5 are considered.
  • The datasets consist of webcam images so truncation is rarely seen. If faces are at the edge of the frame with visibility less than 60% due to truncation, this image is dropped from the dataset.
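For illustration, filtering records by the two rules above could look like the sketch below (the record fields "occlusion", "truncated", and "visibility" are assumptions for this example, not TLT field names):

```python
def keep_face(record):
    if record["occlusion"] > 5:                  # only occlusion levels 0-5 are trained on
        return False
    if record["truncated"] and record["visibility"] < 0.6:
        return False                             # drop faces at the edge with <60% visible
    return True

records = [
    {"file": "a.png", "occlusion": 2, "truncated": False, "visibility": 1.0},
    {"file": "b.png", "occlusion": 7, "truncated": False, "visibility": 1.0},
    {"file": "c.png", "occlusion": 1, "truncated": True,  "visibility": 0.4},
]
print([r["file"] for r in records if keep_face(r)])  # -> ['a.png']
```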

The Sloth and Label-Studio tools have been utilized for labeling.

==> NVIDIA NGC

Training Data Ground-truth Labeling Guidelines

The training dataset is created by labeling ground-truth bounding-boxes and categories by human labellers. The following guidelines were used while labelling the training data for the NVIDIA FaceNet model.

FaceNet project labelling guidelines

  • Face bounding boxes should be as tight as possible.
  • Label each face bounding box with an occlusion level ranging from 0 to 9. 0 means the face is fully visible and 9 means the face is 90% or more occluded. For training, only faces with occlusion level 0-5 are considered.
  • If faces are at the edge of the frame with visibility less than 60% due to truncation, this image is dropped from the dataset.
  1. For the parameters in the config files, you can refer to tlt_cv_samples_v1.1.0/gazenet/utils_gazeviz.py
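If it helps while reading utils_gazeviz.py, a quick way to see what the generated config files actually contain is to dump them from the Config folder produced in step 1.B (the path below is a placeholder; point it at your own conversion output):

```python
import glob
import os

# Print the first few hundred characters of each file under the Config folder.
for path in sorted(glob.glob("MPIIFaceGaze/Config/**/*", recursive=True)):
    if os.path.isfile(path):
        with open(path) as f:
            head = f.read(300)
        print("====", path)
        print(head)
```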