How can I predict 104 landmark points with the Facial Landmarks Estimation model?

Hi, I’m trying to run the Facial Landmarks Estimation model with 104 keypoints.
• Hardware Platform: GPU
• DeepStream Version: docker image deepstream:6.0.1-devel
• NVIDIA GPU Driver Version: 470.103.01
• CUDA Version: 11.4

I pulled the Docker image stated above and followed the exact steps in this NVIDIA repo: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master (NVIDIA-AI-IOT/deepstream_tao_apps on GitHub).

I changed the number of landmarks to 104 instead of 80 in the following file:


However, when I ran the binary, the engine info still reported 80 output keypoints, as shown below:

INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 32x1x80x80      Max: 32x1x80x80
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0
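
A quick way to confirm how many keypoints the built engine really emits is to read the first dimension of the softargmax output from the engine info. The following is only a minimal Python sketch, not part of the original post: it assumes the log lines keep exactly the layout shown above, and the helper names parse_layers and engine_keypoint_count are made up for illustration.

```python
import re

# Engine info lines copied verbatim from the log above.
ENGINE_INFO = """\
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 32x1x80x80      Max: 32x1x80x80
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0
"""

# index, INPUT/OUTPUT, data type, layer name, dims such as 80x2
LINE_RE = re.compile(r"^\s*\d+\s+(INPUT|OUTPUT)\s+\S+\s+(\S+)\s+(\d+(?:x\d+)*)\s")


def parse_layers(text):
    """Map each layer name to (kind, dims) for lines matching the layout above."""
    layers = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            kind, name, dims = match.groups()
            layers[name] = (kind, tuple(int(d) for d in dims.split("x")))
    return layers


def engine_keypoint_count(text, layer="softargmax"):
    """Return the first dimension of the keypoint-location output layer."""
    _kind, dims = parse_layers(text)[layer]
    return dims[0]


print(engine_keypoint_count(ENGINE_INFO))
```

For the log in this post, the softargmax layer has dims 80x2, so the engine itself reports 80 keypoint locations regardless of what the application is configured to draw.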

The command I executed:

./deepstream-faciallandmark-app 1 /workspace/deepstream_tao_apps/configs/facial_tao/sample_faciallandmarks_config.txt file:///workspace/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/face-2.jpg ./landmarks

I tried printing the coordinates of the landmark points with an index above 80, and they were always 0. Moreover, the output image with the landmarks plotted on it always showed only 80 points.
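
The check described above (which landmark indices from 80 upward come back as zero) could be sketched in Python roughly as follows. This is an illustration only: the points list is synthetic stand-in data, not real model output, and zero_landmarks is a hypothetical helper, not something from the sample app.

```python
def zero_landmarks(points, start=80, eps=1e-6):
    """Return indices >= start whose x and y are both (near) zero."""
    return [
        i
        for i, (x, y) in enumerate(points)
        if i >= start and abs(x) <= eps and abs(y) <= eps
    ]


# Synthetic stand-in data (an assumption for illustration):
# 104 points where only the first 80 carry coordinates.
points = [(10.0 + i, 20.0 + i) for i in range(80)] + [(0.0, 0.0)] * 24

missing = zero_landmarks(points)
print(len(missing), missing[0], missing[-1])
```

Running the same check over several input images would show whether the extra indices are empty for every image or only for some of them.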

I would like to add that the model I’m using is the latest version of the deployable model downloaded from NVIDIA’s website.

How can I predict 104 facial landmark points?

The description says “the application can identify 80 landmarks in one human face.” How do you know it can identify more than 80 landmarks?

The description in the repo mentions the following:

The TAO 3.0 pretrained models used in this sample application:
Facial Landmarks Estimation | NVIDIA NGC

On the Facial Landmarks Estimation page, the model overview section states that the model can predict 68, 80, or 104 landmark keypoints.
The following text is taken from NVIDIA’s FPENet model description:

Images of 80 X 80 X 1

N X 2 keypoint locations. N X 1 keypoint confidence.

N is the number of keypoints. It can have a value of 68, 80, or 104.
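
To make the quoted output contract concrete, here is a small Python sketch that checks a result against it: N keypoint locations of 2 values each, N confidences, and N equal to 68, 80, or 104. The function name check_output_shapes and the sample values are assumptions for illustration, not part of the model description.

```python
ALLOWED_KEYPOINT_COUNTS = (68, 80, 104)


def check_output_shapes(keypoints, confidence):
    """Validate N x 2 keypoints and N confidences; return N if consistent."""
    n = len(keypoints)
    if n not in ALLOWED_KEYPOINT_COUNTS:
        raise ValueError(f"unexpected keypoint count: {n}")
    if any(len(point) != 2 for point in keypoints):
        raise ValueError("every keypoint must have exactly 2 coordinates")
    if len(confidence) != n:
        raise ValueError("confidence length must match the keypoint count")
    return n


# Synthetic example values (not real model output).
keypoints = [(float(i), float(i)) for i in range(80)]
confidence = [1.0] * 80
print(check_output_shapes(keypoints, confidence))
```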

I would like to add that the deepstream_tao_apps repo uses the latest version of the FPENet model. For this reason, I would like to ask how we can predict 104 landmark points, as claimed in the FPENet model overview.

If I only set numLandmarks to 104, I can reproduce your issue. We are investigating.

I have tried more pictures. For some of them, the positions at indices 80-104 are not zero, so this is related to the accuracy of the model. Please retrain the model on your own dataset; refer to this link: Facial Landmarks Estimation | NVIDIA NGC
Here are some references: Facial Landmark Estimator (FPENet) annotation guidelines

OK, I see. Thanks a lot for your help!

The second link takes me to “Sorry, this page may have moved, doesn’t exist or is private.” Could you double-check?
