FPENet inference confidence seems low


When running the FPENet DeepStream example application on a camera input, the confidence for each landmark stays between roughly 0.2 and 0.35.
When the facial landmarks look very accurate, the confidence sits near 0.3; when I turn to profile and the hidden points are all over the place, it drops to about 0.2.

Is this expected, or is something wrong? I was expecting the values to range between 0.0 and 1.0.

(The orange bar on the left indicates the average confidence; ignore the blue graph.)


 L4T 32.7.2 [ JetPack UNKNOWN ]
   Ubuntu 18.04.6 LTS
   Kernel Version: 4.9.253-tegra
 CUDA 10.2.300
   CUDA Architecture: NONE
 OpenCV version: 3.2.0
   OpenCV Cuda: NO
 Vision Works:
 VPI: ii libnvvpi1 1.2.3 arm64 NVIDIA Vision Programming Interface library
 Vulkan: 1.2.70

Relevant Files

Steps To Reproduce

Compile and run deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at release/tao3.0_ds6.0.1 (NVIDIA-AI-IOT/deepstream_tao_apps on GitHub),
then print out the retrieved confidence in
deepstream_tao_apps/deepstream_faciallandmark_app.cpp at release/tao3.0_ds6.0.1 (NVIDIA-AI-IOT/deepstream_tao_apps on GitHub).

The confidence does not cover a very wide range (0.2-0.35).


This looks like a DeepStream-related issue. We will move this post to the DeepStream forum.



Is the confidence output of FPENet also dependent on the primary model?

Something like:

FPENet output = facenet-confidence * landmark-confidence


If this is the case, I should be able to get back to the raw landmark-confidence by dividing by the facenet-confidence?

Just want to make sure this is what is happening…

The pre-trained model is for demo purposes only. If the accuracy does not meet your needs, please re-train the model with your own data. See: Facial Landmarks Estimation — TAO Toolkit 3.22.05 documentation (nvidia.com)

My question is whether the output of FPENet's confidence output layer, when running in DeepStream as a secondary model, is influenced by (multiplied by) the confidence of the primary model.

In the model card I can see that the output will give:

N X 1 keypoint confidence.

and a pixel accuracy metric:

The region keypoint pixel error is the mean euclidean error in pixel location prediction as compared to the ground truth. We bucketize and average the error per face region (eyes, mouth, chin, etc.). Metric- Region keypoints pixel error

* All keypoints: 6.1
* Eyes region: 3.33
* Mouth region: 2.96
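
As I read that description, the metric is a mean Euclidean distance in pixels over a set of keypoints. A minimal sketch of that reading follows; the `Point2f` struct and the `meanPixelError` function are illustrative names of mine, not anything from TAO or the sample app.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative 2D point type (assumption, not a TAO/DeepStream type).
struct Point2f {
    float x;
    float y;
};

// Mean Euclidean distance, in pixels, between predicted and ground-truth
// keypoints. Passing only the keypoints of one face region (eyes, mouth, ...)
// gives a per-region value like the ones quoted above.
double meanPixelError(const std::vector<Point2f>& predicted,
                      const std::vector<Point2f>& groundTruth) {
    if (predicted.empty() || predicted.size() != groundTruth.size()) {
        throw std::invalid_argument("keypoint sets must be non-empty and equal in size");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double dx = predicted[i].x - groundTruth[i].x;
        const double dy = predicted[i].y - groundTruth[i].y;
        sum += std::hypot(dx, dy);  // Euclidean distance for this keypoint
    }
    return sum / static_cast<double>(predicted.size());
}
```

This is a pixel-space error measured against ground truth, so it is a different quantity from the per-keypoint confidence that the app prints.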

But it does not say what the keypoint confidence range is for the pre-trained model.

Is the output of:

} else if (strcmp(outputLayersInfo[i].layerName,
              "softargmax:1") == 0) {
            confidence = (float *)meta->out_buf_ptrs_host[i];

in pixels, or in the 0.0-1.0 range?
I would just like to know whether the output of the example app is in the range that was seen with the pre-trained model.

Is there anyone at NVIDIA who knows about this model's output?

Hello? Is there anyone who can answer my question, please?

@NVES Which forum would have NVIDIA specialists who know about the output of this particular model (FPENet)?

Even though I have it running in the DeepStream environment, the question is more about the model output itself.

I’ve copied this question to the TAO Toolkit forum… maybe someone there can help.