Facenet question

Please provide the following information when requesting support.

Hardware: T4
Network type: Facenet, FPEnet

I have ported parts of the TAO facial landmarks sample app to Go: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

I’m not interested in annotating video with facial landmarks. I’m interested in “fingerprinting” faces in order to be able to recognize them again.

  • My input is an RTSP video stream, 30 fps, HD
  • A DeepStream 7 pipeline consisting of a primary detector running Facenet and a secondary detector running FPENet
  • Detected faces are square-aligned (as the C++ app does it), so that the W/H of the crop is identical before it goes into the secondary model
  • Since the resolution varies, I normalize the landmark coordinates to the current resolution
  • In a training process I use 10 seconds of video of a non-moving face (so I don’t have to deal with landmark positions changing over time)
  • This finally gives me a “fingerprint” per person, consisting of the 80 X/Y float landmarks, averaged over the captured frames and weighted by confidence (see the sketch after this list)
  • This can be repeated with different poses to have more fingerprints per person
  • Results are stored into a database
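
For illustration, a minimal Go sketch of the averaging step (the types and names are mine; it assumes the 80 landmarks are already normalized to the square crop and come with a per-landmark confidence):

```go
package fingerprint

// Point is one normalized landmark (0..1, relative to the square crop).
type Point struct{ X, Y float32 }

// Frame holds the 80 FPENet landmarks of one video frame together
// with their per-landmark confidences.
type Frame struct {
	Landmarks [80]Point
	Conf      [80]float32
}

// Average builds a confidence-weighted mean of the landmarks over all
// captured frames, i.e. the stored “fingerprint”.
func Average(frames []Frame) [80]Point {
	var fp [80]Point
	var wsum [80]float32
	for _, f := range frames {
		for i, p := range f.Landmarks {
			w := f.Conf[i]
			fp[i].X += p.X * w
			fp[i].Y += p.Y * w
			wsum[i] += w
		}
	}
	for i := range fp {
		if wsum[i] > 0 {
			fp[i].X /= wsum[i]
			fp[i].Y /= wsum[i]
		}
	}
	return fp
}
```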

Now comes the problem:

  • In the recognition process I calculate the Euclidean distance between landmark points (separately for chin, eyebrows, eyes, etc.) and finally average this into a “distance” value between a stored fingerprint and the current test fingerprint (the current landmark tensor), roughly as sketched after this list
  • If the distance for a given database entry is below a certain threshold, I consider this a “recognized person”.
  • Unfortunately this gives ambiguous results (meaning: I hold my face into the camera and the wrong person is recognized).
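
Continuing the sketch above (same package, reusing Point), the matching step looks roughly like this; note that the region index ranges are purely illustrative, not the official FPENet 80-point layout:

```go
import "math"

// regions groups landmark indices per facial feature. The index
// ranges are illustrative only, NOT the official FPENet layout.
var regions = map[string][]int{
	"chin":     indexRange(0, 16),
	"eyebrows": indexRange(17, 26),
	"eyes":     indexRange(27, 38),
	// ... remaining features up to index 79
}

func indexRange(lo, hi int) []int {
	idx := make([]int, 0, hi-lo+1)
	for i := lo; i <= hi; i++ {
		idx = append(idx, i)
	}
	return idx
}

// Distance averages the per-region mean Euclidean distances between a
// stored fingerprint and the current landmark tensor. A person counts
// as recognized when Distance(stored, test) is below the threshold.
func Distance(a, b [80]Point) float64 {
	var total float64
	for _, idx := range regions {
		var d float64
		for _, i := range idx {
			dx := float64(a[i].X - b[i].X)
			dy := float64(a[i].Y - b[i].Y)
			d += math.Sqrt(dx*dx + dy*dy)
		}
		total += d / float64(len(idx))
	}
	return total / float64(len(regions))
}
```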

I have gathered some experience with DLIB, and I thought I had learned that it also uses the geometric distance between facial landmarks. But honestly I’m not sure I’m on the right track here, especially since I’m not making any attempt to “morph” or “flatten” the face image in case it is not an exactly frontal shot (other than making sure the face crop has the same width/height).

Is there any information on how to use the facial landmarks, as they come out of FPENet, for recognition?

Does this make sense, or is there anything else I should do?

You can use a ReID network. See more info in ReIdentificationNet - NVIDIA Docs or ReIdentificationNet Transformer - NVIDIA Docs.

Thanks. Is there sample code?

Yes, you can take a look at ReIdentificationNet - NVIDIA Docs.

The TAO Triton Apps provide an inference sample for ReIdentificationNet. It consumes a TensorRT engine and supports running with a directory of query (probe) images and a directory of test (gallery) images containing the same identities.

Hmm. This seems to be all Triton server stuff. I don’t find any DeepStream-related code…

…even though DeepStream is mentioned there. Well, once again through the NVIDIA revolving door and back? No thanks.

Maybe this: deepstream_tao_apps/apps/tao_others/deepstream-mdx-perception-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub?

Yes, you can take a look at this.

OK, I’m using this model now as an SGIE after a PGIE running Facenet.

I added a probe to the src pad of the SGIE and got what is presumably tensor data:

The model info says it has an output layer “fc_pred” with 256 floats (?):

I carved that out and got, for example, this (the layer name in the tensor info matched):

embeddings: [-0.36572266 0.5307617 -0.8671875 -0.7807617 -0.60546875 1.3427734 -1.6210938 0.90771484 -0.17700195 -0.93359375 0.94873047 -0.9091797 0.20812988 -0.056762695 -0.56103516 -0.7973633 0.18334961 -2.2851562 0.22570801 1.0263672 -0.3100586 0.1784668 -0.33862305 -0.93310547 0.16625977 -2.3027344 -0.7553711 1.1425781 0.8222656 0.7519531 -1.1054688 0.60253906 -0.4658203 0.9379883 2.3613281 -1.0976562 0.99560547 1.4501953 -0.8442383 0.7216797 -2.1464844 -0.21130371 -0.17150879 2.6386719 -0.080322266 -0.53271484 0.5800781 1.0634766 -2.015625 -1.8398438 0.87841797 -1.6191406 0.9970703 2.7519531 0.55322266 -1.3710938 0.93359375 -0.10864258 -0.5961914 -1.7578125 -1.2734375 2.0859375 -2.1621094 0.5625 1.5986328 -0.8388672 0.22949219 -2.015625 -1.5136719 0.09442139 -1.4746094 -1.0693359 -0.609375 -2.9589844 -2.7480469 -0.89501953 -1.2939453 -1.2285156 -1.0126953 -1.2128906 2.8945312 -1.6474609 2.5097656 -1.7705078 -0.6010742 -1.5771484 0.3972168 -0.07128906 -0.5883789 -1.1298828 0.31323242 0.21472168 1.0722656 -0.1685791 1.8994141 1.0253906 0.42578125 1.4296875 0.7939453 0.6040039 -0.6123047 -1.3125 -1.7753906 1.1240234 -3.1308594 3.2558594 -0.6767578 -0.94677734 -3.8027344 0.68408203 0.34375 -0.68115234 0.3852539 -1.2880859 -2.984375 -0.07647705 -1.90625 0.9506836 0.3408203 -0.8779297 2.0820312 -1.7373047 -1.5419922 -0.099487305 -0.123168945 0.61816406 1.0009766 -0.34692383 0.28393555 1.7802734 -0.94433594 0.39624023 1.5283203 -0.7138672 -1.2841797 0.9194336 -1.703125 -0.26489258 -1.2050781 -0.2319336 0.74902344 -0.1850586 -0.7446289 0.12310791 1.1494141 0.62060547 -1.0830078 -0.16320801 0.8305664 0.7524414 2.2558594 -1.1767578 1.4414062 -0.9663086 0.09509277 2.4140625 2.78125 0.7211914 2.5234375 -0.4404297 2.8535156 -1.6152344 0.49414062 -0.8432617 -2.0078125 -0.9301758 -0.23168945 1.9335938 1.0498047 -0.6699219 0.06359863 0.5620117 -0.5732422 0.8129883 -0.3359375 -2.8476562 -0.69628906 0.19274902 1.8759766 0.75341797 -0.4506836 0.57373047 0.08453369 -1.0058594 -0.03451538 0.1505127 2.2089844 -0.3671875 1.6787109 0.33666992 -1.6699219 -1.2402344 -1.0644531 -0.4375 0.10021973 0.3569336 -1.0996094 1.0175781 -0.828125 -0.17504883 -0.2668457 -0.35742188 -0.09515381 -1.5126953 -1.4082031 -0.90625 0.5800781 0.70410156 0.18493652 0.60595703 -1.0371094 -0.9711914 1.015625 1.9238281 1.7060547 0.3581543 0.068847656 0.67333984 -1.0283203 0.15551758 -0.47509766 -1.2880859 1.1484375 -1.9267578 0.32617188 2.7910156 0.049438477 -0.1619873 -2.0996094 -0.91308594 0.21789551 0.39624023 -1.8847656 -0.7080078 0.87158203 -0.51464844 1.5537109 -2.2988281 -0.109436035 -2.7929688 -0.28149414 -0.2919922 1.4033203 -1.7675781 -0.04232788 0.54833984 0.9633789 -0.025436401 -1.2763672 0.6015625 -0.5151367 2.7109375 -1.3115234 -0.22998047 -1.1132812 0.7006836]
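
The carving itself boils down to reinterpreting the layer’s host buffer as float32. A minimal Go sketch, assuming the cgo side already hands over the fc_pred host pointer (e.g. from NvDsInferTensorMeta’s out_buf_ptrs_host):

```go
package probe

import "unsafe"

// embeddingFromHostBuf reinterprets the raw fc_pred host buffer
// (n float32 values, 256 for this model) as a Go slice and copies it
// out, since the underlying memory is owned by DeepStream.
func embeddingFromHostBuf(hostBuf unsafe.Pointer, n int) []float32 {
	src := unsafe.Slice((*float32)(hostBuf), n)
	dst := make([]float32, n)
	copy(dst, src)
	return dst
}
```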

Question: Is this something I could save in a training phase and later use for recognition by calculating the Euclidean distance to a test vector?

Sorry if this is a stupid question.

The output layer “fc_pred” is a tensor of float32[batch, embedding_size].
You can download the model from ReIdentificationNet | NVIDIA NGC.
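
The buffer is flat, so for a batch of N objects the b-th embedding occupies elements b*256 to (b+1)*256, for example:

```go
// embeddingAt returns the b-th embedding from the flattened
// float32[batch*256] “fc_pred” buffer.
func embeddingAt(out []float32, b int) []float32 {
	return out[b*256 : (b+1)*256]
}
```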

Sorry, how should I understand this? Batch times embedding_size? I get exactly 256 floats, and if I try to obtain more, the rest is filled with zeros.

Yes, for this NGC ONNX model, the output size of the feature embeddings is 256.

Thanks for the confirmation. I guess I’m on the right track trying to calculate Euclidean distances between these embeddings.
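
For example (a minimal sketch; cosine similarity over the embeddings is a common alternative to plain Euclidean distance for ReID):

```go
package reid

import "math"

// EuclideanDistance between two embeddings of equal length
// (256 floats for this model).
func EuclideanDistance(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i] - b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// CosineSimilarity is often used instead for ReID embeddings:
// values near 1 suggest the same identity.
func CosineSimilarity(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-12)
}
```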

Is my understanding right: This embedding is basically just a fingerprint of the face, right? No other information contained.

This ONNX model generates embeddings for identifying people captured in different scenes, not faces. For more info, you can refer to:

  • H. Luo, Y. Gu, X. Liao, S. Lai and W. Jiang, “Bag of Tricks and a Strong Baseline for Deep Person Re-Identification,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 1487-1495, doi: 10.1109/CVPRW.2019.00190.
  • L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang and Q. Tian, “Scalable Person Re-identification: A Benchmark,” 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1116-1124, doi: 10.1109/ICCV.2015.133.
  • M. Naphade, S. Wang, D. C. Anastasiu, Z. Tang, M.-C. Chang, Y. Yao, L. Zheng, M. S. Rahman, M. S. Arya, A. Sharma, Q. Feng, V. Ablavsky, S. Sclaroff, P. Chakraborty, S. Prajapati, A. Li, S. Li, K. Kunadharaju, S. Jiang and R. Chellappa, “The 7th AI City Challenge,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023.

Well, you recommended it. I didn’t talk about person identification in my first post.

For faces, you can train a new ReID model with a face dataset.

There is always the problem with the assets… I don’t have face datasets, what you can get for free is bullshit, and the entire TAO training is a nightmarish experience, IMHO. No, I’d rather go with a combination of Facenet and DLIB; it’s not that performant, but it’s pretty reliable.

The NGC model cannot cover all scenarios, and TAO is designed for end users to fine-tune on their own dataset, starting from the NGC pretrained model.

I’m wondering how one could create reference vectors (“mug shots”) of persons in order to use them later for distance comparisons.

I mean, it might not necessarily be a DeepStream question, more a GStreamer question: how can a pipeline be fed from still images instead of a video stream, in order to run JPEG input → primary → secondary inference? Is anything known about this?
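
What I have in mind is roughly this (a sketch only; it assumes the go-gst bindings, github.com/go-gst/go-gst, and mirrors the element chain of the deepstream-image-decode-test sample; config file paths are placeholders):

```go
package main

import "github.com/go-gst/go-gst/gst"

func main() {
	gst.Init(nil)

	// JPEG in, hardware-decoded, batched, then PGIE → SGIE.
	launch := "filesrc location=mugshot.jpg ! jpegparse ! nvv4l2decoder ! " +
		"m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! " +
		"nvinfer config-file-path=pgie_facenet.txt ! " +
		"nvinfer config-file-path=sgie_reid.txt ! fakesink"

	pipeline, err := gst.NewPipelineFromString(launch)
	if err != nil {
		panic(err)
	}
	if err := pipeline.SetState(gst.StatePlaying); err != nil {
		panic(err)
	}
	// ...attach the same fc_pred probe as in the video pipeline and
	// wait for EOS on the bus instead of blocking forever.
	select {}
}
```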

Another option would be to feed the pipeline with, say, a 10-second video and take N snapshots of the fc_pred vector in order to assign them to a person for later comparisons…
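
Averaging those snapshots would then be trivial, e.g. (optionally L2-normalizing the result before storing it):

```go
// MeanEmbedding averages N snapshots of the 256-float fc_pred vector
// into one reference (“mug shot”) vector for a person.
func MeanEmbedding(snaps [][]float32) []float32 {
	if len(snaps) == 0 {
		return nil
	}
	mean := make([]float32, len(snaps[0]))
	for _, s := range snaps {
		for i, v := range s {
			mean[i] += v
		}
	}
	for i := range mean {
		mean[i] /= float32(len(snaps))
	}
	return mean
}
```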