TAO Facerecognition sample

DS7
dGPU

I have a question about this sample: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

I followed the instructions to build the app and uploaded a faciallandmarks_test.jpg image:

Then I ran the app completely unchanged.

~/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 1 ../../../configs/nvinfer/facial_tao/sample_faciallandmarks_config.txt file:///home/ubuntu/trump-mugshot.jpg ./landmarks
Request sink_0 pad from streammux
Now playing: file:///home/ubuntu/trump-mugshot.jpg
0:00:07.460973354 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<second-infer-engine> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 2]: deserialized trt engine from :/home/ubuntu/deepstream_tao_apps/models/faciallandmark/faciallandmark.etlt_b32_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 32x1x80x80      Max: 32x1x80x80      
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0               

0:00:07.589611801 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<second-infer-engine> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 2]: Use deserialized engine model: /home/ubuntu/deepstream_tao_apps/models/faciallandmark/faciallandmark.etlt_b32_gpu0_int8.engine
0:00:07.774326740 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<second-infer-engine> [UID 2]: Load new model:../../../configs/nvinfer/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:07.776232147 215028 0x638dc60b32f0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1244> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:14.558549788 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/home/ubuntu/deepstream_tao_apps/models/facenet/facenet.etlt_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x416x736       
1   OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46         
2   OUTPUT kFLOAT output_cov/Sigmoid 1x26x46         

0:00:14.692110526 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /home/ubuntu/deepstream_tao_apps/models/facenet/facenet.etlt_b1_gpu0_int8.engine
0:00:14.694513544 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-infer-engine1> [UID 1]: Load new model:../../../configs/nvinfer/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: nvjpegdec0
Using GPU 0 (Tesla T4, 40 SMs, 1024 th/SM max, CC 7.5, ECC on)
In cb_newpad
###Decodebin pick nvidia decoder plugin.
nvstreammux: Successfully handled EOS for source_id=0
Frame Number = 0 Face Count = 3
End of stream
Returned, stopping playback
Average fps 0.000233
Totally 3 faces are inferred
Deleting pipeline

The output landmarks.jpg shows … hmmm … something, but what exactly?

First question: Why 3 faces, not just one? What parameters need to be altered in order to get better results?

I think I understand what facial landmarks generally are; this is explained here: Facial Landmarks Estimation | NVIDIA NGC

But what are those yellow boxes?

Generally, I find the code hard to read if you are not a C++ expert, and certainly hard to port to other languages and use cases, especially this magic here: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at 344d6dc10839aec755dc8cf8e2f97626aa73d3ed · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

Is there more information about the general work of this?

The pretrained FaceDetect | NVIDIA NGC model is just a sample and prototype. If you want the model to be more precise, please re-train it with the TAO toolkit.
You may try setting a higher value of “pre-cluster-threshold” in deepstream_tao_apps/configs/nvinfer/facial_tao/config_infer_primary_facenet.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) to filter out low-probability detections.
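For example (the threshold value here is illustrative, not a recommendation), the setting lives in the per-class attributes section of the nvinfer config file:

```ini
[class-attrs-all]
# Detections with confidence below this value are dropped
# before clustering; raise it to suppress spurious faces.
pre-cluster-threshold=0.6
```

Increase the value step by step and re-run the app until only the real face remains.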

The yellow bboxes are the eyes; see deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_meta.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

It is hard to read because the code is model-specific; it exists only to interpret and compute the model’s outputs.
You already have the facial-landmarks model’s output information from the log above:

There are two output layers, named “softargmax” and “softargmax:1”, with output dimensions 80x2 and 80. The same information is stored in “meta->output_layers_info”. The code reads the output layers’ information, parses the output data, and then calculates the final landmark coordinates from the softargmax data.

From the above questions, your points of confusion are all model-related. This is not general information but model-customization information. The more you understand the model, the more you will understand the code.

Thanks for the comprehensive answer for now. I’m trying to learn. Currently I’m checking out dlib, which is a bit more transparent with respect to inputs and outputs. Maybe that will then help me understand this sample.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.