TAO Facerecognition sample

DS7
dGPU

I have a question about this sample: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

I followed the instructions to build the app and uploaded a faciallandmarks_test.jpg image:

Then I ran the app completely unchanged.

~/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 1 ../../../configs/nvinfer/facial_tao/sample_faciallandmarks_config.txt file:///home/ubuntu/trump-mugshot.jpg ./landmarks
Request sink_0 pad from streammux
Now playing: file:///home/ubuntu/trump-mugshot.jpg
0:00:07.460973354 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<second-infer-engine> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 2]: deserialized trt engine from :/home/ubuntu/deepstream_tao_apps/models/faciallandmark/faciallandmark.etlt_b32_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 32x1x80x80      Max: 32x1x80x80      
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0               

0:00:07.589611801 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<second-infer-engine> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 2]: Use deserialized engine model: /home/ubuntu/deepstream_tao_apps/models/faciallandmark/faciallandmark.etlt_b32_gpu0_int8.engine
0:00:07.774326740 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<second-infer-engine> [UID 2]: Load new model:../../../configs/nvinfer/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:07.776232147 215028 0x638dc60b32f0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1244> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:14.558549788 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/home/ubuntu/deepstream_tao_apps/models/facenet/facenet.etlt_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x416x736       
1   OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46         
2   OUTPUT kFLOAT output_cov/Sigmoid 1x26x46         

0:00:14.692110526 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /home/ubuntu/deepstream_tao_apps/models/facenet/facenet.etlt_b1_gpu0_int8.engine
0:00:14.694513544 215028 0x638dc60b32f0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-infer-engine1> [UID 1]: Load new model:../../../configs/nvinfer/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: nvjpegdec0
Using GPU 0 (Tesla T4, 40 SMs, 1024 th/SM max, CC 7.5, ECC on)
In cb_newpad
###Decodebin pick nvidia decoder plugin.
nvstreammux: Successfully handled EOS for source_id=0
Frame Number = 0 Face Count = 3
End of stream
Returned, stopping playback
Average fps 0.000233
Totally 3 faces are inferred
Deleting pipeline

The output landmarks.jpg shows … hmmm … something, but what exactly?

First question: Why 3 faces, not just one? What parameters need to be altered in order to get better results?

I think I understand what facial landmarks generally are; this is explained here: Facial Landmarks Estimation | NVIDIA NGC

But what are those yellow boxes?

Generally, I find the code hard to read if you are not a C++ expert, and certainly hard to port to other languages and use cases, especially this magic here: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at 344d6dc10839aec755dc8cf8e2f97626aa73d3ed · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

Is there more information about the general work of this?

The pretrained FaceDetect | NVIDIA NGC model is just a sample and prototype. If you want the model to be more precise, please re-train it with the TAO toolkit.
You may try setting a higher value of “pre-cluster-threshold” in deepstream_tao_apps/configs/nvinfer/facial_tao/config_infer_primary_facenet.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) to filter out low-probability detections.
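For example (the threshold value here is illustrative, not a recommendation), the setting lives in the per-class attributes section of the nvinfer config file:

```ini
[class-attrs-all]
# Detections with confidence below this value are dropped
# before clustering; raise it to suppress spurious faces.
pre-cluster-threshold=0.6
```

Increase the value step by step and re-run the app until only the real face remains.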

The yellow bboxes are the eyes; see deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_meta.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

It is hard to read because the code is model-specific; it exists only to interpret and compute the model’s outputs.
You already have the facial-landmarks model’s output information from the log above:

There are two output layers, named “softargmax” and “softargmax:1”, with output dimensions 80x2 and 80. The same information is stored in “meta->output_layers_info”. The code reads the output layers’ information, parses the output data, and then calculates the final landmark coordinates from the softargmax data.

From the above questions, your points of confusion are all model-related. This is not general information but model-customization information. The more you understand the model, the more you will understand the code.

Thanks for the comprehensive answer for now. I’m trying to learn. Currently I’m checking out dlib, which is a bit more transparent with respect to inputs and outputs. Maybe that will then help me understand this sample.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.