DS7
dGPU
I have a question regarding this sample: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
I followed the instructions to build the app and uploaded a faciallandmarks_test.jpg image:
Then I ran the app completely unchanged.
~/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 1 ../../../configs/nvinfer/facial_tao/sample_faciallandmarks_config.txt file:///home/ubuntu/trump-mugshot.jpg ./landmarks
Request sink_0 pad from streammux
Now playing: file:///home/ubuntu/trump-mugshot.jpg
0:00:07.460973354 215028 0x638dc60b32f0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<second-infer-engine> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 2]: deserialized trt engine from :/home/ubuntu/deepstream_tao_apps/models/faciallandmark/faciallandmark.etlt_b32_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [FullDims Engine Info]: layers num: 4
0 INPUT kFLOAT input_face_images 1x80x80 min: 1x1x80x80 opt: 32x1x80x80 Max: 32x1x80x80
1 OUTPUT kFLOAT conv_keypoints_m80 80x80x80 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT softargmax 80x2 min: 0 opt: 0 Max: 0
3 OUTPUT kFLOAT softargmax:1 80 min: 0 opt: 0 Max: 0
0:00:07.589611801 215028 0x638dc60b32f0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<second-infer-engine> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 2]: Use deserialized engine model: /home/ubuntu/deepstream_tao_apps/models/faciallandmark/faciallandmark.etlt_b32_gpu0_int8.engine
0:00:07.774326740 215028 0x638dc60b32f0 INFO nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<second-infer-engine> [UID 2]: Load new model:../../../configs/nvinfer/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:07.776232147 215028 0x638dc60b32f0 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1244> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:14.558549788 215028 0x638dc60b32f0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/home/ubuntu/deepstream_tao_apps/models/facenet/facenet.etlt_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x416x736
1 OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46
2 OUTPUT kFLOAT output_cov/Sigmoid 1x26x46
0:00:14.692110526 215028 0x638dc60b32f0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /home/ubuntu/deepstream_tao_apps/models/facenet/facenet.etlt_b1_gpu0_int8.engine
0:00:14.694513544 215028 0x638dc60b32f0 INFO nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-infer-engine1> [UID 1]: Load new model:../../../configs/nvinfer/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: nvjpegdec0
Using GPU 0 (Tesla T4, 40 SMs, 1024 th/SM max, CC 7.5, ECC on)
In cb_newpad
###Decodebin pick nvidia decoder plugin.
nvstreammux: Successfully handled EOS for source_id=0
Frame Number = 0 Face Count = 3
End of stream
Returned, stopping playback
Average fps 0.000233
Totally 3 faces are inferred
Deleting pipeline
The output landmarks.jpg shows … hmmm … something, but I can't tell exactly what:
First question: why are 3 faces detected instead of just one? Which parameters need to be changed to get better results?
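My own guess (an assumption, not something I verified) is that the extra detections would be tuned via the detector's clustering section in config_infer_primary_facenet.txt, using the standard Gst-nvinfer config keys, something like:

```ini
# Sketch for config_infer_primary_facenet.txt — key names follow the
# standard Gst-nvinfer configuration file format; the values here are
# illustrative, not the sample's shipped values.
[class-attrs-all]
pre-cluster-threshold=0.6   ; raise to drop low-confidence face boxes
nms-iou-threshold=0.5       ; NMS overlap threshold for merging boxes
topk=20                     ; keep at most this many detections per frame
```

Is that the right place to tune this, or is there a better knob?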
I think I now understand what facial landmarks are in general; this is explained here: Facial Landmarks Estimation | NVIDIA NGC
But what are those yellow boxes?
Generally, I find the code hard to read if you are not a C++ expert, and certainly hard to port to other languages and use cases, especially this part: deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at 344d6dc10839aec755dc8cf8e2f97626aa73d3ed · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
Is there more documentation about how this works in general?