Dear all,
I am trying to run inference with the gaze estimation model in DeepStream, using the code from deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).
But even when I tested with a single image it ran very slowly, so it could hardly keep up with a video.
I then tested the facial landmarks model on its own, following deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).
That model seemed to work, but inference reaches only about 2-6 FPS in FP16 mode, which is far below the officially published performance.
Is there anything wrong with my configuration? The terminal output and system environment are shown below:
wen@wen-desktop:~/Masterarbeit/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 1 ../../../configs/facial_tao/sample_faciallandmarks_config.txt file:///home/wen/Masterarbeit/face/video_test.mp4 ./landmarks_rot
Request sink_0 pad from streammux
####+++OUT file ./landmarks_rot.264
Now playing: file:///home/wen/Masterarbeit/face/video_test.mp4
Opening in BLOCKING MODE
0:00:08.718553375 9785 0x7f38002390 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
INFO: [FullDims Engine Info]: layers num: 4
0 INPUT kFLOAT input_face_images 1x80x80 min: 1x1x80x80 opt: 32x1x80x80 Max: 32x1x80x80
1 OUTPUT kFLOAT conv_keypoints_m80 80x80x80 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT softargmax 80x2 min: 0 opt: 0 Max: 0
3 OUTPUT kFLOAT softargmax:1 80 min: 0 opt: 0 Max: 0
ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:08.744800424 9785 0x7f38002390 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1868> [UID = 2]: Could not find output layer 'softargmax,softargmax:1,conv_keypoints_m80' in engine
0:00:08.744837143 9785 0x7f38002390 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 2]: Use deserialized engine model: /home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
0:00:21.864412436 9785 0x7f38002390 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<second-infer-engine1> [UID 2]: Load new model:../../../configs/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:21.880712334 9785 0x7f38002390 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:24.395006682 9785 0x7f38002390 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_fp16.engine
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x416x736
1 OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46
2 OUTPUT kFLOAT output_cov/Sigmoid 1x26x46
0:00:24.414080113 9785 0x7f38002390 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_fp16.engine
0:00:24.566083852 9785 0x7f38002390 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-infer-engine1> [UID 1]: Load new model:../../../configs/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: qtdemux0
Decodebin child added: multiqueue0
Decodebin child added: mpeg4vparse0
Decodebin child added: nvv4l2decoder0
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 260
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 260
In cb_newpad
###Decodebin pick nvidia decoder plugin.
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
Frame Number = 0 Face Count = 1
Frame Number = 1 Face Count = 1
...
Frame Number = 6 Face Count = 1
Frame Number = 7 Face Count = 1
H264: Profile = 66, Level = 0
NVMEDIA_ENC: bBlitMode is set to TRUE
Frame Number = 8 Face Count = 1
Frame Number = 9 Face Count = 1
...
Frame Number = 176 Face Count = 1
Frame Number = 177 Face Count = 1
Frame Number = 178 Face Count = 1
End of stream
Returned, stopping playback
Average fps 6.148420
Totally 170 faces are inferred
Deleting pipeline
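One detail I noticed in the log: TensorRT warns `Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80`, where the three output layer names appear joined by commas into one string. As far as I know, nvinfer config files separate `output-blob-names` entries with semicolons, so a comma-separated list would be looked up as a single (nonexistent) binding. I am not sure whether this warning is related to the low FPS, but a sketch of the fragment I suspect (my assumption about faciallandmark_sgie_config.txt, using the layer names printed in the engine info above):

```ini
# Hypothetical fragment of faciallandmark_sgie_config.txt (assumption, not
# copied from my actual file) -- layer names taken from the engine info log.
[property]
# Comma-separated names are treated as ONE layer name and fail the binding
# lookup, which would explain the warning in the log:
# output-blob-names=softargmax,softargmax:1,conv_keypoints_m80
# nvinfer expects semicolons between multiple output layer names:
output-blob-names=softargmax;softargmax:1;conv_keypoints_m80
```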
Environment
TensorRT Version : 8.0
DeepStream Version : 6.0
GPU Type : Nvidia Jetson Nano 2GB
CUDA Version : 10.2
CUDNN Version : 8.2.1
JetPack Version : 4.6