Run FacialLandmarks model with DeepStream on Jetson Nano 2GB

Dear all,

I am trying to run inference with the gaze estimation model in DeepStream, using the code from deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).
But even with a single test image it ran very slowly, so it could hardly handle a video.

I then tested the facial landmarks model first, following deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).

This model seemed to run correctly, but inference performance only reaches about 2-6 FPS in FP16 mode, which is far below the officially published numbers.

Is there anything wrong with my configuration? The terminal output and system environment are shown below:

wen@wen-desktop:~/Masterarbeit/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 1 ../../../configs/facial_tao/sample_faciallandmarks_config.txt file:///home/wen/Masterarbeit/face/video_test.mp4 ./landmarks_rot
Request sink_0 pad from streammux
####+++OUT file ./landmarks_rot.264
Now playing: file:///home/wen/Masterarbeit/face/video_test.mp4
Opening in BLOCKING MODE 
0:00:08.718553375  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
INFO: [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 32x1x80x80      Max: 32x1x80x80      
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0               

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:08.744800424  9785   0x7f38002390 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1868> [UID = 2]: Could not find output layer 'softargmax,softargmax:1,conv_keypoints_m80' in engine
0:00:08.744837143  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 2]: Use deserialized engine model: /home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
0:00:21.864412436  9785   0x7f38002390 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<second-infer-engine1> [UID 2]: Load new model:../../../configs/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:21.880712334  9785   0x7f38002390 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:24.395006682  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_fp16.engine
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x416x736       
1   OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46         
2   OUTPUT kFLOAT output_cov/Sigmoid 1x26x46         

0:00:24.414080113  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_fp16.engine
0:00:24.566083852  9785   0x7f38002390 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-infer-engine1> [UID 1]: Load new model:../../../configs/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: qtdemux0
Decodebin child added: multiqueue0
Decodebin child added: mpeg4vparse0
Decodebin child added: nvv4l2decoder0
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 260 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 260 
In cb_newpad
###Decodebin pick nvidia decoder plugin.
NvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
Frame Number = 0 Face Count = 1
Frame Number = 1 Face Count = 1
...
Frame Number = 6 Face Count = 1
Frame Number = 7 Face Count = 1
H264: Profile = 66, Level = 0 
NVMEDIA_ENC: bBlitMode is set to TRUE 
Frame Number = 8 Face Count = 1
Frame Number = 9 Face Count = 1
...
Frame Number = 176 Face Count = 1
Frame Number = 177 Face Count = 1
Frame Number = 178 Face Count = 1
End of stream
Returned, stopping playback
Average fps 6.148420
Totally 170 faces are inferred
Deleting pipeline

Environment

TensorRT Version : 8.0
DeepStream Version : 6.0
GPU Type : Nvidia Jetson Nano 2GB
CUDA Version : 10.2
CUDNN Version : 8.2.1
JetPack Version : 4.6

Hi,

First, we have a new package release.
It's always recommended to upgrade your device to the latest version for a better experience.

Back to your question, do you want to reproduce the performance below?
https://docs.nvidia.com/tao/tao-toolkit/text/overview.html#pre-trained-models

Please note that the numbers in the table were measured on a Nano 4GiB.
They were also measured with trtexec rather than the whole pipeline.

For a 2GiB device, you can try the commands below to get maximum performance first:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

Hi,

Thank you for your answer. I have already enabled max performance mode and jetson_clocks, but I still get only 8 FPS with the facial landmarks model.

And for GazeNet from deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com), do you have any test results on the Jetson Nano 2GB? Inference always gets stuck after the first few frames.

Thanks for your update.
We are going to measure the performance on the Nano and then share more information with you.

Hi,

Could you also try the following command to get the inference performance on your device?

1. Download the tao-converter tool from the page below:

https://docs.nvidia.com/tao/tao-toolkit/text/tensorrt.html#installing-the-tao-converter

2. Profiling

$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/fpenet/versions/deployable_v3.0/files/model.etlt
$ ./tao-converter-jp46-trt8.0.1.6/tao-converter -k nvidia_tlt -p input_face_images,1x1x80x80,1x1x80x80,1x1x80x80 -t fp16 -e model.etlt_b1_gpu0_fp16.engine model.etlt 
$ /usr/src/tensorrt/bin/trtexec --loadEngine=model.etlt_b1_gpu0_fp16.engine
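Once trtexec finishes, it prints latency statistics (for example a mean GPU compute time in milliseconds). As a rough sanity check, you can convert that mean latency into an approximate single-stream FPS. The snippet below is just an illustration of the arithmetic; `mean_ms=12.5` is a made-up placeholder value, not a measured result — substitute the number trtexec actually reports on your device.

```shell
# Hypothetical example: derive approximate FPS from trtexec's mean latency.
# Replace 12.5 with the "mean" GPU compute time (in ms) trtexec prints.
mean_ms=12.5
fps=$(awk -v m="$mean_ms" 'BEGIN { printf "%.1f", 1000 / m }')
echo "approx FPS: $fps"
```

This gives the theoretical ceiling for the model alone; the full DeepStream pipeline (decode, preprocessing, OSD, encode) will always be somewhat slower.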

Thanks.

Hi,
thank you for the help.

I found the solution in this post: Failed to load GazeNet model in TRT - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

The default configuration is too heavy for the Jetson Nano and needs to be modified.
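As an illustration of the kind of change involved (this is a sketch, not the exact fix from the linked post): the engine name in the log above, `faciallandmarks.etlt_b32_gpu0_fp16.engine`, indicates a batch size of 32, which is a lot for a 2 GiB device. Lowering `batch-size` in the secondary inference config and deleting the old engine so it gets rebuilt might look like this; the value 4 is an arbitrary example, not a recommended setting.

```
# Hypothetical sketch: in ../../../configs/facial_tao/faciallandmark_sgie_config.txt
[property]
batch-size=4
```

After editing the config, remove the cached `*.engine` file so DeepStream regenerates it with the new batch size on the next run.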

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.