Run FacialLandmarks model with DeepStream on Jetson Nano 2GB

Dear all,

I am trying to run inference with the gaze estimation model in DeepStream, using the code from deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).
But even with a single test image it ran very slowly, so it could hardly handle a video.

I then tested the facial landmarks model first, following deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).

This model seemed to run correctly, but inference performance only reaches about 2-6 FPS in FP16 mode, which is far below the officially published numbers.

Is there anything wrong with my configuration? The terminal output and system environment are shown below:

wen@wen-desktop:~/Masterarbeit/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 1 ../../../configs/facial_tao/sample_faciallandmarks_config.txt file:///home/wen/Masterarbeit/face/video_test.mp4 ./landmarks_rot
Request sink_0 pad from streammux
####+++OUT file ./landmarks_rot.264
Now playing: file:///home/wen/Masterarbeit/face/video_test.mp4
Opening in BLOCKING MODE 
0:00:08.718553375  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
INFO: [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 32x1x80x80      Max: 32x1x80x80      
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0               

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:08.744800424  9785   0x7f38002390 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1868> [UID = 2]: Could not find output layer 'softargmax,softargmax:1,conv_keypoints_m80' in engine
0:00:08.744837143  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<second-infer-engine1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 2]: Use deserialized engine model: /home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_fp16.engine
0:00:21.864412436  9785   0x7f38002390 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<second-infer-engine1> [UID 2]: Load new model:../../../configs/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:21.880712334  9785   0x7f38002390 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:24.395006682  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_fp16.engine
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x416x736       
1   OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46         
2   OUTPUT kFLOAT output_cov/Sigmoid 1x26x46         

0:00:24.414080113  9785   0x7f38002390 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-infer-engine1> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /home/wen/Masterarbeit/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_fp16.engine
0:00:24.566083852  9785   0x7f38002390 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-infer-engine1> [UID 1]: Load new model:../../../configs/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: qtdemux0
Decodebin child added: multiqueue0
Decodebin child added: mpeg4vparse0
Decodebin child added: nvv4l2decoder0
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 260 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 260 
In cb_newpad
###Decodebin pick nvidia decoder plugin.
NvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
Frame Number = 0 Face Count = 1
Frame Number = 1 Face Count = 1
...
Frame Number = 6 Face Count = 1
Frame Number = 7 Face Count = 1
H264: Profile = 66, Level = 0 
NVMEDIA_ENC: bBlitMode is set to TRUE 
Frame Number = 8 Face Count = 1
Frame Number = 9 Face Count = 1
...
Frame Number = 176 Face Count = 1
Frame Number = 177 Face Count = 1
Frame Number = 178 Face Count = 1
End of stream
Returned, stopping playback
Average fps 6.148420
Totally 170 faces are inferred
Deleting pipeline

Environment

TensorRT Version : 8.0
DeepStream Version : 6.0
GPU Type : Nvidia Jetson Nano 2GB
CUDA Version : 10.2
CUDNN Version : 8.2.1
JetPack Version : 4.6

Hi,

First, we have a new package release.
It's always recommended to upgrade your device to the latest version for a better experience.

Back to your question, do you want to reproduce the performance below?
https://docs.nvidia.com/tao/tao-toolkit/text/overview.html#pre-trained-models

Please note that the numbers in the table were measured on a Nano 4GiB.
They were also measured with trtexec rather than the whole pipeline.

For a 2GiB device, you can try the commands below to get maximum performance first:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

Hi,

Thank you for your answer. I have already enabled max performance mode and jetson_clocks, but I still get only 8 FPS with the facial landmarks model.

And for GazeNet from deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com), do you have any test results on the Jetson Nano 2GB? Inference always gets stuck after the first few frames.

Thanks for your update.
We are going to measure the performance on the Nano and then share more information with you.

Hi,

Could you also try the following command to get the inference performance on your device?

1. Download the tao-converter tool from the page below:

https://docs.nvidia.com/tao/tao-toolkit/text/tensorrt.html#installing-the-tao-converter

2. Profiling

$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/fpenet/versions/deployable_v3.0/files/model.etlt
$ ./tao-converter-jp46-trt8.0.1.6/tao-converter -k nvidia_tlt -p input_face_images,1x1x80x80,1x1x80x80,1x1x80x80 -t fp16 -e model.etlt_b1_gpu0_fp16.engine model.etlt 
$ /usr/src/tensorrt/bin/trtexec --loadEngine=model.etlt_b1_gpu0_fp16.engine
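Once trtexec finishes, it prints latency statistics (for example a mean GPU compute time in milliseconds). As a rough sanity check, you can convert that mean latency into an approximate single-stream FPS. The snippet below is just an illustration of the arithmetic; `mean_ms=12.5` is a made-up placeholder value, not a measured result — substitute the number trtexec actually reports on your device.

```shell
# Hypothetical example: derive approximate FPS from trtexec's mean latency.
# Replace 12.5 with the "mean" GPU compute time (in ms) trtexec prints.
mean_ms=12.5
fps=$(awk -v m="$mean_ms" 'BEGIN { printf "%.1f", 1000 / m }')
echo "approx FPS: $fps"
```

This gives the theoretical ceiling for the model alone; the full DeepStream pipeline (decode, preprocessing, OSD, encode) will always be somewhat slower.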

Thanks.

Hi,
thank you for the help.

I found the solution in this post: Failed to load GazeNet model in TRT - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

The default configuration is too heavy for the Jetson Nano and needs to be modified.
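As an illustration of the kind of change involved (this is a sketch, not the exact fix from the linked post): the engine name in the log above, `faciallandmarks.etlt_b32_gpu0_fp16.engine`, indicates a batch size of 32, which is a lot for a 2 GiB device. Lowering `batch-size` in the secondary inference config and deleting the old engine so it gets rebuilt might look like this; the value 4 is an arbitrary example, not a recommended setting.

```
# Hypothetical sketch: in ../../../configs/facial_tao/faciallandmark_sgie_config.txt
[property]
batch-size=4
```

After editing the config, remove the cached `*.engine` file so DeepStream regenerates it with the new batch size on the next run.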

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.