Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson Xavier NX)
• DeepStream 6.1
• JetPack Version 5.0.2-b231
• TensorRT Version
• Issue Type (FPS is slow)
• How to reproduce the issue? (Just run the default TAO deepstream-faciallandmark-app with the following command)
After installing DeepStream 6.1 and the TAO apps, I built deepstream-faciallandmark-app and then ran the following command:
./deepstream-faciallandmark-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt v4l2:///dev/video0 ./landmarks
My camera is a Logitech; its input is 1920x1080.
The displayed output runs at only about 0.2 fps. Is that the best speed achievable on a Jetson Xavier NX?
Any suggestions to make deepstream-faciallandmark-app faster on the Xavier NX?
log:
./deepstream-faciallandmark-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt v4l2:///dev/video0 ./landmarks
Request sink_0 pad from streammux
Now playing: v4l2:///dev/video0
Using winsys: x11
0:00:08.284452113 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine
INFO: [FullDims Engine Info]: layers num: 4
0 INPUT kFLOAT input_face_images 1x80x80 min: 1x1x80x80 opt: 32x1x80x80 Max: 32x1x80x80
1 OUTPUT kFLOAT conv_keypoints_m80 80x80x80 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT softargmax 80x2 min: 0 opt: 0 Max: 0
3 OUTPUT kFLOAT softargmax:1 80 min: 0 opt: 0 Max: 0
ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:08.352923691 9307 0xffff44002330 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1867> [UID = 2]: Could not find output layer 'softargmax,softargmax:1,conv_keypoints_m80' in engine
0:00:08.353346511 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2003> [UID = 2]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine
0:00:09.583761373 9307 0xffff44002330 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 2]: Load new model:../../../configs/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:09.585165641 9307 0xffff44002330 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:14.821665570 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x416x736
1 OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46
2 OUTPUT kFLOAT output_cov/Sigmoid 1x26x46
0:00:14.890578721 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2003> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
0:00:15.034486837 9307 0xffff44002330 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 1]: Load new model:../../../configs/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running…
Decodebin child added: nvjpegdec0
In cb_newpad
###Decodebin pick nvidia decoder plugin.
Frame Number = 0 Face Count = 1
Frame Number = 1 Face Count = 1
Frame Number = 2 Face Count = 1
Frame Number = 3 Face Count = 1
##################################################
config_infer_primary_facenet.txt:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=nvidia_tlt
tlt-encoded-model=../../models/faciallandmark/facenet.etlt
labelfile-path=labels_facenet.txt
int8-calib-file=../../models/faciallandmark/facenet_cal.txt
model-engine-file=../../models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
infer-dims=3;416;736
uff-input-order=0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
#0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
[class-attrs-all]
pre-cluster-threshold=0.2
group-threshold=1
#Set eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
eps=0.2
#minBoxes=3
###################################
faciallandmark_sgie_config.txt:
[property]
gpu-id=0
model-engine-file=../../models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine
tlt-model-key=nvidia_tlt
tlt-encoded-model=../../models/faciallandmark/faciallandmarks.etlt
int8-calib-file=../../models/faciallandmark/fpenet_cal.txt
#dynamic batch size
batch-size=32
###0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
output-blob-names=softargmax,softargmax:1,conv_keypoints_m80
#0=Detection 1=Classifier 2=Segmentation 100=other
network-type=100
#Enable tensor metadata output
output-tensor-meta=1
#1-Primary 2-Secondary
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
net-scale-factor=1.0
offsets=0.0
input-object-min-width=5
input-object-min-height=5
#0=RGB 1=BGR 2=GRAY
model-color-format=2
[class-attrs-all]
threshold=0.0
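A side note on the "Cannot find binding of given name" warning in the log: in the primary config the output-blob-names entries are separated by semicolons, while here they use commas, so nvinfer may be treating the whole comma-separated string as one layer name. If that is the cause, my guess (untested) is that the line should read:

```
# hedged guess: use ';' between blob names, as in the primary GIE config
output-blob-names=softargmax;softargmax:1;conv_keypoints_m80
```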
##############################
sample_faciallandmarks_config.txt:
numLandmarks=80
maxBatchSize=32
inputLayerWidth=80
inputLayerHeight=80
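On the fps question, one standard nvinfer knob is the interval property, which skips inference on that many consecutive batches between runs; the value below is only a hypothetical tuning example, not something from my current setup:

```
# In config_infer_primary_facenet.txt (hypothetical value): run the face
# detector on every 2nd frame only; boxes from the last inferred frame
# are reused on the skipped frames.
interval=1
```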