How to make deepstream-faciallandmark-app faster?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson Xavier NX)
• DeepStream 6.1
• JetPack Version 5.0.2-b231
• TensorRT Version
• Issue Type (FPS is slow)
• How to reproduce the issue? (just run the default TAO deepstream-faciallandmark-app with the following command)

After installing DeepStream 6.1 and the TAO apps, I built deepstream-faciallandmark-app and then ran the following command:

./deepstream-faciallandmark-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt v4l2:///dev/video0 ./landmarks

My camera is a Logitech; the input is 1920x1080.
The displayed output seems to run at about 0.2 FPS. Is that the best speed possible with the Jetson Xavier NX?

Any suggestions to make deepstream-faciallandmark-app faster on the Jetson NX?

log:

./deepstream-faciallandmark-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt v4l2:///dev/video0 ./landmarks
Request sink_0 pad from streammux
Now playing: v4l2:///dev/video0

Using winsys: x11
0:00:08.284452113 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine
INFO: [FullDims Engine Info]: layers num: 4
0 INPUT kFLOAT input_face_images 1x80x80 min: 1x1x80x80 opt: 32x1x80x80 Max: 32x1x80x80
1 OUTPUT kFLOAT conv_keypoints_m80 80x80x80 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT softargmax 80x2 min: 0 opt: 0 Max: 0
3 OUTPUT kFLOAT softargmax:1 80 min: 0 opt: 0 Max: 0

ERROR: [TRT]: 3: Cannot find binding of given name: softargmax,softargmax:1,conv_keypoints_m80
0:00:08.352923691 9307 0xffff44002330 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1867> [UID = 2]: Could not find output layer ‘softargmax,softargmax:1,conv_keypoints_m80’ in engine
0:00:08.353346511 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2003> [UID = 2]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine
0:00:09.583761373 9307 0xffff44002330 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 2]: Load new model:../../../configs/facial_tao/faciallandmark_sgie_config.txt sucessfully
0:00:09.585165641 9307 0xffff44002330 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:14.821665570 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x416x736
1 OUTPUT kFLOAT output_bbox/BiasAdd 4x26x46
2 OUTPUT kFLOAT output_cov/Sigmoid 1x26x46

0:00:14.890578721 9307 0xffff44002330 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2003> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
0:00:15.034486837 9307 0xffff44002330 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 1]: Load new model:../../../configs/facial_tao/config_infer_primary_facenet.txt sucessfully
Decodebin child added: source
Decodebin child added: decodebin0
Running…
Decodebin child added: nvjpegdec0
In cb_newpad
###Decodebin pick nvidia decoder plugin.
Frame Number = 0 Face Count = 1
Frame Number = 1 Face Count = 1
Frame Number = 2 Face Count = 1
Frame Number = 3 Face Count = 1
##################################################
config_infer_primary_facenet.txt :

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=nvidia_tlt
tlt-encoded-model=../../models/faciallandmark/facenet.etlt
labelfile-path=labels_facenet.txt
int8-calib-file=../../models/faciallandmark/facenet_cal.txt
model-engine-file=../../models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
infer-dims=3;416;736
uff-input-order=0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
#0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.2
group-threshold=1
#xSet eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
eps=0.2
#minBoxes=3

###################################

faciallandmark_sgie_config.txt:

[property]
gpu-id=0
model-engine-file=../../models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine
tlt-model-key=nvidia_tlt
tlt-encoded-model=../../models/faciallandmark/faciallandmarks.etlt
int8-calib-file=../../models/faciallandmark/fpenet_cal.txt
#dynamic batch size
batch-size=32
###0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
output-blob-names=softargmax,softargmax:1,conv_keypoints_m80
#0=Detection 1=Classifier 2=Segmentation 100=other
network-type=100
#xEnable tensor metadata output
output-tensor-meta=1
#1-Primary 2-Secondary
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
net-scale-factor=1.0
offsets=0.0
input-object-min-width=5
input-object-min-height=5
#0=RGB 1=BGR 2=GRAY
model-color-format=2

[class-attrs-all]
threshold=0.0

##############################
sample_faciallandmarks_config.txt:

numLandmarks=80
maxBatchSize=32
inputLayerWidth=80
inputLayerHeight=80

I have a similar problem with my Logitech Brio and still could not solve it. I think you are having the same issue. Below is the link to my post. @yuweiw

Try running with GST_DEBUG=3 to get more details:

GST_DEBUG=3 ./deepstream-faciallandmark-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt v4l2:///dev/video0 ./landmarks
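To capture the output for later inspection, the same run can also write the debug log to a file using the standard GStreamer environment variables (a small sketch; the log file name is arbitrary):

GST_DEBUG=3 GST_DEBUG_FILE=/tmp/ds_debug.log ./deepstream-faciallandmark-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt v4l2:///dev/video0 ./landmarks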

Thank you. I have run the app with the YAML file and played around with live-source=1 and batch-size without success. GST_DEBUG=3 or 6 leads to various errors, but they are inconclusive to me.

The original benchmark for deepstream-faciallandmark-app on the NVIDIA site seems to be about 2000 FPS on the Jetson Xavier NX. Is the gap related to the USB V4L2 camera, the Logitech Brio in this case? Would it be better with, for example, a Basler dart (160 FPS) camera if I use the right camera drivers, etc.? Or can it be optimized with the Brio camera, for example with a specific GStreamer pipeline?

You are running the display pipeline, not the performance pipeline. What is your camera's FPS? What is your monitor's refresh rate?

Have you enabled the max power mode of your board when running the performance test? Performance — DeepStream 6.3 Release documentation

Do you know how many HW and SW components work in the pipeline? Where did you get such data?

The Logitech Brio Ultra HD 4K camera supports the MJPEG (Motion JPEG) and YUY2 (YUV422) formats and delivers 30 FPS at 4K in MJPEG.
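For reference, the formats, resolutions and frame rates the camera actually exposes over V4L2 can be listed with v4l2-ctl (assuming the v4l-utils package is installed and the Brio enumerates as /dev/video0):

v4l2-ctl -d /dev/video0 --list-formats-ext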

Yes, the performance of the NVIDIA Jetson NX was maximized with the following commands:

sudo nvpmodel -m 2 (or 8)
sudo jetson_clocks
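The active power mode and clock settings can be verified afterwards (a small check; exact output depends on the JetPack version):

sudo nvpmodel -q
sudo jetson_clocks --show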

The 2000 FPS figure on the Jetson Xavier NX that I mentioned was actually an NVIDIA benchmark for "FaceDetect IR" and not for "Facial Landmark", so sorry for that.

I do not know how many HW and SW components there are. Do you mean there are many and it is not predictable, or is that a question I should answer?

I need more specific answers to the following questions:

NVIDIA states that the Jetson NX can run inference with ResNet as the primary GIE on about 30 streams of 1080p input at 30 FPS (Performance — DeepStream 6.2 Release documentation), and FaceNet is shown at about 2000 FPS in the graph,

so I assumed that the Jetson NX should deliver about 50 FPS or better with the two-GIE facial landmark detection.
1. Is my assumption wrong, and is my inference rate of about 2 FPS normal?
2. If not, is it camera related? Can I expect a performance boost of up to 50 FPS with, for example, a Basler 160 FPS camera?
3. I want my end product to run inference and display the results on screen in a fluid way for the user. If you think this is not realistic, please say so, because then I must find another solution and my business plan must change, since it is built on the assumption that this should work.
4. Do you have suggestions to increase performance, even with the Brio camera? How should I profile the pipeline?

Best Regards

Model performance is not pipeline performance. The other components also impact the whole pipeline. Take your command line as an example: you enabled the EGL display, so the pipeline was limited to your camera's frame rate even if there was plenty of GPU and CPU capacity left.

So it is important to know how many components are involved in the pipeline and how these components work.

If you want to run a performance test, please use fakesink. The pipeline performance is determined by the slowest component, not the fastest one.
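As a separate sanity check, the model-only throughput can be measured outside the pipeline with trtexec (a sketch; the trtexec path is the default JetPack location and the engine path is taken from the log above):

/usr/src/tensorrt/bin/trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.1/samples/deepstream_tao_apps/models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine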

Thank you for your response. I will try fakesink and disable the EGL display.

• Can I prune the facelandmarks.etlt model? Would it help?
• Which profiling method would you suggest for DeepStream?

Best Regards

Please refer to the TAO documentation and forum for model pruning.

You can use "tegrastats" to monitor the performance while the app is running. You can measure the latency of the DeepStream components with the method in DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums.
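For example, tegrastats can be left running in a second terminal while the app runs (a sketch; the interval is in milliseconds and the log file name is arbitrary):

sudo tegrastats --interval 1000 --logfile /tmp/tegrastats.log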

The key is to find out the bottleneck in the pipeline.

Thank you for the response; a better pipeline indeed results in better frame rates (about 30 FPS). I measure FPS with fpsdisplaysink, but for that the buffers need to be converted from NVMM to normal system memory.

I am trying to profile with gst-shark, but it requires GStreamer > 1.17.

Can I upgrade GStreamer to 1.18 or 1.20, or can DeepStream not handle it?

No, that conversion is not needed if you set "video-sink=fakesink" on fpsdisplaysink.
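For example, a minimal sketch of such a measurement pipeline, with no display sink and no conversion element (the device path and the MJPEG caps are assumptions about this setup; with -v, gst-launch prints the fpsdisplaysink measurements to the console):

gst-launch-1.0 -v v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=30/1 ! nvjpegdec ! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false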

For gst-shark, please consult the RidgeRun blog; it is not provided by NVIDIA.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html#migration-to-newer-gstreamer-version

The following FAQ and troubleshooting tips may help you.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html#how-to-find-the-performance-bottleneck-in-deepstream
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html#how-do-i-profile-deepstream-pipeline
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_troubleshooting.html#performance


That approach is definitely wrong on a Jetson NX. Even on a freshly built system it removes the following libraries, the GNOME desktop fails, the system drops to tty1, and trying to repair the libraries takes too much time. I would not suggest upgrading to GStreamer 1.18 that way. Does NVIDIA suggest any other way? If needed, I can open a new topic on migrating to GStreamer 1.18 on the Jetson NX, if it is possible.

Removing libreoffice-core (1:6.4.7-0ubuntu0.20.04.6) …
dpkg: mutter: dependency problems, but removing anyway as you requested:
gnome-shell depends on mutter (>= 3.36.0); however:
Package mutter is to be removed.
Removing mutter (3.36.9-0ubuntu0.20.04.2) …
Removing zenity (3.32.0-5) …
dpkg: gnome-shell: dependency problems, but removing anyway as you requested:
network-manager-gnome depends on gnome-shell | policykit-1-gnome | polkit-1-auth-agent; however:
Package gnome-shell is to be removed.
Package policykit-1-gnome is not configured yet.
Package polkit-1-auth-agent is not installed.
Package policykit-1-gnome which provides polkit-1-auth-agent is not configured yet.
Package gnome-shell which provides polkit-1-auth-agent is to be removed.
network-manager-gnome depends on gnome-shell | policykit-1-gnome | polkit-1-auth-agent; however:
Package gnome-shell is to be removed.
Package policykit-1-gnome is not configured yet.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

The suggestion is to find out the bottleneck in the pipeline with the methods we provided.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html#how-to-find-the-performance-bottleneck-in-deepstream
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html#how-do-i-profile-deepstream-pipeline
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_troubleshooting.html#performance


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.