DeepStream Optimization

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) H100
• DeepStream Version 6.4
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. for which plugin or for which sample application, and the function description.)

Hi, we have used the DeepStream SDK to develop a modified version that suits our use case. However, we found that the application handles fewer CCTV streams than we expected. So I tried running the LPR application from here, and the average FPS we got is approximately 850, whereas the benchmark result from here is 2801.

There are a few questions below:

  1. May I know whether the LPR sample application posted on GitHub is exactly the one used for the benchmark?
  2. If there are any changes, would you mind sharing what changes you made (e.g. model batch sizes, inference interval, etc.)?
  3. Also, when achieving an average of 2801 FPS, what were the GPU memory usage percentage and the decoder utilization percentage?
  4. I noticed that the Performance page has the following section for the Hopper architecture:
$ sudo service display-manager stop
#Make sure no process is running on GPU i.e. Xorg or Triton server etc
$ sudo pkill -9 Xorg
#Remove kernel modules
$ sudo rmmod nvidia_drm nvidia_modeset nvidia
#Load Modules with Regkeys
$ sudo modprobe nvidia NVreg_RegistryDwords="RMDebugOverridePerRunlistChannelRam = 1;RMIncreaseRsvdMemorySizeMB = 1024;RMDisableChIdIsolation = 0x1;RmGspFirmwareHeapSizeMB = 256"
$ sudo service display-manager start

May I know the reason for removing the kernel modules and reloading them with these registry keys? Could you explain what these steps do and what their consequences are? Also, after loading the modules with these parameters, if I would like to revert to the original settings, what are the commands? Finally, did you do this step for the performance benchmark?
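I assume reverting would just be unloading the modules again and reloading them without the custom registry keys, something like the sketch below, but please correct me if that is wrong:

$ sudo service display-manager stop
$ sudo rmmod nvidia_drm nvidia_modeset nvidia
#Reload the modules without any NVreg_RegistryDwords overrides
$ sudo modprobe nvidia
$ sudo modprobe nvidia_modeset
$ sudo modprobe nvidia_drm
$ sudo service display-manager start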

Any advice would be much appreciated.

Can you try downloading the config files below into the deepstream-lpr-app directory?
Make sure you are using the LPD model and run the application config file with deepstream-app, i.e. deepstream-app -c app_config_file.txt
lpd_us_config.txt (3.3 KB)
app_config_file.txt (1.9 KB)

Observed GPU usage was 90%+ and average nvdec utilization was around 30%.
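If you want to watch these numbers on your side while the pipeline is running, the standard device monitor in nvidia-smi can report them, for example:

$ nvidia-smi dmon -s um
#u = utilization (sm/mem/enc/dec %), m = framebuffer memory usage

This is just the usual way to observe GPU and nvdec utilization; it is not necessarily how the benchmark numbers were collected.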

Did you change anything in the code?

./deepstream-lpr-app [1:us model|2:ch_model] [1:file sink|2:fakesink|3:display sink] [0:ROI disable|1:ROI enable] [infer|triton|tritongrpc] <in mp4 filename> <in mp4 filename> ... <out H264 filename>

This is the format required to run the deepstream-lpr-app. In order to run it, did you change the argument parsing in the code?
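For example, an invocation following that format (with illustrative file names) would look something like:

$ ./deepstream-lpr-app 1 2 0 infer video1.mp4 video2.mp4 out.h264
#us model, fakesink, ROI disabled, nvinfer, two input files, one output file name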

@jason.cham

The sample NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream (github.com) does not support performance mode. Please refer to the Performance — DeepStream documentation 6.4 documentation (nvidia.com) for what performance mode is.

You need to run with performance mode to get the performance data.

@Fiona.Chen

Yup, we have replicated what you mentioned, and the performance is close to what is written in the benchmark documentation. Since performance mode gives better and faster inference results, may I know in detail why and how it performs better when we run in performance mode? Also, is it recommended to run in performance mode for deployment purposes?

It depends on your use case. If you don’t need OSD, display, encoding, message broker, etc. in your use case, you can use performance mode in your deployment.

Alright, thank you so much for the input.

@Fiona.Chen

May I know the directory of the DeepStream application in performance mode? I would need to check the source code.

Also, may I know why the LPR app in C does not run inference as fast as the one in performance mode? And is it possible for the application in C to run inference as fast as the one in performance mode?

The source code is in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-app

All the samples are just samples to demonstrate how to use the DeepStream APIs. There is no need to implement the same function in every sample.

Yes. The key is to disable any function you don’t need and let the pipeline run as fast as possible by setting the sink to “fakesink” and setting the “sync” property of the fakesink to 0. E.g. deepstream-app uses configuration parameters in the configuration file to turn off the OSD and tiler functions (Performance — DeepStream documentation 6.4 documentation); the LPR sample can also achieve this by removing nvosd and nvmultistreamtiler from the pipeline.
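As an illustration, the relevant sections of a deepstream-app configuration file typically look something like this (illustrative values, not necessarily the exact settings used for the published benchmark):

[osd]
enable=0

[tiled-display]
enable=0

[sink0]
enable=1
#type=1 selects fakesink, sync=0 lets the pipeline run as fast as possible
type=1
sync=0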

And this Troubleshooting — DeepStream documentation 6.4 documentation may also help you.