GPU load stays at 99% when running deepstream_parallel_inference_app

I tried to run deepstream_parallel_inference_app on a Jetson Xavier NX and found the GPU stays at 99% load when using two models for detection.
Here are my settings from deepstream_app_config.yml below; please help confirm whether the settings are correct:

I also attached the GUI snapshot as shown below:

The configuration looks fine. Please use the trtexec tool to measure the models' performance on your device before you deploy them with DeepStream.
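For reference, a minimal trtexec invocation on a JetPack device looks like the sketch below. The file names `model_0.onnx` / `model_0.engine` are placeholders for your actual model files, and the trtexec path assumes the default JetPack install location:

```shell
# Build an FP16 TensorRT engine from an ONNX model and time it
# ("model_0.onnx" is a placeholder -- substitute your own model):
/usr/src/tensorrt/bin/trtexec --onnx=model_0.onnx --fp16 \
    --saveEngine=model_0.engine

# Or benchmark an engine that DeepStream has already serialized:
/usr/src/tensorrt/bin/trtexec --loadEngine=model_0.engine
```

In the performance summary that trtexec prints, the "GPU Compute Time" line (mean/max) is the number to compare against your per-frame budget.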

Please see the snapshot below:

The model_0 GPU compute time is 15 ms mean / 108 ms max, and the model_1 GPU compute time is 45 ms mean / 74 ms max. Together they overload the Xavier NX if you want the pipeline to run at 25 FPS or 30 FPS.
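The arithmetic behind "overloaded" can be sketched like this. At 25 FPS each frame has a 40 ms budget, and both engines compete for the same GPU, so to a first approximation their mean compute times add up (this back-of-envelope estimate ignores batching, stream overlap, and the other pipeline stages):

```python
# Rough per-frame budget check using the trtexec numbers quoted above.
fps = 25
frame_budget_ms = 1000 / fps          # 40 ms available per frame at 25 FPS

model_0_mean_ms = 15                  # mean GPU compute time of model_0
model_1_mean_ms = 45                  # mean GPU compute time of model_1

# Both engines share one GPU, so their mean times roughly add up:
total_ms = model_0_mean_ms + model_1_mean_ms

print(f"budget: {frame_budget_ms:.0f} ms, needed: ~{total_ms} ms")
print("overloaded" if total_ms > frame_budget_ms else "fits")
```

With these numbers the combined ~60 ms exceeds the 40 ms budget, which matches the 99% GPU load the user observed.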

I'm not clear what you mean. If I use only one model, the device works fine, but if I use two models it causes high load… Does that mean we cannot run two-model inference on the Xavier NX? If possible, could you give some advice? Thanks!

The "one model" is model_0, right? That is OK since its compute time is 15 ms on average.
If you use model_1, even that one alone, the GPU load will be very high.

You can't use these two models together. If you replace model_1 with a smaller model whose compute time is around 10 ms on average, the two may work well together.

The models are the bottleneck, especially model_1; it is too heavy for the Xavier NX. Please choose models according to the hardware spec: Jetson Modules, Support, Ecosystem, and Lineup | NVIDIA Developer

Thanks for your reply.
I'm not clear why these two models have such different compute times. They are both YOLOX-s models, just with different num-classes…
Based on your reply, does that mean we need to change to a device with greater compute capability, such as an Orin NX?

You can compare the two models layer by layer to check which layers differ to accommodate the different class numbers. They can't be exactly the same.

If you must use these two models, the answer is "yes": you need to change your hardware to meet your models' requirements.

OK, many thanks for your help!

BTW, have you enabled the max power mode when you run the case?
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html#id42
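For completeness, power mode and clocks on Jetson are controlled with `nvpmodel` and `jetson_clocks`. The exact mode ID for "20W 6-core" on Xavier NX depends on the JetPack release, so `<mode-id>` below is a placeholder you should verify before setting:

```shell
# Query the currently active power mode:
sudo nvpmodel -q

# Switch to the maximum power mode (verify the correct mode ID for
# "20W 6-core" on your JetPack release before running this):
sudo nvpmodel -m <mode-id>

# Lock clocks at their maximum for benchmarking:
sudo jetson_clocks
```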

Yes, I have set the 20W 6-core mode.

So the hardware is the limiting factor for your models; you may need to consider other devices.

Thanks. If I use two models with similar inference times of about 15 ms each, is it possible to deploy them on this Xavier NX?

What is your expected performance?

Use two models for detection, provide real-time video streaming, and keep the load at a safe level.

What is the original FPS of your video stream? If it is 10 FPS, it is OK to use two such models (each with ~15 ms inference time) on the streams.

It is meaningless to talk about "real-time" without any performance metrics. There is a big gap between 4K@60fps "real-time" and 720p@25fps "real-time", right?

My camera is 25 FPS at 1080p. We need to detect the target with two models and combine the results to push a real-time stream.

How many cameras? What kind of cameras: USB, CSI, RTSP, …? What is the video format?