Performance degradation in DS4.0 example compared to DS3.0

Hi,

We are testing the DeepStream SDK on the AGX Xavier. We had previously implemented a DS3.0 graph that processed multiple camera streams (up to 4) at 1080p@60 fps. With the latest DS4.0, however, we get only 1080p@30 fps for 4 cameras. Surprisingly, we also get only 1080p@30 fps with a single camera. Our camera supports up to 120 fps.

There are only two major differences between our two test cases:

  1. Earlier we tested with a UYVY camera, which does not use the NVIDIA ISP. Currently we are testing with a camera that uses libargus and the NVIDIA ISP.
  2. Our earlier test case used DeepStream SDK version 3.0. We are now testing with DS4.0.

To improve performance, we have already set nvpmodel to mode 0 and run jetson_clocks. I doubt that the change from v4l2src to nvarguscamerasrc explains the fps drop, so our guess is that the frame rate is capped at 30 fps somewhere in DS4.0. Is that the case?

We are still gathering more data. If you can give us any ideas, that would be helpful for us to debug faster.
deepstream_app_log.txt (10.5 KB)
ecam_20_4cam_60fps.txt (3.3 KB)
tegrastats_4cam.txt (14.9 KB)

A quick update. It appears that running nvpmodel and jetson_clocks is not enough. We have to set higher clocks for VI, ISP, VIC and EMC to achieve 1080p@30 fps for even a single camera. The commands we use are:
MAX_VI_RATE=$(cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate)
MAX_ISP_RATE=$(cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate)
MAX_VIC_RATE=$(cat /sys/kernel/debug/bpmp/debug/clk/vic/max_rate)
MAX_EMC_RATE=$(cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate)

echo $MAX_VI_RATE > /sys/kernel/debug/bpmp/debug/clk/vi/rate
echo $MAX_ISP_RATE > /sys/kernel/debug/bpmp/debug/clk/isp/rate
echo $MAX_VIC_RATE > /sys/kernel/debug/bpmp/debug/clk/vic/rate
echo $MAX_EMC_RATE > /sys/kernel/debug/bpmp/debug/clk/emc/rate

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vic/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
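For anyone repeating this, the same steps can be written as a loop. This is only a sketch of the commands above, not an official NVIDIA tool: the function names lock_clock_to_max and lock_all_clocks and the CLK_ROOT variable are made up for illustration, and the debugfs paths are the ones we see on our board.

```shell
#!/bin/sh
# Sketch: pin the VI, ISP, VIC and EMC clocks to their maximum rate.
# CLK_ROOT defaults to the debugfs path used in this thread. Overriding it
# is a hypothetical convenience for dry runs against a scratch directory.
CLK_ROOT="${CLK_ROOT:-/sys/kernel/debug/bpmp/debug/clk}"

# Read max_rate for one clock, write it to rate, then lock the rate.
lock_clock_to_max() {
    clk_dir="$CLK_ROOT/$1"
    if [ ! -r "$clk_dir/max_rate" ]; then
        echo "skip $1: $clk_dir/max_rate is not readable" >&2
        return 1
    fi
    max_rate=$(cat "$clk_dir/max_rate")
    echo "$max_rate" > "$clk_dir/rate"
    echo 1 > "$clk_dir/mrq_rate_locked"
    echo "$1: locked at $max_rate"
}

# Apply the same steps to the four clocks named in this post.
lock_all_clocks() {
    for clk in vi isp vic emc; do
        lock_clock_to_max "$clk"
    done
}
```

To apply it, source the file as root and call lock_all_clocks. Writing to debugfs requires root, and the settings do not survive a reboot.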

I’ve also attached the configuration file we used (ecam_20_4cam_60fps.txt) and the application log (deepstream_app_log.txt) to the first message. The CPU/GPU usage log captured while running 4 cameras in DS4.0 is attached as well (tegrastats_4cam.txt).

According to the application log (deepstream_app_log.txt), the camera is set to 120 fps mode. However, the final output from DeepStream is only 30 fps. To be sure, we will verify with gst-launch that the camera is actually streaming at 120 fps.

CPU usage is roughly 25% across 6 cores.
GPU usage fluctuates between 10% and 80%, so the GPU appears to be underutilized.

Hi NVIDIA,

We have confirmed that the camera streams at 120 fps when the DeepStream plugins are not used. We also noticed that there are two plugins available for colorspace conversion:

  1. nvvidconv - the default on Jetson. Uses the VIC.
  2. nvvideoconvert - provided by DS4.0. Supports dGPU-based or VIC-based conversion.

Earlier in DS3.0, we were using nvvidconv without issues. Now we are unable to use the nvvidconv plugin in the new DS4.0 examples. We suspect that the new plugin could be a reason for the fps bottleneck. Do you have any thoughts?

nvvidconv has been replaced by nvvideoconvert in DS4.0. All the changes between our previous version and 4.0 will be published in our DS4.0 migration guide.

We will take a look at the issues mentioned above.

Thanks for the update, Shah. The migration guide would be helpful to users. We look forward to its release.