Slowdown in nvdec when upgrading to JetPack 5.1.2

Hello, I’m testing a Jetson upgrade to a newer JetPack version and have encountered a strange performance issue. I’m testing on two identical devices:

• Hardware Platform (Jetson / GPU)
2 x Jetson Xavier NX
• DeepStream Version
6.0 vs 6.3
• JetPack Version (valid for Jetson only)
4.6 vs 5.1.2
• TensorRT Version
8.0.1-1+cuda10.2 vs 8.5.2-1+cuda11.4

• Issue Type( questions, new requirements, bugs)
Decoding with nvdec is approx. 30% slower after upgrading to JetPack 5.1.2.

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

Generate a test video:

gst-launch-1.0 videotestsrc num-buffers=20000 ! 'video/x-raw,width=1920,height=1080' !  nvvideoconvert ! nvv4l2h264enc ! h264parse ! matroskamux ! filesink location='test.mkv'

Run on the JetPack 4.6 device:

time gst-launch-1.0 urisourcebin uri='file:///home/tomas.krupka/test.mkv' ! parsebin ! nvv4l2decoder enable-max-performance=1 ! fakesink

Result:

Executed in   54.89 secs    fish           external
   usr time    5.67 secs    0.00 micros    5.67 secs
   sys time    4.12 secs    0.00 micros    4.12 secs

Run the same command on the JetPack 5.1.2 device:

Executed in   70.96 secs    fish           external
   usr time   16.32 secs    1.35 millis   16.32 secs
   sys time   13.19 secs    0.74 millis   13.19 secs

Both devices are set to NV Power Mode: MODE_20W_6CORE
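
To put a number on the difference, here is a minimal sketch that derives the relative slowdown and the approximate decode rate from the wall-clock times above. It assumes the clip holds exactly 20000 frames (the num-buffers value used to generate it) and ignores pipeline start-up time, so the fps figures are only approximate. The function names are made up for illustration.

```shell
#!/bin/sh
# Relative wall-clock slowdown of a new run versus a baseline run, in percent.
# LC_ALL=C keeps awk on a dot as the decimal separator regardless of locale.
slowdown_pct() {
    LC_ALL=C awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Approximate decode rate in frames per second for a clip of known length.
decode_fps() {
    LC_ALL=C awk -v frames="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", frames / secs }'
}

slowdown_pct 54.89 70.96   # JetPack 4.6 vs 5.1.2 wall time -> 29.3
decode_fps 20000 54.89     # JetPack 4.6   -> 364.4
decode_fps 20000 70.96     # JetPack 5.1.2 -> 281.8
```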

Is this expected? Is there some extra setting of the nvdec unit I can check to improve the performance?

Is there any update, please?

I’m still seeing the slowdowns; I haven’t found any way to make the newer JetPack version run as fast as the old one.

I tried investigating further with Nsight Systems, but since the decoder is not available there, I looked at nvvideoconvert operations. I tested the following pipeline that decodes a video and does two format conversions:

gst-launch-1.0 urisourcebin uri=file:///home/tomas.krupka/pruhy.mkv ! parsebin ! nvv4l2decoder ! m.sink_0 nvstreammux live-source=false name=m width=768 height=768 batch-size=1 ! nvvideoconvert ! "video/x-raw(memory:NVMM),format=I420" ! nvvideoconvert ! "video/x-raw(memory:NVMM),format=NV12" ! fakesink qos=0 sync=0

I tested with NV Power Mode: MODE_20W_6CORE, and VIC units set to max frequency according to Clocks — Jetson Linux Developer Guide 34.1 documentation

Here is an example where nvvideoconvert uses the VIC units; the conversion takes about 2x as long on JetPack 5.1.2:

Using VICs, top: JetPack 5.1.2, bottom: JetPack 4.6

2x runtime for nvvideoconvert nvtx ranges:

I also tried switching the operations to GPU kernels so that I could see more detail:

  • nvvideoconvert compute-hw=1, copy-hw=1
  • nvstreammux compute-hw=1
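
For reference, the conversion pipeline with those properties applied might look like the sketch below. The element and property names come from this post. Where exactly each property is placed, the helper name gpu_pipeline, and the example file path are my assumptions.

```shell
#!/bin/sh
# Print the conversion test pipeline with scaling, conversion and copies
# switched away from the VIC (compute-hw=1 / copy-hw=1, as listed above).
gpu_pipeline() {
    uri="$1"   # e.g. file:///path/to/clip.mkv (placeholder path)
    echo "urisourcebin uri=$uri ! parsebin ! nvv4l2decoder ! m.sink_0" \
         "nvstreammux name=m live-source=false compute-hw=1 width=768 height=768 batch-size=1 !" \
         "nvvideoconvert compute-hw=1 copy-hw=1 ! video/x-raw(memory:NVMM),format=I420 !" \
         "nvvideoconvert compute-hw=1 copy-hw=1 ! video/x-raw(memory:NVMM),format=NV12 !" \
         "fakesink qos=0 sync=0"
}

# On the device (the unquoted substitution is split into arguments on purpose):
#   gst-launch-1.0 $(gpu_pipeline file:///path/to/clip.mkv)
```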

Not using VICs, top: JetPack 5.1.2, bottom: JetPack 4.6

kernel times, 30% - 50% slowdown:

nvtx times, 20% - 40% slowdown:

I’m using this nsys command on Jetpack 4:
nsys profile -w true -t cuda,nvtx,nvmedia --sample=none --process-scope=process-tree
And on Jetpack 5:
nsys profile -w true -t cuda,nvtx,tegra-accelerators --sample=none --process-scope=process-tree
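
Since the trace list differs between the two releases, a small wrapper can pick it automatically. This is only a sketch: it assumes the first line of /etc/nv_tegra_release starts with "# R32" on JetPack 4.6 and "# R35" on JetPack 5.1.x, and the helper names are hypothetical.

```shell
#!/bin/sh
# Read the L4T major release (e.g. 32 or 35) from the contents of
# /etc/nv_tegra_release supplied on stdin.
l4t_major() {
    sed -n '1s/^# R\([0-9]*\) .*/\1/p'
}

# Map the L4T major release to the nsys trace list used in this post.
nsys_trace_opts() {
    case "$1" in
        32) echo "cuda,nvtx,nvmedia" ;;             # JetPack 4.x
        35) echo "cuda,nvtx,tegra-accelerators" ;;  # JetPack 5.1.x
        *)  echo "unsupported L4T release: $1" >&2; return 1 ;;
    esac
}

# On the device, followed by the command to profile:
#   nsys profile -w true -t "$(nsys_trace_opts "$(l4t_major < /etc/nv_tegra_release)")" \
#       --sample=none --process-scope=process-tree
```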

I really wonder if I’m missing something obvious here.

Sorry for the long delay; we currently lack the equipment to compare different versions of JetPack.

We will look into this issue and will be back once there is any progress.

We still haven’t managed to figure out where the slowdown is coming from. Do you have any updates on this, please?

Can you try upgrading to JetPack 5.1.3? It is currently unclear whether this is a problem with the BSP.

I’m already testing with 5.1.3; I forgot to mention this, sorry. Unfortunately, the results are the same.

This seems to be an issue in JP-5.1.x. You can stay on JP-4.6 for now if this problem affects you.
Or you can seek help from the marketing department. Thanks.

Hi,
Please try the command on 4.6 and 5.1.3:

$ gst-launch-1.0 -v filesrc location=test.mkv ! matroskademux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0

And check whether there is a performance drop on 5.1.3. We see identical results on both, so other plugins may be triggering the issue. Please give it a try.

Hello DaneLLL,
thank you for helping with investigation.

I have run the pipeline you provided on two devices with Jetson Xavier NX boards. Both are using the following power mode:

NV Power Mode: MODE_20W_6CORE
8
  • The first has JetPack 4.6 + DeepStream 6.0
  • The second has JetPack 5.1.3 + DeepStream 6.3
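
The power mode shown above is what nvpmodel -q prints. A minimal sketch for comparing it across the two devices follows. The helper name power_mode_name is hypothetical, and nvpmodel may need sudo depending on the setup.

```shell
#!/bin/sh
# Extract the mode name from `nvpmodel -q` output supplied on stdin.
# The numeric mode ID on the following line is ignored.
power_mode_name() {
    sed -n 's/^NV Power Mode: *//p'
}

# On each device:
#   sudo nvpmodel -q | power_mode_name
#   sudo jetson_clocks --show    # print the current clock settings
```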

Unfortunately, the results are consistent with the previously reported findings, i.e. the newer JetPack is roughly 75 fps (about 20%) slower.

Here are the logs of the pipeline for JetPack 5.1.3:

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19391, dropped: 0, current: 291,07, average: 290,36
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19537, dropped: 0, current: 291,20, average: 290,37
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19683, dropped: 0, current: 290,94, average: 290,38
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19829, dropped: 0, current: 290,80, average: 290,38
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19975, dropped: 0, current: 291,21, average: 290,38

And here are the logs for JetPack 4.6 (faster):

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19218, dropped: 0, current: 367,64, average: 364,87
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19402, dropped: 0, current: 367,40, average: 364,90
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19586, dropped: 0, current: 367,52, average: 364,92
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19770, dropped: 0, current: 367,15, average: 364,94
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 19954, dropped: 0, current: 367,07, average: 364,96
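
To compare such logs without reading them by eye, the final average can be pulled out and turned into a percentage. This is a rough sketch with hypothetical helper names. It assumes the last-message format shown above and accepts either a comma or a dot as the decimal separator.

```shell
#!/bin/sh
# Print the last "average" fps value found in fpsdisplaysink -v output on
# stdin, normalising a locale decimal comma to a dot.
last_avg_fps() {
    sed -n 's/.*average: \([0-9]*\)[,.]\([0-9]*\).*/\1.\2/p' | tail -n 1
}

# Percentage drop from an old fps value to a new one.
fps_drop_pct() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

fps_drop_pct 364.96 290.38   # final averages from the logs above -> 20.4
```

On a device, the first helper would be used as `gst-launch-1.0 -v ... | last_avg_fps`, with the pipeline from the previous reply in place of the dots.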

Do you know of any other factors that could trigger this issue?
Thank you

Addendum: I have also tried running jetson_clocks beforehand. This increases the fps on JetPack 5.1.3 to ~297. Although slightly better, it remains about 20% slower than its 4.6 counterpart.