Jetpack 5.1.2 performance degradation

Hi, in testing our application on Jetpack 5.1.2 on our Xavier NX we noticed severe performance degradation when copying DMA memory (via NvBufSurfTransform()). Searching the forums, several other people have noticed this issue but never received any definitive answer as to why:

Jetpack 5.1.2 performance degradation vs jetpack 4.5 - #3 by AastaLLL

https://forums.developer.nvidia.com/t/bad-memory-performance-on-jetpack-5-0-2-5-10-104-tegra/228363/29

GStreamer NVIDIA elements are slower in Jetpack 5.0.2 than in Jetpack 4.6. Is it normal?

Following are our measured results, in combination with what has been posted previously:
JP4: 100 iterations of NvBufferTransform() take ~130ms (1.3ms per)
JP5: 100 iterations of NvBufSurfTransform() take ~370ms (3.7ms per)
With the VIC pinned at max on JP5 it’s around 150ms (still slower).
The memory locations are created once and copied from A to B over and over.
Frame size is 1080p YUV420. The filter flag is not set on the transform.
I’m sure you can create your own test to try to replicate these findings yourself.
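
For reference, the core of our measurement is essentially the following (a minimal sketch against the JetPack 5 nvbufsurface.h / nvbufsurftransform.h headers, error handling trimmed; the variable names are mine, and on Jetpack 4 the same loop calls NvBufferTransform() on dmabuf fds instead):

#include <chrono>
#include <cstdio>
#include "nvbufsurface.h"
#include "nvbufsurftransform.h"

int main()
{
    // Allocate both DMA buffers once, up front, exactly as in our app.
    NvBufSurface *src = nullptr, *dst = nullptr;
    NvBufSurfaceCreateParams params = {};
    params.width = 1920;
    params.height = 1080;
    params.colorFormat = NVBUF_COLOR_FORMAT_YUV420;
    params.layout = NVBUF_LAYOUT_PITCH;
    params.memType = NVBUF_MEM_SURFACE_ARRAY; // VIC-reachable memory on Jetson
    if (NvBufSurfaceCreate(&src, 1, &params) != 0 ||
        NvBufSurfaceCreate(&dst, 1, &params) != 0)
        return 1;

    // Zeroed params: no filter flag, no flip, no crop - a plain A-to-B copy.
    NvBufSurfTransformParams tx = {};

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 100; ++i)
        NvBufSurfTransform(src, dst, &tx);
    auto t1 = std::chrono::steady_clock::now();

    long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    printf("100 transforms: %lld ms (%.1f ms per)\n", ms, ms / 100.0);

    NvBufSurfaceDestroy(src);
    NvBufSurfaceDestroy(dst);
    return 0;
}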

We need to use the Xavier NX for its more capable NVENC and NVDEC engines, so we can’t upgrade to Orin, nor can we use Jetpack 6. We wish to use Jetpack 5 instead of 4 because Ubuntu 18.04 is EOL.

Any comments?

Hi,

Have you tried the suggestions in the topics you listed?
For example, the huge-page change mentioned in the topic below:

If the suggestions do not work for your use case, could you share a reproducible source so we can test it in our environment?

Thanks.

the huge-page change mentioned in the topic below

No effect for us; we’re not copying to/from that page with DMA.

could you share a reproducible source

I include in this post a gutted version of video_decode_main.cpp from the 00_video_decode sample (with the extension changed from .cpp to .txt because the forum wouldn’t let me upload .cpp). Just replace the file in that sample with this one, change the extension back, and compile it. Run it on Jetpack 5, and change the #if from 1 to 0 for Jetpack 4.
video_decode_main.txt (3.4 KB)
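
In case it helps anyone reading along, the Jetpack switch in that file is conceptually just this (the variable names here are my paraphrase, not the exact ones in the attachment):

#if 1 // Jetpack 5: NvBufSurface API
    NvBufSurfTransform(src_surf, dst_surf, &transform_params);
#else // Jetpack 4: nvbuf_utils API
    NvBufferTransform(src_dmabuf_fd, dst_dmabuf_fd, &transform_params_v1);
#endif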

Hi,

Thanks.
Will provide more info to you later.

Hi,

Please try to maximize the VIC clock:
https://docs.nvidia.com/vpi/algo_performance.html#maxout_clocks
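
You can confirm the VIC clock is pinned by reading the devfreq node (path assumed from the Xavier device tree; adjust for your platform):

$ cat /sys/devices/platform/13e10000.host1x/15340000.vic/devfreq/15340000.vic/available_frequencies
$ cat /sys/devices/platform/13e10000.host1x/15340000.vic/devfreq/15340000.vic/cur_freq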

Here are our results with your application on JetPack 5:

Before maximizing the clock

$ ./video_decode
Unwrapped DMA Copy takes 316 ms, average 3 ms per conversion
App run was successful

After maximizing the clock

Unwrapped DMA Copy takes 95 ms, average 0 ms per conversion
App run was successful

Thanks.

In my attempt at pinning the VIC to max (601600000) the execution time goes down to ~155ms, nowhere near 95. The following are the commands I used:

echo on > /sys/devices/platform/13e10000.host1x/15340000.vic/power/control
echo userspace > /sys/devices/platform/13e10000.host1x/15340000.vic/devfreq/15340000.vic/governor
echo 601600000 > /sys/devices/platform/13e10000.host1x/15340000.vic/devfreq/15340000.vic/max_freq
echo 601600000 > /sys/devices/platform/13e10000.host1x/15340000.vic/devfreq/15340000.vic/userspace/set_freq

Didn’t realise that link included a full script for maximising performance. Even after running it, though, I’m still only getting ~145-150ms.

Hi,

Sorry, my test was run on an Xavier device, so the performance is better.
Does the performance meet your requirements?

Thanks.

What? I stated before that my case was on a Xavier NX; are you saying you did not use a Xavier NX?
And if you refer to my original post, the question is about the performance degradation from Jetpack 4 to 5. I’m not happy until it’s equal or better.
An explanation from one of your engineers as to what I’ve been calling the VIC ramping problem would also be appreciated. If you run this test twice in a row without a pinned VIC frequency, you may find that the execution time drops substantially on the second run (from ~360ms down to ~260ms in our case). This never happened on Jetpack 4; it always ran at the same speed, and as fast as we needed it to.
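You can watch the ramp happen during an unpinned run with something like this (same devfreq path as in the commands I posted above):

$ watch -n 0.1 cat /sys/devices/platform/13e10000.host1x/15340000.vic/devfreq/15340000.vic/cur_freq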

Hi,

We need to check with our internal team to gather more info about the issue.
Will update here later.

Thanks.

Any update on this?

Hi,

Our internal team is still checking this.
Will provide more info once we get a response.

Thanks.

Hi,

There is, in general, a performance drop from kernel 4.9 to 5.10 due to security hardening.

Thanks.

Can you go into detail about which security hardening changes caused this?

Hi,

These are from the upstream kernel.

Thanks.

I ask for specifics because I want to know whether these hardening changes can be disabled through boot parameters or something similar, assuming your engineers know which security changes caused it.
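
To illustrate what I mean, upstream kernels in the 5.10 range accept boot parameters such as the following (whether either of these is the hardening your team means, or whether they recover the copy performance, is exactly what I’d like confirmed; I have not verified it):

mitigations=off init_on_alloc=0

Here mitigations=off disables the CPU side-channel mitigations, and init_on_alloc=0 turns off the zero-memory-on-allocation hardening that was added after kernel 4.9.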

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.