DeepStream nvof performance

• Hardware Platform (Jetson / GPU) Jetson AGX Orin, Orin NX, Xavier NX
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only) 5.1.2GA
• Issue Type( questions, new requirements, bugs) questions

Hi, I am currently using the AGX Orin 64G, and I have some performance questions about the DeepStream 6.3 nvof plugin. All performance tests were run in MAX mode with a 720p test video.

My test pipeline:
uridecodebin->streammux->nvof->nvdslogger->fakesink

1. I want to first confirm: do the nvof calculations on Xavier NX and Orin NX both run on the VIC hardware?

2. In the performance test of the nvof plugin:
Xavier NX can achieve 387 FPS.
Orin NX can only reach around 100 FPS.
Even after raising the Orin NX VIC frequency to 729.6 MHz, it only reaches 218 FPS.
Orin NX should have more compute than Xavier NX, so what could be the reason for this?

3. Testing AGX Orin with the nvof plugin:
Through jtop, I confirmed that OFA is operating, but it only achieves 120 FPS.
After increasing the VIC frequency, it can reach 240 FPS.
Why does the performance increase after raising the VIC frequency? Isn’t nvof on AGX Orin supposed to be computed by the OFA hardware?

4. With the VPI Dense Optical Flow algorithm, AGX Orin takes about 1.44 ms for 1080p at low quality and grid size 4, which is roughly 700 FPS. Why is there such a big difference compared to the nvof plugin’s 240 FPS at 720p?

https://docs.nvidia.com/vpi/algo_optflow_dense.html#algo_optflow_dense_perf
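
For reference, the "1.44 ms is roughly 700 FPS" conversion above can be checked with a small sketch. The function name is hypothetical and the 240 FPS value is simply the nvof figure reported in this post; it assumes one frame is processed per measured latency interval, with no overlap between frames.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# VPI figure quoted in the post: 1080p, low quality, grid size 4.
vpi_fps = latency_ms_to_fps(1.44)
# nvof plugin figure reported in the post: 720p, VIC clock raised.
nvof_fps = 240.0

print(f"VPI: {vpi_fps:.0f} FPS")
print(f"nvof plugin: {nvof_fps:.0f} FPS")
print(f"ratio: {vpi_fps / nvof_fps:.2f}x")
```

This gives about 694 FPS for VPI, so the gap to the plugin is a little under 3x even though the plugin is processing the smaller 720p frames.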

You can use jtop or “sudo tegrastats” to check whether VIC is in use. Where did you see that nvof uses VIC? Did you find that in any official doc? Please refer to the nvof doc.
4. Here is some analysis:
a. The test source is different. You need to use the same source to compare.
b. Did you monitor the decoder utilization? It may affect overall performance.
c. The test methods are different.

Here is the source:

a. the test source is different. you need to use the same source to compare.
Does that mean the nvof plugin isn’t implemented with the dense optical flow algorithm?
Can you provide a performance table for the nvof plugin on the AGX Orin and Orin NX devices?

b. did you monitor the decoder utilization? maybe it will affect the whole performance.
The decoder can decode up to 22 streams at 1080p, so I don’t think the decoder is the bottleneck.

The most important thing I would like to know is whether the nvof plugin can use the OFA for its calculations.

I mean, you are testing with a 720p source, while nvidia-jetson-agx-orin-technical-brief.pdf uses 1920x1080.

VIC is used for color format conversion acceleration. OFA is used for OF computation acceleration.

Yes, the nvof plugin uses OFA for its calculations. You can use jtop to check.

Currently there is no public performance table.

Yes, you are right. The doc states that the decoder can handle 660 FPS (22 streams x 30 FPS) at 1920x1080, so it should decode more than 660 FPS at 1280x720 (theoretically around 1000 FPS).
The AGX Orin nvof performance is 120 FPS at 1280x720, and 240 FPS after increasing the VIC frequency.
Therefore, I don’t think the decoder is the bottleneck.
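
The decoder headroom argument can be written out as a rough sketch. It uses only the numbers quoted in this thread; the function name is hypothetical, and the 1080p decoder rate is deliberately used as a conservative lower bound for 720p decoding.

```python
STREAMS = 22          # 1080p streams the decoder can handle, per the tech brief
FPS_PER_STREAM = 30   # frame rate of each stream

# Aggregate decoder throughput at 1080p; 720p should be no slower than this.
decoder_fps_1080p = STREAMS * FPS_PER_STREAM


def decoder_headroom(decoder_fps: float, pipeline_fps: float) -> float:
    """Return how many times faster the decoder is than the measured pipeline."""
    return decoder_fps / pipeline_fps


# Measured nvof pipeline rates at 720p on AGX Orin, as reported in the thread.
measured = {"default VIC clock": 120, "VIC at 729.6 MHz": 240}

for label, fps in measured.items():
    ratio = decoder_headroom(decoder_fps_1080p, fps)
    print(f"{label}: pipeline {fps} FPS, decoder headroom {ratio:.2f}x")
```

Even against the 1080p decode rate, the decoder has 5.5x headroom at 120 FPS and 2.75x at 240 FPS, which is consistent with the decoder not being the limiting stage.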

Thanks for the clarification.

Let me organize the current issues.

  1. I want to first confirm whether the nvof calculations on Xavier NX and Orin NX are both using VIC hardware for computation?
    No, VIC doesn’t run the nvof algorithm.

So, which hardware is used to compute the nvof algorithm on Orin NX and Xavier NX? After all, only the AGX Orin has the OFA hardware.

  2. In the performance test of the nvof plugin:
    Xavier NX can achieve 387 FPS.
    Orin NX can only reach around 100 FPS.
    Even when increasing Orin NX VIC frequency to 729.6 MHz, it only reaches 218 FPS.
    Orin NX should be stronger in computation compared to Xavier NX, what could be the reason for this?

If the calculations aren’t done on the VIC hardware, why does increasing the VIC frequency improve performance?
I would also like to know why Xavier NX shows better performance than Orin NX in the same test case.

  3. Testing AGX Orin with the nvof plugin:
    Through jtop, I confirmed that OFA is operating, but it only achieves 120 FPS.
    After increasing VIC frequency, it can reach 240 FPS.
    Why does the performance increase after raising the VIC frequency? Isn’t nvof on AGX Orin supposed to be calculated by OFA hardware?
    Yes, AGX Orin is using the OFA hardware.

Same question here: why does raising the VIC frequency improve nvof calculation performance?

  4. In the VPI Dense Optical Flow algorithm, AGX Orin takes about 1.44 ms for 1080p low quality and grid size 4, which is approximately ~700 FPS. Why is there such a big difference compared to the nvof plugin’s 720p 240 FPS?

Here is some analysis:
a. The test source is different. You need to use the same source to compare.
b. Did you monitor the decoder utilization? It may affect overall performance.
c. The test methods are different.

Can the NVIDIA team test the nvof plugin on AGX Orin with the same test case?

If you want to compare the test data (FPS), please use the same source to rule out the decoder’s effect.

VIC and OFA are different hardware. As I said in my last comment, VIC is used for color format conversion acceleration and OFA is used for OF computation acceleration. The whole application uses color format conversion and OF computation at the same time.

Thanks for sharing! Could you share the whole gst-launch pipeline? Thanks!

OK, I tested the nvof plugin at 1920x1080, and it only reaches 34 FPS.

With the VPI Dense Optical Flow algorithm, AGX Orin takes about 1.44 ms for 1080p at low quality and grid size 4, which is roughly 700 FPS.

Why is there such a significant difference in performance?
My pipeline:

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-6.3/samples/streams/sample_1080p_h264.mp4 ! queue ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1920 height=1080 ! queue ! nvof ! nvdslogger fps-measurement-interval-sec=1 ! fakesink

I have known from the beginning that VIC and OFA are two totally different pieces of hardware.
I don’t know why you keep talking about color format conversion. Am I missing something about nvof?
I just want to know which hardware Orin NX and Xavier NX use to run the nvof algorithm, since these two devices don’t have OFA hardware.

Sure!

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-6.3/samples/streams/sample_720p.h264 ! queue ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! queue ! nvof ! nvdslogger fps-measurement-interval-sec=1 ! fakesink

My test results:

  1. AGX Orin: 105 FPS, and 240 FPS with the VIC frequency set to 729.6 MHz
  2. Orin NX: 95 FPS, and 224 FPS with the VIC frequency set to 729.6 MHz
  3. Xavier NX: 387 FPS
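
To make the comparison explicit, here is a small sketch that derives the speedup from raising the VIC clock and the remaining gap to Xavier NX. It only restates the FPS numbers listed above; the dictionary layout and function name are hypothetical.

```python
# FPS at 720p from the results above: (default VIC clock, VIC at 729.6 MHz).
orin_results = {
    "AGX Orin": (105, 240),
    "Orin NX": (95, 224),
}
XAVIER_NX_FPS = 387  # single figure reported for Xavier NX


def speedup(before_fps: float, after_fps: float) -> float:
    """Return the throughput ratio after/before."""
    return after_fps / before_fps


for device, (default_fps, boosted_fps) in orin_results.items():
    vic_gain = speedup(default_fps, boosted_fps)
    xavier_gap = speedup(boosted_fps, XAVIER_NX_FPS)
    print(
        f"{device}: {vic_gain:.2f}x from the VIC clock, "
        f"Xavier NX still {xavier_gap:.2f}x faster"
    )
```

Both Orin devices gain roughly 2.3x from the VIC clock alone, yet Xavier NX remains about 1.6x to 1.7x ahead of them in this test.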

As the table shows, there are other parameters: quality and gridSize. Please set these parameters for nvof when testing.

As I said in my last comment, the whole application does color format conversion and OF computation at the same time.

For the OF performance comparison between Xavier and Orin, please refer to the topic optical-flow-is-slow-on-jetson.

Yes, I know that. All my tests used the default parameter settings.
The nvof plugin only supports a gridSize of 4x4, and its default quality setting is fast mode (low quality).
Therefore, I compared these test results on the same baseline.

I will divide the tests into two parts: one part will test the impact of VIC on the pipeline, and the other part will test the impact of VIC on nvof.

Test Device: AGX Orin 64G
Pipeline without nvof:

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-6.3/samples/streams/sample_1080p_h264.mp4 ! queue ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1920 height=1080 ! queue ! nvdslogger fps-measurement-interval-sec=1 ! fakesink

VIC freq 115MHz: 384FPS
VIC freq 729.6MHz: 388FPS

Performance increases only slightly when the VIC frequency is raised.

Pipeline with nvof:

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-6.3/samples/streams/sample_1080p_h264.mp4 ! queue ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1920 height=1080 ! queue ! nvof ! nvdslogger fps-measurement-interval-sec=1 ! fakesink

VIC freq 115MHz: 35FPS
VIC freq 729.6MHz: 65FPS

The tests above make it clear that the VIC frequency matters most when the nvof plugin is in the pipeline.
However, nvof’s input and output are both in the same NV12 color format, so no color format conversion should be needed. Why is VIC involved in the calculation?

My AGX Orin result (65 FPS) is far below the results in the table.
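
As a rough illustration of that gap, the measured FPS values can be turned into per-frame periods and compared with the 1.44 ms VPI figure. This is only a sketch: the names are hypothetical, and it assumes that in a pipelined GStreamer graph the steady-state frame period (1000 / FPS) approximates the time taken by the slowest stage.

```python
VPI_LATENCY_MS = 1.44  # VPI figure quoted earlier: 1080p, low quality, grid size 4

# Measured 1080p FPS on AGX Orin 64G, keyed by (pipeline, VIC clock).
measurements = {
    ("without nvof", "115 MHz"): 384,
    ("without nvof", "729.6 MHz"): 388,
    ("with nvof", "115 MHz"): 35,
    ("with nvof", "729.6 MHz"): 65,
}


def frame_period_ms(fps: float) -> float:
    """Return the steady-state time per frame in milliseconds."""
    return 1000.0 / fps


for (pipeline, vic_clock), fps in measurements.items():
    period = frame_period_ms(fps)
    ratio = period / VPI_LATENCY_MS
    print(f"{pipeline}, VIC {vic_clock}: {period:.2f} ms/frame ({ratio:.1f}x the VPI figure)")
```

Under that assumption, the pipeline with nvof spends about 28.6 ms per frame at 115 MHz and about 15.4 ms at 729.6 MHz, which is still more than 10x the VPI figure, while the pipeline without nvof stays near 2.6 ms per frame at either clock.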

Hi,
For optimal throughput of the VIC engine, please run the script to fix the clock at maximum:

VPI - Vision Programming Interface: Performance Benchmark

Hi,
All the tests mentioned above were already run that way. (I used these commands to raise the VIC frequency.)

Hi,
VIC is at its maximum frequency of 729.6 MHz, so this is the optimal throughput on AGX Orin. There is no room for further enhancement.

Have you even looked at my questions? I didn’t ask how to increase the frequency beyond 729.6 MHz.

Hi,
Since the nvof plugin is not public, I may not be able to share further information. My apologies for this.

Thanks for your reply.