Timing performance of vpiSubmitOpticalFlowPyrLK() with different backend flags

Hi,

I use the attached function to find the corresponding points between two frames on Orin.
tmp.txt (5.1 KB)

While timing calls to NvVpiPyrLKOpticalFlow::sparse(), I found only a small difference between VPI_BACKEND_CPU and VPI_BACKEND_CUDA in both of the scenarios below. Is this normal? I also see a difference of a few milliseconds between setting flags = 0 and flags = vpiBackend_, which is hard to understand as well. The third thing that confuses me is that jtop still shows some GPU activity when the backend flag is VPI_BACKEND_CPU.

How to explain these behaviors?

Scenario 1 (line 264 of tmp.txt: const uint64_t flags = vpiBackend_;)
backendFlag=0 [VPI_BACKEND_CPU], w=1920, h=1080
frameCnt=200, perFrameTimeUsage(mu=24020.3, std=4153.9) (microseconds)
backendFlag=1 [VPI_BACKEND_CUDA], w=1920, h=1080
frameCnt=200, perFrameTimeUsage(mu=21908.3, std=4589.6) (microseconds)

Scenario 2 (line 264 of tmp.txt: const uint64_t flags = 0;)
backendFlag=0 [VPI_BACKEND_CPU], w=1920, h=1080
frameCnt=200, perFrameTimeUsage(mu=30842.4, std=3925.4) (microseconds)
backendFlag=1 [VPI_BACKEND_CUDA], w=1920, h=1080
frameCnt=200, perFrameTimeUsage(mu=24420.1, std=4000.2) (microseconds)

Hi,

When you measured the performance, did you maximize the device clocks?
https://docs.nvidia.com/vpi/algo_performance.html#maxout_clocks

We list the performance of the LK tracker here:
https://docs.nvidia.com/vpi/algo_optflow_lk.html
For a 1920x1080 image, the CPU backend takes 0.412±0.002 ms and the CUDA backend takes 0.0586±0.0002 ms.

Tracking time depends on the number of detected objects, so it can vary between scenarios.

Thanks.

Thank you. With the clocks maximized, I do see a 2x improvement. However, it's not clear where the start and end points of the elapsed-time measurement are for the performance table at the link above.

Hi,

The benchmark was measured in the same way as our 05_benchmark sample: it runs a warm-up first and then times the algorithm over a batch of iterations.

https://docs.nvidia.com/vpi/sample_benchmark.html

For more details, see the “Benchmarking Method” section:
https://docs.nvidia.com/vpi/algo_performance.html#benchmark

Thanks.
