OpenCV SURF with CUDA is much slower than expected on AGX Orin

On an AGX Orin 32GB, we built OpenCV 4.10 using the following script: https://github.com/AastaNV/JEP/blob/master/script/install_opencv4.6.0_Jetson.sh

The script is recommended in the post "How do I install openCV with CUDA support?" (Jetson & Embedded Systems / Jetson Orin Nano, NVIDIA Developer Forums).

We noticed a similar discussion in the post "Jetson Nano OpenCV OpenCL/CUDA SURF" (Jetson & Embedded Systems / Jetson Nano, NVIDIA Developer Forums). It says the experiment was run on a Jetson Xavier: for an 800x800 image, it takes 0.003724 s (about 3.7 ms) to find 513 key-points.

We use the same code as in "SURF on CUDA: every execution produces different + weird results for some images?" (OpenCV Q&A Forum), but for an 800x800 image it takes ~100 ms to find 513 key-points.
Compared to the Jetson Xavier, the CUDA SURF detector runs far too slowly on the Jetson AGX Orin. Please help me find the problem. Thanks!

During the experiment, we shut down all other programs that use the GPU.

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide of deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don't help and you want to report an issue to us, please share the model, the command/steps, and the customized app (if any) with us so that we can reproduce it locally.

Thanks!

I ran the following commands to set the AGX Orin to maximum performance mode:
$ sudo nvpmodel -m 0
$ sudo jetson_clocks

With these settings, the CUDA SURF detector now takes ~20 ms to find 513 key-points in an 800x800 image.
The AGX Orin 32GB is still about 5 times slower than the Jetson Xavier.

Hi,

Could you check the GPU utilization?

$ sudo tegrastats

Thanks.

Above are the command output and other information for you.
I run the CUDA SURF key-point detection in a for loop.
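
For anyone comparing numbers across boards, here is a minimal sketch of how such a loop might be timed so that one-time start-up cost (CUDA context creation, first-call allocations) is kept out of the per-frame figure. This is not the code from the linked posts. The helper name `benchmark` and its parameters are hypothetical, and the commented-out SURF lines assume an OpenCV build whose Python bindings expose `cv2.cuda.SURF_CUDA_create`.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Time fn() per call in milliseconds, discarding warm-up calls.

    `fn` is assumed to block until its work is finished; otherwise the
    measured time only covers launching the work, not running it.
    """
    # Warm-up calls absorb one-time costs so they do not skew the samples.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "iters": len(samples),
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere. On the Jetson it could
    # be replaced by something like the following (assumed API, untested here):
    #   surf = cv2.cuda.SURF_CUDA_create(400)
    #   gpu_img = cv2.cuda_GpuMat(); gpu_img.upload(gray_image)
    #   work = lambda: surf.detect(gpu_img, None)
    work = lambda: sum(range(10_000))
    print(benchmark(work, warmup=5, iters=50))
```

Reporting the median together with the minimum and maximum makes it easier to tell a steady ~20 ms per frame apart from a few slow outliers.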

Hi,

Orin’s GPU utilization is pretty high so it should already be well-optimized.

A possible reason is that the algorithm might have changed in the newer OpenCV version.
Have you also checked with the OpenCV team?

Thanks.
