I’m running a very simple ORB feature detection (detect & compute) on 2 images, followed by matching.
My code is very similar to what’s described in:
https://devtalk.nvidia.com/default/topic/1035448/cuda-programming-and-performance/surf-with-cuda-is-not-faster-by-a-noticeable-amount/post/5260064/#5260064
but using ORB instead of SURF.
My system has OpenCV 4.1.2 built with CUDA support, CUDA_ARCH_BIN 7.2 (Release build).
I’m using 4K images for my measurements.
My Jetson Xavier runs in the 15 W power mode (4 cores @ 1190 MHz, GPU @ 318-675 MHz). Detecting the features on a single image takes ~310 ms.
When I run the same test on a PC with an NVIDIA Quadro M2000M, on the same images, it takes ~70 ms per image.
The Quadro M2000M has 768 CUDA cores while the Xavier has only 512. Also, the Quadro M2000M is clocked at 1100 MHz while the Xavier GPU is clocked at 675 MHz max. I’m not sure these differences can account for the entire performance gap between the Quadro M2000M and the Xavier.
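To put numbers on that doubt, here is a rough back-of-the-envelope check using only the figures quoted above. It assumes, very crudely, that ORB throughput scales with CUDA cores x GPU clock; that ignores memory bandwidth, architecture differences, and any CPU-side work, so treat it as a sanity check and not a performance model. The variable and function names are just for illustration.

```python
# Figures quoted in this post (not independently verified).
quadro_cores, quadro_mhz, quadro_ms = 768, 1100, 70
xavier_cores, xavier_mhz, xavier_ms = 512, 675, 310


def naive_throughput(cores, mhz):
    """Crude proxy for GPU throughput: cores x clock (an assumption)."""
    return cores * mhz


# Speedup the Quadro "should" have if cores x clock were the whole story.
expected_ratio = (naive_throughput(quadro_cores, quadro_mhz)
                  / naive_throughput(xavier_cores, xavier_mhz))

# Speedup actually measured (Xavier time / Quadro time).
measured_ratio = xavier_ms / quadro_ms

# Part of the gap that cores x clock does not explain.
unexplained_factor = measured_ratio / expected_ratio

print(f"expected from cores x clock: {expected_ratio:.2f}x")
print(f"measured:                    {measured_ratio:.2f}x")
print(f"unexplained factor:          {unexplained_factor:.2f}x")
```

Under that assumption the Quadro should be about 2.4x faster, but the measured gap is about 4.4x, which leaves roughly a 1.8x difference that the core count and clock alone do not explain.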
Is it possible to get better performance for ORB on Xavier?