I am currently building a feature matching module, and the hardware (Jetson AGX) should be used as efficiently as possible. The pipeline consists of three stages: feature detection, feature description and matching.
The ORB implementation in OpenCV seems to be widely used, but its hardware utilization is quite low, i.e. it runs inefficiently on this platform.
The Harris corner detector from VPI looks like a good way to offload some compute from the GPU to the PVA. Is there any keypoint descriptor algorithm (like the BRIEF descriptor used in ORB) that is optimized for Jetson devices? The only official descriptor implementation from NVIDIA I have come across is the HOG texture descriptor in NPP, which does not fit our application: we need keypoint descriptors for sparse matching, not texture descriptors.
I would appreciate any hints for where I could find some optimized implementations :)
I am not looking for deep learning inference per se. There are some models that calculate keypoint descriptors, but I want to keep the DLAs free for other models. I am currently using the VPI Harris corner detector to compute keypoint locations on the PVA without using the GPU, CPU or DLA. I need descriptors for these keypoints in order to match them between images. I was wondering whether NVIDIA provides any keypoint descriptor routines, since I could not find any optimized ones. NVIDIA has a plethora of libraries, so I thought I might simply be missing something.
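For context on what the detection step produces, here is a plain-Python sketch of the Harris corner measure: per pixel, R = det(M) - k * trace(M)^2, where M is the structure tensor built from image gradients, followed by thresholding and 3x3 non-maximum suppression. This is NOT the VPI API; the function names, k = 0.04, the 3x3 box window and the central-difference gradients are assumptions for illustration, and VPI's implementation may differ in these details.

```python
# Plain-Python illustration of the Harris corner measure (NOT the VPI API;
# function names are hypothetical). Assumes a grayscale image stored as a
# list of rows of floats.

def harris_response(img, k=0.04, win=1):
    """Per-pixel R = det(M) - k * trace(M)^2, where M is the structure tensor
    (sums of Ix*Ix, Iy*Iy, Ix*Iy) over a (2*win+1) x (2*win+1) box window.
    Pixels too close to the border are left at 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients; the outermost pixels keep a gradient of 0.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(win + 1, h - win - 1):
        for x in range(win + 1, w - win - 1):
            sxx = syy = sxy = 0.0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r


def harris_keypoints(img, threshold, k=0.04):
    """Keypoint locations (x, y): response above threshold and a 3x3 local maximum."""
    r = harris_response(img, k)
    h, w = len(r), len(r[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = r[y][x]
            neighbours = (r[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            if v > threshold and all(v >= n for n in neighbours):
                points.append((x, y))
    return points


# A bright 6x6 square on a dark 16x16 background has four corners.
img = [[1.0 if 5 <= x <= 10 and 5 <= y <= 10 else 0.0 for x in range(16)] for y in range(16)]
print(harris_keypoints(img, threshold=0.5))  # prints [(5, 5), (10, 5), (5, 10), (10, 10)]
```

Straight edges give a negative response and flat regions give zero, so only the corners survive the threshold. The output is just a list of locations; nothing in it says how similar two corners look, which is exactly the gap the descriptor step has to fill.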
Yes, that is exactly what I am already using. The Harris corner algorithm yields keypoint locations; what I am looking for is an implementation of keypoint descriptors.
Keypoint location = x/y coordinates of an interesting point
Keypoint descriptor = set of values per point that allows comparing how similar two points are
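The distinction between location and descriptor can be sketched as two separate fields. This assumes a binary descriptor packed into an integer (as with BRIEF/ORB-style descriptors) compared by Hamming distance; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keypoint:
    """Keypoint location: x/y coordinates of an interesting point."""
    x: int
    y: int


@dataclass(frozen=True)
class Feature:
    """A keypoint plus its descriptor. The descriptor is assumed to be a binary
    string packed into an int (e.g. 256 bits for a BRIEF/ORB-style descriptor)."""
    kp: Keypoint
    descriptor: int


def hamming_distance(a: Feature, b: Feature) -> int:
    """Number of differing descriptor bits: 0 = identical, larger = less similar."""
    return bin(a.descriptor ^ b.descriptor).count("1")


def most_similar(query: Feature, candidates: list) -> Feature:
    """Candidate whose descriptor is closest to the query's; location is ignored."""
    return min(candidates, key=lambda c: hamming_distance(query, c))


q = Feature(Keypoint(10, 20), 0b10110010)
c1 = Feature(Keypoint(3, 4), 0b10110011)    # different location, similar appearance
c2 = Feature(Keypoint(10, 20), 0b01001101)  # same location, different appearance
print(hamming_distance(q, c1), hamming_distance(q, c2))  # prints 1 8
```

Matching only looks at the descriptor: c1 wins even though c2 sits at the same coordinates, which is why the locations from the detector alone are not enough to match points between images.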
So a classical CV pipeline (e.g. image stitching) is:
1. Get an image pair
2. Find keypoints in each image (e.g. via the VPI API)
3. Calculate a descriptor for each keypoint (this is what I am looking for)
4. Match keypoint descriptors between the images
5. Calculate a homography from the matches (RANSAC etc.)
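Steps 3 and 4 can be sketched on the CPU as follows, assuming a BRIEF-style binary descriptor (bit i is set if pixel p_i is darker than pixel q_i, for a fixed random set of point pairs in a patch around the keypoint) and brute-force Hamming matching with a mutual-nearest-neighbour check. The patch size of 15, 256 bits, the seed and max_dist=64 are illustrative assumptions. Unlike ORB's steered BRIEF, there is no pre-smoothing or orientation compensation, and none of these functions is an NVIDIA API; the point is to show the per-keypoint work that an optimized GPU or PVA implementation would have to cover.

```python
import random

PATCH = 15   # assumed patch side length around each keypoint
N_BITS = 256  # descriptor length in bits (ORB also uses 256)


def make_pattern(n_bits=N_BITS, patch=PATCH, seed=42):
    """Fixed random point-pair pattern (x1, y1, x2, y2); must be identical for both images."""
    rng = random.Random(seed)
    half = patch // 2
    return [tuple(rng.randint(-half, half) for _ in range(4)) for _ in range(n_bits)]


def describe(img, keypoints, pattern, patch=PATCH):
    """BRIEF-style descriptor per keypoint: bit i = 1 if I(p_i) < I(q_i).
    Keypoints whose patch would leave the image are dropped."""
    half = patch // 2
    h, w = len(img), len(img[0])
    kept, descs = [], []
    for (x, y) in keypoints:
        if not (half <= x < w - half and half <= y < h - half):
            continue
        bits = 0
        for i, (x1, y1, x2, y2) in enumerate(pattern):
            if img[y + y1][x + x1] < img[y + y2][x + x2]:
                bits |= 1 << i
        kept.append((x, y))
        descs.append(bits)
    return kept, descs


def hamming(a, b):
    return bin(a ^ b).count("1")


def match(descs_a, descs_b, max_dist=64):
    """Brute-force Hamming matching; keep (i, j) only if i and j are each
    other's nearest neighbour and their distance is at most max_dist."""
    if not descs_a or not descs_b:
        return []
    best_ab = [min(range(len(descs_b)), key=lambda j: hamming(d, descs_b[j])) for d in descs_a]
    best_ba = [min(range(len(descs_a)), key=lambda i: hamming(d, descs_a[i])) for d in descs_b]
    return [(i, j) for i, j in enumerate(best_ab)
            if best_ba[j] == i and hamming(descs_a[i], descs_b[j]) <= max_dist]


# Image B is image A shifted by (3, 2); keypoints in B are listed in reverse order.
rng = random.Random(0)
H = W = 40
A = [[rng.random() for _ in range(W)] for _ in range(H)]
B = [[A[(y - 2) % H][(x - 3) % W] for x in range(W)] for y in range(H)]
kp_a = [(10, 10), (20, 12), (15, 25), (28, 28)]
kp_b = [(x + 3, y + 2) for (x, y) in reversed(kp_a)]
pattern = make_pattern()
_, da = describe(A, kp_a, pattern)
_, db = describe(B, kp_b, pattern)
print(match(da, db))  # prints [(0, 3), (1, 2), (2, 1), (3, 0)]
```

Each descriptor depends only on a small patch around its own keypoint, so the work is independent per keypoint; that independence is what would make the step a good fit for parallel hardware, although the sketch itself is single-threaded.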
Steps 2 and 3 are the bottlenecks of such pipelines. VPI provides a fast implementation of step 2 that even runs on an accelerator, which is great to have :) But step 3 still consumes a large amount of time, so I was wondering if anyone knows of an implementation of a keypoint descriptor, ideally from NVIDIA.
Of course I could write my own or use some open-source code. But that would probably be about as efficient as writing my own FFT or BLAS routines instead of using the vendor-optimized ones (i.e. cuFFT/cuBLAS).
I know that this is a very open question and not directly related to the Jetson line. On the other hand, I am probably not the only one struggling to find resources on this, so I posted here :)