Image processing - feature matching

Hi all,

I have a pair of images and a set of features detected in each, along with their descriptors.
I want to match the features by brute-force comparison of their descriptors.
Is there an obvious way to parallelize this problem in CUDA?

My initial thought was to split each image into smaller regions, padding every region by the search radius. Then, for each feature in a region of the first image, I would search all the features in the other image that fall inside the same region or its padding. However, this means loading a lot of feature descriptors into shared memory, which seems like it could kill any potential advantage?
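
To make the idea concrete, here is a rough sketch of roughly what I have in mind. It is simplified: instead of binning features into regions first, each block stages all of the other image's descriptors through shared memory one tile at a time and applies the search radius as a per-pair test. The descriptor length (128 floats), the tile size, the names matchBruteForce and matchOnGpu, and the squared Euclidean distance are all assumptions for illustration, and error checking is omitted.

```cuda
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

constexpr int DIM  = 128;  // assumed descriptor length (SIFT-like), illustrative only
constexpr int TILE = 32;   // image-B features staged per tile; also the block size

// One thread per feature of image A. Descriptors of image B pass through
// shared memory TILE at a time, so each is read from global memory once per block.
__global__ void matchBruteForce(const float* descA, const float2* posA, int nA,
                                const float* descB, const float2* posB, int nB,
                                float radius, int* bestIdx)
{
    __shared__ float  sDesc[TILE * DIM];
    __shared__ float2 sPos[TILE];

    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const float2 pa = (i < nA) ? posA[i] : make_float2(0.f, 0.f);
    const float r2 = radius * radius;
    float best = FLT_MAX;
    int bestJ = -1;  // stays -1 if nothing lies within the search radius

    for (int base = 0; base < nB; base += TILE) {
        const int count = min(TILE, nB - base);
        // Cooperative, coalesced load of one tile of image-B descriptors and positions.
        for (int t = threadIdx.x; t < count * DIM; t += blockDim.x)
            sDesc[t] = descB[(size_t)base * DIM + t];
        for (int t = threadIdx.x; t < count; t += blockDim.x)
            sPos[t] = posB[base + t];
        __syncthreads();

        if (i < nA) {
            for (int j = 0; j < count; ++j) {
                const float dx = sPos[j].x - pa.x, dy = sPos[j].y - pa.y;
                if (dx * dx + dy * dy > r2) continue;  // outside the search radius
                float d = 0.f;
                for (int k = 0; k < DIM; ++k) {
                    // descA is read straight from global memory here (strided, not optimized).
                    const float diff = descA[(size_t)i * DIM + k] - sDesc[j * DIM + k];
                    d += diff * diff;
                }
                if (d < best) { best = d; bestJ = base + j; }
            }
        }
        __syncthreads();  // everyone must finish with the tile before it is overwritten
    }
    if (i < nA) bestIdx[i] = bestJ;
}

// Hypothetical host wrapper: copies data over, launches the kernel, returns the
// index of the best match in B for each feature in A (or -1 if none in range).
std::vector<int> matchOnGpu(const std::vector<float>& descA, const std::vector<float2>& posA,
                            const std::vector<float>& descB, const std::vector<float2>& posB,
                            float radius)
{
    const int nA = (int)posA.size(), nB = (int)posB.size();
    std::vector<int> result(nA, -1);
    if (nA == 0 || nB == 0) return result;

    float *dA, *dB; float2 *pA, *pB; int* dIdx;
    cudaMalloc(&dA, descA.size() * sizeof(float));
    cudaMalloc(&dB, descB.size() * sizeof(float));
    cudaMalloc(&pA, nA * sizeof(float2));
    cudaMalloc(&pB, nB * sizeof(float2));
    cudaMalloc(&dIdx, nA * sizeof(int));
    cudaMemcpy(dA, descA.data(), descA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, descB.data(), descB.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(pA, posA.data(), nA * sizeof(float2), cudaMemcpyHostToDevice);
    cudaMemcpy(pB, posB.data(), nB * sizeof(float2), cudaMemcpyHostToDevice);

    matchBruteForce<<<(nA + TILE - 1) / TILE, TILE>>>(dA, pA, nA, dB, pB, nB, radius, dIdx);
    cudaMemcpy(result.data(), dIdx, nA * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(pA); cudaFree(pB); cudaFree(dIdx);
    return result;
}
```

With this layout each descriptor of the second image is read from global memory once per block rather than once per thread, which is the whole point of the staging. The tile costs about 16 KB of shared memory per block, and that footprint is exactly what I am worried about once regions and padding are added on top.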

Any thoughts would be greatly appreciated.