Good afternoon. I am trying to implement a SIFT (https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html) detector in CUDA. I am aware that there is an open-source implementation on GitHub; I am doing this for learning purposes.
My code runs, but I have doubts about some of the implementation choices I have made, and I would be grateful if somebody more experienced with CUDA could offer some practical advice (understanding how SIFT works is probably required to answer these questions). Many thanks in advance :)
I am not using textures in CUDA. I have read about them, but would they really be beneficial for this application? I guess I could store the input image as a read-only texture. But the other arrays (i.e. the scale-space pyramid of SIFT) are much bigger than the input image and are not read-only while they are being built. So would making only the input image a texture help performance? Should I "transform" my arrays into textures once they are filled and will stay read-only for the rest of the program?
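For context, here is a minimal sketch of what binding the input image to a texture object might look like, assuming the image has already been copied into a `cudaArray` of floats (all names here are placeholders, not from my actual code). One practical draw of textures for the input image is the free bilinear filtering and boundary clamping:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: wrap an existing cudaArray holding the input image
// in a texture object with clamped addressing and bilinear filtering.
cudaTextureObject_t make_input_texture(cudaArray_t d_image)
{
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = d_image;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0]   = cudaAddressModeClamp;  // clamp at image borders
    texDesc.addressMode[1]   = cudaAddressModeClamp;
    texDesc.filterMode       = cudaFilterModeLinear;  // hardware bilinear interpolation
    texDesc.readMode         = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;                     // address by pixel, not [0,1]

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// Inside a kernel, a read would then look like:
//   float v = tex2D<float>(tex, x + 0.5f, y + 0.5f);
```

For the pyramid levels that become read-only only after they are built, a lighter-weight alternative to rebinding them as textures is passing them to later kernels as `const float* __restrict__`, which lets the compiler route the loads through the read-only data cache without any rebinding step.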
I am following an article for thread allocation: https://www.researchgate.net/publication/269302930_Parallelization_and_Optimization_of_SIFT_on_GPU_Using_CUDA. Does the allocation look reasonable? (Looking at the figures labelled "threads allocating" should be enough to answer this; there is no need to read the whole article.)
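Concretely, the allocation I have in mind is one thread per output pixel with 16x16 thread blocks (the 16x16 shape is my assumption, not necessarily the article's exact figures), with the grid rounded up so partial edge tiles are still covered:

```cpp
#include <cassert>

// Hypothetical helper: one thread per pixel, 16x16 threads per block.
struct LaunchConfig { unsigned gridX, gridY, blockX, blockY; };

inline LaunchConfig launch_for(unsigned width, unsigned height)
{
    const unsigned bx = 16, by = 16;
    // Round up so images whose sides are not multiples of 16 are fully
    // covered; the kernel then bounds-checks x < width && y < height.
    return { (width + bx - 1) / bx, (height + by - 1) / by, bx, by };
}
```

The kernel itself would then start with the usual `x = blockIdx.x * blockDim.x + threadIdx.x` index math and an early-out for out-of-bounds threads.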
That article does not say anything about scale-space octaves, so it just allocates threads for a single level of a single octave at a time. What would be the best way of adding octaves?
a) Every level is in a separate thread?
b) The whole octave is processed in one thread & kernel is called sequentially for each octave? (This is how I have it now.)
c) All the scale-space is processed in the same thread?
d) Some other option I cannot come up with?
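To make option (b) concrete, here is a sketch of how I currently structure it, with a loop over levels inside the kernel and one launch per octave (the function names, parameters, and the blur step are placeholders for illustration, not my exact code):

```cuda
#include <cuda_runtime.h>

// One thread per pixel of the octave; each thread writes its pixel
// for every level of the octave.
__global__ void build_octave(const float* __restrict__ base,
                             float* levels, int w, int h, int numLevels)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    for (int s = 0; s < numLevels; ++s) {
        // ... blur `base` with sigma(s) and write the result to
        // levels[s * w * h + y * w + x]
    }
}

// Host side: one kernel launch per octave, sized to that octave's resolution.
void build_pyramid(float** d_base, float** d_levels,
                   int baseW, int baseH, int numOctaves, int numLevels)
{
    dim3 block(16, 16);
    for (int o = 0; o < numOctaves; ++o) {
        int w = baseW >> o, h = baseH >> o;  // each octave halves the resolution
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        build_octave<<<grid, block>>>(d_base[o], d_levels[o], w, h, numLevels);
        // Each octave's base image is a downsampled level of the previous
        // octave, so the launches are serialized by that data dependency anyway.
    }
}
```

My reasoning for (b) is the data dependency noted in the comment: since octave o+1 starts from a downsampled level of octave o, launching octaves sequentially does not seem to lose parallelism, but I am unsure whether the per-thread loop over levels is the right granularity.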
Thank you for reading.