I am looking to improve the performance of my Circle Hough Transform (CHT) algorithm, so I have moved it to CUDA. I’m currently using atomic operations in global memory to carry out the voting, which I know can be slow. However, I am unsure how to store the votes, and the coordinates those votes belong to, in shared memory so that I can then copy the results back to global memory.
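To make the question concrete, here is a rough, untested sketch of one possible layout, not code from my actual project. It assumes a single fixed radius `r`, a binary edge map, and an accumulator with the same dimensions as the image. The kernel name `chtVoteTile` and the constants `TILE` and `N_ANGLES` are all made up for illustration. Each block owns one tile of candidate centres, so a vote's coordinates are implied by its index in the shared array plus the block offset, and the tile can be written back without global atomics.

```cuda
#include <cuda_runtime.h>

#define TILE     32    // hypothetical: each block owns TILE x TILE candidate centres
#define N_ANGLES 180   // hypothetical: angular samples per edge pixel

// edges: width*height binary edge map (non-zero = edge pixel)
// acc:   width*height vote accumulator for one radius r
__global__ void chtVoteTile(const unsigned char* edges, unsigned int* acc,
                            int width, int height, int r)
{
    __shared__ unsigned int tile[TILE * TILE];

    const int tid      = threadIdx.y * blockDim.x + threadIdx.x;
    const int nThreads = blockDim.x * blockDim.y;
    const int x0 = blockIdx.x * TILE;   // top-left centre owned by this block
    const int y0 = blockIdx.y * TILE;

    // Clear the shared-memory tile.
    for (int i = tid; i < TILE * TILE; i += nThreads) tile[i] = 0;
    __syncthreads();

    // Only edge pixels within r of the tile can vote into it.
    const int span = TILE + 2 * r;
    for (int i = tid; i < span * span; i += nThreads) {
        const int ex = x0 - r + i % span;
        const int ey = y0 - r + i / span;
        if (ex < 0 || ey < 0 || ex >= width || ey >= height) continue;
        if (!edges[ey * width + ex]) continue;

        for (int k = 0; k < N_ANGLES; ++k) {
            float s, c;
            sincosf(2.0f * 3.14159265f * k / N_ANGLES, &s, &c);
            // Candidate centre, expressed relative to the tile origin.
            const int cx = ex - __float2int_rn(r * c) - x0;
            const int cy = ey - __float2int_rn(r * s) - y0;
            if (cx >= 0 && cx < TILE && cy >= 0 && cy < TILE)
                atomicAdd(&tile[cy * TILE + cx], 1u);   // shared-memory atomic
        }
    }
    __syncthreads();

    // Each block owns its tile exclusively, so a plain store is enough here.
    for (int i = tid; i < TILE * TILE; i += nThreads) {
        const int gx = x0 + i % TILE;
        const int gy = y0 + i / TILE;
        if (gx < width && gy < height) acc[gy * width + gx] = tile[i];
    }
}
```

This would be launched with a grid of `((width + TILE - 1) / TILE, (height + TILE - 1) / TILE)` blocks and, say, 16x16 threads per block. The obvious cost is that every block rescans a `(TILE + 2r)` square of the edge map, so I do not know whether this beats global atomics for large radii.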
Does anyone have suggestions or advice on how best to do this to get maximum performance out of the CHT?
Many thanks in advance for your time, everyone.