Hello,
I have a graph matching algorithm (the code is already written) which currently runs sequentially. It has three main stages: parse, score, and search. Each of these stages needs to be parallelized. The problems I'm facing are:
- There is extensive use of STL containers such as vectors, maps, and lists, and CUDA device code does not support them natively. I came across a template library called Thrust. I have not started yet, so can somebody please tell me whether this is the path I should take?
- This one is the main part. All graph objects are referred to through Boost shared pointers. The functions never operate on the actual data structures directly; they operate through the smart pointers. How do I allocate memory on the device with this kind of code design? Do I have to move the actual data structures to the device, or do I need to manipulate the pointers?
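To make the second question concrete, here is a minimal sketch of the kind of conversion I imagine might be needed. It is not my actual code: `Node`, `FlatGraph`, and `flatten` are hypothetical names, and it uses `std::shared_ptr` in place of `boost::shared_ptr` so that it compiles on its own. The idea is to replace the pointers with integer indices on the host, so that only plain arrays would have to be copied to the device.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical host-side node type, standing in for the real graph classes.
struct Node {
    int label = 0;
    std::vector<std::shared_ptr<Node>> neighbours;
};

// Pointer-free representation in compressed sparse row (CSR) layout.
// The neighbours of node i are col_indices[row_offsets[i]] up to, but not
// including, col_indices[row_offsets[i + 1]].
struct FlatGraph {
    std::vector<int> labels;
    std::vector<int> row_offsets;
    std::vector<int> col_indices;
};

// Convert a smart-pointer graph into index-based arrays.
// Assumption: every neighbour also appears in `nodes`; otherwise
// index_of.at() throws std::out_of_range.
FlatGraph flatten(const std::vector<std::shared_ptr<Node>>& nodes) {
    // Map each node's address to its position in `nodes`.
    std::unordered_map<const Node*, int> index_of;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        index_of[nodes[i].get()] = static_cast<int>(i);
    }

    FlatGraph g;
    g.row_offsets.push_back(0);
    for (const auto& n : nodes) {
        g.labels.push_back(n->label);
        for (const auto& nb : n->neighbours) {
            g.col_indices.push_back(index_of.at(nb.get()));
        }
        // Close this node's neighbour range.
        g.row_offsets.push_back(static_cast<int>(g.col_indices.size()));
    }
    return g;
}
```

If this is the right direction, I assume the three `std::vector<int>` members could then each be copied into a `thrust::device_vector<int>`, but I have not tried that yet.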
Any advice on this would be really helpful.
Thank you.