I'm new to GPGPU programming and am considering using it to simulate a model that requires running a BFS from each node of the graph at every step of the simulation, to determine the shortest path from each node to a set of destinations (other nodes).
The current model is written in C++ using std::vector: the entire adjacency list is a vector<vector>, and, more importantly, there is a vector<vector> that holds information about the path participation of each node, meaning:
if node i is on the path from node j to one of its destinations, j is appended to the i-th vector in the vector of vectors. The reason for vectors is that the number of participating node-paths isn't constant.
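For concreteness, here is a minimal sketch of that participation structure (the helper names and the example node ids are mine, just for illustration):

```cpp
#include <vector>

// Illustrative only: participation[i] collects every source node j whose
// shortest path to one of its destinations passes through node i.
std::vector<std::vector<int>> make_participation(int n) {
    return std::vector<std::vector<int>>(n);
}

void record_on_path(std::vector<std::vector<int>>& participation,
                    int via, int source) {
    participation[via].push_back(source); // rows grow independently
}
```

Because each inner vector grows independently, no fixed per-node capacity has to be chosen up front, which is exactly what makes a plain 2D array awkward here.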
I was thinking of speeding up the simulation (which runs a BFS from every node on each iteration) by implementing it in CUDA, using a kernel where each BFS runs in its own thread and updates the participation vector accordingly. (I'm not trying to parallelize a single BFS; I want to run many BFSs, each on a different GPU thread.)
As far as I know, __global__ functions cannot handle 2D vectors, and I'd like to avoid rewriting the entire model with 2D arrays, which would require fixed sizes and would also force the participation container into a size×size matrix that would have to be copied from device to host on every step.
So I wanted to ask: is there a time-efficient way to make such a conversion (from a vector-of-vectors graph to something a device function can work with) on each step? Or is there no better solution than converting the entire model to use arrays?
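To show the kind of conversion I mean: here is a sketch (struct and function names are made up) of flattening a vector<vector> into two contiguous arrays, CSR-style, so the data could be copied to the device with a couple of cudaMemcpy calls and indexed inside a kernel:

```cpp
#include <vector>

// Hypothetical CSR-style flattening: offsets has n+1 entries, and the
// neighbors of node i occupy values[offsets[i] .. offsets[i+1]).
struct FlatGraph {
    std::vector<int> offsets;
    std::vector<int> values;
};

FlatGraph flatten(const std::vector<std::vector<int>>& adj) {
    FlatGraph g;
    g.offsets.reserve(adj.size() + 1);
    g.offsets.push_back(0);
    for (const auto& row : adj) {
        g.values.insert(g.values.end(), row.begin(), row.end());
        g.offsets.push_back(static_cast<int>(g.values.size()));
    }
    return g;
}
```

Inside a __global__ function, a thread handling node i would then iterate j from offsets[i] to offsets[i+1] and read values[j], with no fixed row width and no size×size matrix. My worry is whether rebuilding and copying these arrays on every step would eat the speedup.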
Thank you very much.