Hi,

I'm new to GPGPU programming and am thinking of using it to simulate a model that requires running a BFS from each node of the graph during each step of the simulation, in order to determine the shortest path from each node to a set of destinations (other nodes).

The current model is written in C++ using std::vector — the entire adjacency list is a vector<vector>, and more importantly, there's a vector<vector> that holds information about the path participation of each node, meaning:

if node **i** is on the path from node **j** to one of its destinations, **j** is added to the **i**-th vector in the vector of vectors. The reason is that the number of node-paths each node participates in isn't constant.

I was thinking of speeding up the simulation (which does a BFS from each node on each iteration) by implementing it with CUDA, using a kernel where each BFS runs in a different thread and updates the participation vector accordingly (I'm **not** trying to parallelize a single BFS — I'm trying to run many BFSs, each on a different GPU thread).
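Roughly, the kind of kernel I have in mind looks like this (a hypothetical sketch only — it assumes the graph has already been flattened into CSR-style `offsets`/`edges` arrays and writes distances into a preallocated `dist` buffer; a real frontier-based BFS would need per-thread queues):

```cuda
// One BFS per thread: thread `src` computes distances from node `src`.
// Uses a simple level-synchronous sweep instead of an explicit queue.
__global__ void bfs_all_sources(const int *offsets, const int *edges,
                                int n, int *dist /* n*n, one row per source */)
{
    int src = blockIdx.x * blockDim.x + threadIdx.x;
    if (src >= n) return;

    int *d = dist + (long long)src * n;
    for (int i = 0; i < n; ++i) d[i] = -1; // -1 = unvisited
    d[src] = 0;

    bool changed = true;
    int level = 0;
    while (changed) {
        changed = false;
        for (int u = 0; u < n; ++u) {
            if (d[u] != level) continue;          // u is on the current frontier
            for (int e = offsets[u]; e < offsets[u + 1]; ++e) {
                int v = edges[e];
                if (d[v] == -1) { d[v] = level + 1; changed = true; }
            }
        }
        ++level;
    }
}
```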

As far as I know, **global** functions cannot handle 2D vectors, and I wish to avoid rewriting the entire model using 2D arrays (which would require fixed sizes, and would also force enlarging the participation container into a **size**x**size** matrix that would need to be copied from the device to the host on each step).

So I wanted to ask: is there a time-efficient way to make such a conversion (from a vector-of-vectors graph to something a device function can work with) on each step? Or is there no better solution than converting the entire model to use arrays?

Thank you very much.