I want to implement BFS in CUDA. I found a promising paper: “An Effective GPU Implementation of Breadth-First Search”.

Does anyone know where to get pseudocode or an example implementation?

I have already used the frontier algorithm by P. Harish and P. J. Narayanan, but it is slower than a fast CPU implementation on graphs with a large diameter. The algorithm is not work-efficient.
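
To show what I mean by "not work-efficient", here is a minimal sketch of the level-synchronous, one-thread-per-vertex scheme, as I understand it. It is a simplified illustration and not the code from the paper. The names `bfs_level_kernel` and `bfs_levels`, the CSR arrays `row_offsets` and `col_indices`, and the block size of 256 are my own assumptions. Error checking is omitted to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per vertex. Only vertices on the current level expand their
// edges, but every vertex is still inspected on every level.
__global__ void bfs_level_kernel(const int* row_offsets, const int* col_indices,
                                 int* level, int num_vertices,
                                 int current, int* changed)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= num_vertices || level[v] != current) return;
    for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
        int u = col_indices[e];
        if (level[u] == -1) {
            level[u] = current + 1;  // benign race: all writers store the same value
            *changed = 1;            // tell the host another level is needed
        }
    }
}

// Host driver (hypothetical helper): returns the BFS level of every vertex,
// or -1 for vertices that are unreachable from `source`.
std::vector<int> bfs_levels(const std::vector<int>& row_offsets,
                            const std::vector<int>& col_indices, int source)
{
    int n = static_cast<int>(row_offsets.size()) - 1;
    std::vector<int> level(n, -1);
    level[source] = 0;

    int *d_rows, *d_cols, *d_level, *d_changed;
    cudaMalloc(&d_rows, row_offsets.size() * sizeof(int));
    cudaMalloc(&d_cols, col_indices.size() * sizeof(int));
    cudaMalloc(&d_level, n * sizeof(int));
    cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(d_rows, row_offsets.data(), row_offsets.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_cols, col_indices.data(), col_indices.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_level, level.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    // One kernel launch per BFS level, each covering all n vertices.
    for (int current = 0; ; ++current) {
        int changed = 0;
        cudaMemcpy(d_changed, &changed, sizeof(int), cudaMemcpyHostToDevice);
        bfs_level_kernel<<<blocks, threads>>>(d_rows, d_cols, d_level, n,
                                              current, d_changed);
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
        if (!changed) break;
    }

    cudaMemcpy(level.data(), d_level, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_rows);
    cudaFree(d_cols);
    cudaFree(d_level);
    cudaFree(d_changed);
    return level;
}
```

With V vertices, E edges, and D levels, this sketch does roughly O(V·D + E) work, whereas a queue-based CPU BFS does O(V + E). On a graph with a large diameter, such as a long chain, D approaches V, and most threads in every launch do nothing useful.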

I also found the paper “Accelerating CUDA Graph Algorithms at Maximum Warp”. Has anyone implemented the BFS algorithm described in it?

Do you know of any other papers about efficient BFS implementations in CUDA? Do you know which one performs best?

Kind regards.