Hey there.
I am trying to implement reverse search algorithm for vertex enumeration on GPU. Now I already hav an MPI code working for the same. The problem is that if one looks at the general structure of this algorithm, it is like a tree. In a nutshell-you start at the root node. do some math to find its child nodes. then from these child nodes get more children and so on. The MPI code we did is a master and slave style implementation(With master doing the control and slaves travelling the tree and producing output and sending it to master). All this stuff is not so SIMD in my opinion.
I came across literature by Dr. Hwu of UIUC regarding graph searches on GPU. Then again it has a lot to do with databases i guess.
My question is does anyone know if someone has done such adaptation of not-so-SIMD code to GPU?
Thanks in advance. I apologize for breaking any posting rules in advance as this is my first post.

Theoretically, all problems can be SIMDized. Only data size scales up, but the number of codes is finite.

To answer your question, I know that in ray tracing, there was work showing that SIMD compaction of rays give a good (4x if I remember correctly) speedup over just naively recursively tracing adjacent rays. I think the same technique can be applied to breadth first search, though with less benefit since breadth first search has fewer program states than a ray tracer. Basically, sort the connected vertices of the frontier nodes into a discovered buffer or an undiscovered buffer. Then process those 2 buffers separately to minimize SIMD divergence.