Can we implement depth-first search on a GPU?

Hi,
I read that DFS (depth-first search) has proved to be a very difficult algorithm to design for CUDA. DFS is an inherently sequential process, and researchers believe it has no parallel solution. As a result, it was decided to create a CUDA implementation of DFS that closely followed the sequential implementation.

I know BFS has been implemented successfully with CUDA, but I have found nothing about DFS so far. Has DFS been implemented with CUDA before? Any thoughts?

Most CUDA ray-tracing code performs a depth-first search in a tree (usually a bounding volume hierarchy), but rather than parallelizing the search for a single ray, it runs many rays simultaneously.
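
To make the "one search per thread" idea concrete, here is a rough host-side C++ sketch, not taken from any actual ray tracer. The flat-array `Tree` layout, the convention that node 0 is the root, and the names `dfs_find` and `run_queries` are all assumptions made up for illustration. Each query runs an ordinary sequential DFS with an explicit stack, and the loop index `q` stands in for what would be the CUDA thread index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary tree stored as flat arrays (hypothetical layout; node 0 is the root).
struct Tree {
    std::vector<int> left;   // index of the left child, or -1 if none
    std::vector<int> right;  // index of the right child, or -1 if none
    std::vector<int> value;  // payload stored at each node
};

// One independent query: iterative pre-order DFS using an explicit stack.
// Returns the index of the first node whose value equals `target`, or -1.
// In a CUDA port this would roughly be the work done by a single thread,
// with the std::vector replaced by a fixed-size per-thread array.
int dfs_find(const Tree& t, int target) {
    assert(t.left.size() == t.value.size() && t.right.size() == t.value.size());
    if (t.value.empty()) return -1;
    std::vector<int> stack{0};
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        if (t.value[n] == target) return n;
        // Push right first so the left subtree is explored first.
        if (t.right[n] >= 0) stack.push_back(t.right[n]);
        if (t.left[n] >= 0) stack.push_back(t.left[n]);
    }
    return -1;
}

// Many queries, each fully independent of the others. The loop index q
// plays the role of the thread index; no iteration reads another's result.
std::vector<int> run_queries(const Tree& t, const std::vector<int>& targets) {
    std::vector<int> out(targets.size());
    for (std::size_t q = 0; q < targets.size(); ++q)
        out[q] = dfs_find(t, targets[q]);
    return out;
}
```

Nothing inside a single search is parallel here. The parallelism comes only from having many searches that share the read-only tree, which is why this pattern suits workloads like ray tracing where there are plenty of independent queries.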

Another option would be to begin the search several layers deep in the tree, where there are as many intermediate nodes as parallel threads. Then each thread starts its own depth-first search from a different node.
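
A rough host-side C++ sketch of that second idea follows, under some stated assumptions: the tree is a binary tree in flat arrays with node 0 as the root, the task is simply summing node values, and the names `build_frontier`, `dfs_sum`, and `tree_sum` are invented for this example. The frontier is grown level by level until it holds at least `workers` nodes (or only leaves remain), then every frontier node gets its own sequential DFS. The loop over the frontier stands in for one CUDA thread per subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary tree stored as flat arrays (hypothetical layout; node 0 is the root).
struct Tree {
    std::vector<int> left;   // index of the left child, or -1 if none
    std::vector<int> right;  // index of the right child, or -1 if none
    std::vector<int> value;  // payload stored at each node
};

// Sequential DFS over the subtree rooted at `root`, summing node values.
long long dfs_sum(const Tree& t, int root) {
    long long sum = 0;
    std::vector<int> stack{root};
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        sum += t.value[n];
        if (t.right[n] >= 0) stack.push_back(t.right[n]);
        if (t.left[n] >= 0) stack.push_back(t.left[n]);
    }
    return sum;
}

// Phase 1: expand level by level from the root until the frontier holds at
// least `workers` nodes or contains only leaves. Internal nodes consumed by
// the expansion are added to `top_sum`, since no later DFS will visit them.
std::vector<int> build_frontier(const Tree& t, std::size_t workers, long long& top_sum) {
    assert(t.left.size() == t.value.size() && t.right.size() == t.value.size());
    top_sum = 0;
    if (t.value.empty()) return {};
    std::vector<int> frontier{0};
    while (frontier.size() < workers) {
        std::vector<int> next;
        bool expanded = false;
        for (int n : frontier) {
            if (t.left[n] < 0 && t.right[n] < 0) {  // leaf: keep it as a start node
                next.push_back(n);
                continue;
            }
            top_sum += t.value[n];
            if (t.left[n] >= 0) next.push_back(t.left[n]);
            if (t.right[n] >= 0) next.push_back(t.right[n]);
            expanded = true;
        }
        if (!expanded) break;  // only leaves left, the frontier cannot grow
        frontier.swap(next);
    }
    return frontier;
}

// Phase 2: one independent DFS per frontier node, then combine the partial sums.
long long tree_sum(const Tree& t, std::size_t workers) {
    long long total = 0;
    std::vector<int> frontier = build_frontier(t, workers, total);
    std::vector<long long> partial(frontier.size());
    for (std::size_t i = 0; i < frontier.size(); ++i)  // i stands in for the thread index
        partial[i] = dfs_sum(t, frontier[i]);
    for (long long p : partial) total += p;
    return total;
}
```

One caveat with this approach is load balance: if the subtrees below the frontier differ a lot in size, some threads finish early while others keep working, so an unbalanced tree may need a deeper starting layer than the thread count alone suggests.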