The best and most recent CUDA implementation of BFS graph traversal

Hello everyone,
I’m looking for the best CUDA implementations of BFS. I have studied some papers that are referenced on the internet, for example:

But those implementations seem unclear or naive to me. Are there any new developments on this topic? Is there a better implementation?
Thanks in advance.