Looking for libraries that use warp-synchronous programming

Hi,

I am looking for CUDA libraries that are implemented as device functions and use warp-synchronous programming in some way. Does anybody have any pointers?

Thank you,

Rodrigo