Radix sorting on 2 GPUs

Hello all,
I am looking for a radix sort implementation in CUDA that runs on 2 GPUs. If anyone has such an implementation, or any links that could be helpful, please reply.