sorting best sorting algorithm for GT470

Which is the fastest sorting algorithm for GPUs? Please mention the link for the code, if it is available.
I have written an algorithm for parallel sorting. It takes 0.274 seconds to sort 131072 32-bit integers, excluding memory transfer time. Is this performance comparable with other parallel sorting algorithms? Eagerly waiting for your comments.
You have approximately 3 orders of magnitude to go before your sort is comparable to state-of-the-art implementations :)
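To put that gap in numbers: 131072 keys in 0.274 seconds is roughly 0.48 million keys per second. The short sketch below works through the arithmetic. It takes "3 orders of magnitude" at face value as a factor of 1000, which is an assumption, and the function names are made up for illustration; it does not measure any real GPU sort.

```python
def keys_per_second(num_keys: int, seconds: float) -> float:
    """Sorting throughput in keys per second."""
    return num_keys / seconds


def time_at_speedup(seconds: float, orders_of_magnitude: int) -> float:
    """Run time after a speedup of 10**orders_of_magnitude."""
    return seconds / 10 ** orders_of_magnitude


if __name__ == "__main__":
    num_keys = 131072   # element count reported in the question
    reported = 0.274    # seconds, excluding memory transfers (from the question)

    rate = keys_per_second(num_keys, reported)
    print(f"reported throughput: {rate / 1e6:.3f} Mkeys/s")

    # Assumption: "3 orders of magnitude" means exactly 1000x faster.
    target_time = time_at_speedup(reported, 3)
    target_rate = keys_per_second(num_keys, target_time)
    print(f"target time:         {target_time * 1e3:.3f} ms")
    print(f"target throughput:   {target_rate / 1e6:.1f} Mkeys/s")
```

Under that assumption, the same 131072 keys would need to be sorted in about 0.27 milliseconds, which is on the order of hundreds of millions of keys per second.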