Is there an external-memory sort library similar to Thrust?

I want to sort a large dataset (around 40 GB) on GPUs using CUDA. Sorting is an important part of my application.

I have used Thrust's sort on small datasets; it is fast and easy to use.

Is there a sort library similar to Thrust that supports datasets too large to fit in GPU global memory? I have found some papers on external-memory sorting, but implementing one myself would take too much time. With a good library, I could focus on my application problem.
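
To make clear what I mean by external-memory sort, here is a minimal host-only sketch of the basic idea as I understand it: sort chunks that fit in memory, then k-way merge the sorted runs. It rests on several assumptions: `chunked_sort` and `chunk_elems` are names I made up, `std::sort` is only a stand-in for copying a chunk to the device and sorting it there with Thrust, and a real version for 40 GB would also have to stream the runs from disk or pinned host memory instead of keeping everything in vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: sort `data` in chunks of at most `chunk_elems`
// elements, then merge the sorted chunks (runs) with a min-heap.
std::vector<int> chunked_sort(const std::vector<int>& data, std::size_t chunk_elems) {
    assert(chunk_elems > 0);  // a zero chunk size would never make progress

    // Phase 1: produce sorted runs, one per chunk.
    std::vector<std::vector<int>> runs;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk_elems) {
        std::size_t end = std::min(begin + chunk_elems, data.size());
        std::vector<int> run(data.begin() + begin, data.begin() + end);
        // Stand-in for: copy chunk to GPU, sort it on the device, copy back.
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));
    }

    // Phase 2: k-way merge. The heap holds (value, run index) pairs,
    // with the smallest value on top.
    typedef std::pair<int, std::size_t> Item;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    std::vector<std::size_t> pos(runs.size(), 0);  // next unread index per run
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.push(Item(runs[r][0], r));
    }

    std::vector<int> out;
    out.reserve(data.size());
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        out.push_back(top.first);
        std::size_t r = top.second;
        if (++pos[r] < runs[r].size()) heap.push(Item(runs[r][pos[r]], r));
    }
    return out;
}
```

For example, `chunked_sort(values, 3)` sorts `values` three elements at a time and merges the runs. What I am hoping for is a library that already does this (including the transfers and the merge) efficiently on the GPU.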

Thanks a lot~~~~