Fast parallel sort algorithm?

In my application, sorting a vector of this struct by weight with std::sort takes about 30 seconds for 50000 items, and it is a major bottleneck.

#include <cstddef>  // size_t
struct Weight_Index {
	double weight;
	size_t index;
};

struct by_weight_incr {
	bool operator()(Weight_Index const &left, Weight_Index const &right) const {
		return left.weight < right.weight;
	}
};

Can anyone suggest a GPU algorithm that does the job (much) faster? Preferably with a code example :-)

Both CUB and Thrust provide GPU sorting algorithms (for example thrust::sort and cub::DeviceRadixSort) that you can try.

I suspect your choice of container may be the bottleneck here.

Sorting 50000 16-byte elements like yours should be a piece of cake for a CPU; an optimized std::sort typically finishes that in milliseconds, not seconds.

Try sorting these elements in a std::vector<>, and also try a std::list<> for comparison when benchmarking the CPU-based sort.
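A minimal benchmark sketch of that comparison, assuming random weights in [0, 1) as stand-in data (not the asker's real input); make_items, time_vector_sort and time_list_sort are hypothetical helper names. Note that std::sort requires random-access iterators, so the std::list case uses the member function list::sort() instead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <random>
#include <vector>

struct Weight_Index {
	double weight;
	size_t index;
};

struct by_weight_incr {
	bool operator()(Weight_Index const &left, Weight_Index const &right) const {
		return left.weight < right.weight;
	}
};

// Hypothetical helper: n items with random weights in [0, 1), tagged with their position.
std::vector<Weight_Index> make_items(size_t n, unsigned seed) {
	std::mt19937 gen(seed);
	std::uniform_real_distribution<double> dist(0.0, 1.0);
	std::vector<Weight_Index> items(n);
	for (size_t i = 0; i < n; ++i)
		items[i] = Weight_Index{dist(gen), i};
	return items;
}

// Sorts a copy held in a std::vector with std::sort; returns elapsed milliseconds.
double time_vector_sort(std::vector<Weight_Index> items) {
	auto start = std::chrono::steady_clock::now();
	std::sort(items.begin(), items.end(), by_weight_incr());
	auto stop = std::chrono::steady_clock::now();
	assert(std::is_sorted(items.begin(), items.end(), by_weight_incr()));
	return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Same data in a std::list, sorted with list::sort; returns elapsed milliseconds.
double time_list_sort(std::vector<Weight_Index> const &items) {
	std::list<Weight_Index> lst(items.begin(), items.end());
	auto start = std::chrono::steady_clock::now();
	lst.sort(by_weight_incr());
	auto stop = std::chrono::steady_clock::now();
	assert(std::is_sorted(lst.begin(), lst.end(), by_weight_incr()));
	return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_vector_sort(make_items(50000, 42)) and time_list_sort(make_items(50000, 42)) from your own main and printing the results shows the timings on your machine. Build with optimizations enabled (e.g. -O2 or a Release configuration), since unoptimized debug builds of std::sort can be considerably slower.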

Were you using std::sort() or qsort()?
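For context on that question: qsort() takes an untyped comparator through a function pointer returning a negative, zero or positive int, which compilers usually cannot inline, while std::sort() can inline a functor such as by_weight_incr. A minimal sketch of the same ordering with std::qsort; compare_by_weight, qsort_by_weight and sorted_by_weight are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Weight_Index {
	double weight;
	size_t index;
};

// qsort-style comparator: negative, zero or positive depending on the weights.
int compare_by_weight(void const *a, void const *b) {
	double wa = static_cast<Weight_Index const *>(a)->weight;
	double wb = static_cast<Weight_Index const *>(b)->weight;
	return (wa > wb) - (wa < wb);
}

// Sorts the vector in place by ascending weight using C's qsort.
void qsort_by_weight(std::vector<Weight_Index> &items) {
	if (items.empty()) return;  // data() may be null for an empty vector
	std::qsort(items.data(), items.size(), sizeof(Weight_Index), compare_by_weight);
}

// Returns true if the items are in non-decreasing weight order.
bool sorted_by_weight(std::vector<Weight_Index> const &items) {
	for (size_t i = 1; i < items.size(); ++i)
		if (items[i].weight < items[i - 1].weight) return false;
	return true;
}
```

The comparator returns (wa > wb) - (wa < wb) rather than wa - wb, because converting a small double difference to int would truncate it to 0.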


The container to be sorted is a vector<Weight_Index> and I am using std::sort.
I have very little experience with C++ and do not know what to expect.