My own comparer for thrust::sort is very slow.

Hello,

I’d like to create my own comparer for thrust::sort, but it slows down drammatically!

I created my own less implemetation by just copying code from functional.h.

However it is compiled in some another way and works very slow.

[list=1]

[*]default comparer: thrust::less() - 94ms

[*]my own comparer: less() - 906ms

I’m using Visual Studio 2010. What should I do to get the same performance as at option 1?

Complete code:

#include <stdio.h>

#include <cuda.h>

#include <thrust/host_vector.h>

#include <thrust/device_vector.h>

#include <thrust/generate.h>

#include <thrust/sort.h>

int myRand()

{

	static int counter = 0;

	if ( counter++ % 10000 == 0 )

		srand(time(NULL)+counter);

	return (rand()<<16) | rand();

}

template<typename T>

struct less : public thrust::binary_function<T,T,bool>

{

  __host__ __device__ bool operator()(const T &lhs, const T &rhs) const {return lhs < rhs;}

}; 

int main()

{

    thrust::host_vector<int> h_vec(10 * 1000 * 1000);

    thrust::generate(h_vec.begin(), h_vec.end(), myRand);

thrust::device_vector<int> d_vec = h_vec;

int clc = clock();

    thrust::sort(d_vec.begin(), d_vec.end(), less<int>());

    printf("%dms\n", (clock()-clc) * 1000 / CLOCKS_PER_SEC);

return 0;

}

The thrust sort backend has two implementations: a fast radix sort and a much slower but more general comparison sort.

Deep in the dispatch files there’s a line that compares the passed-in comparison predicate to the thrust-provided less<> template:
static const bool use_radix_sort = thrust::detail::is_arithmetic::value &&
(thrust::detail::is_same<StrictWeakOrdering, typename thrust::less >::value ||
thrust::detail::is_same<StrictWeakOrdering, typename thrust::greater >::value);

As you can see, it uses the radix sort implementation if the key is arithmetic (i.e. a built-in numeric type) and if thrust::less<> or thrust::greater<> is used. Although your predicate is identical in implementation to thrust::less<>, it’s not the same type, so the thrust dispatch mechanism is using the slower comparison sort.

sean

Great!

Sean, is it possible to force radix algorithm to sort structures by integer key? I have rows as follows

struct row

{

   int val1;

   int val2;

   int val3;

   int rowid;

};

And I need to sort them by val1-val4 and sometimes by combination of vals. Is it possible to force Thrust using radix algorithm for that?

It’s easiest and fastest if you just supply a single 32-bit unsigned integer key. Try to invoke the sort that way.

If you want to sort by val1-val4… so you have basically a 128-bit key (!), you should do 4 32-bit key/index sorts. Because radix sort is stable, sort the least significant key, then the next least significant key, etc. At the end, gather your data based on the indices that were lugged around.