You need to create a custom compare function that operates on the keys in your structure. Something like this:
template<typename T>
struct dev_cmp_custom_key
{
    // No base class is needed: thrust::sort only requires operator().
    /*! Function call operator. The return value is <tt>lhs.myKey < rhs.myKey</tt>.
     */
    __host__ __device__ bool operator()(const T &lhs, const T &rhs) const
    {
        // Compare only the key member; T must expose an unsigned integer myKey.
        unsigned int a = lhs.myKey;
        unsigned int b = rhs.myKey;
        return (a < b);
    }
}; // end compare
And then call the sort function like this (where dArray is your array of structs):
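A minimal sketch of such a call, assuming the structs live in a thrust::device_vector. The struct `MyStruct`, its `payload` field, the comparator name `CompareByKey`, and the wrapper function are placeholders invented for this example; only the `myKey` member comes from the comparator above.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

// Hypothetical element type; only the myKey member matters to the comparator.
struct MyStruct
{
    unsigned int myKey;
    float        payload;
};

// Same idea as dev_cmp_custom_key, written without the template for brevity.
struct CompareByKey
{
    __host__ __device__ bool operator()(const MyStruct &lhs, const MyStruct &rhs) const
    {
        return lhs.myKey < rhs.myKey;
    }
};

// Copies the structs to the device, sorts them by myKey (ascending),
// and returns the sorted structs on the host.
thrust::host_vector<MyStruct> sort_structs_by_key(const thrust::host_vector<MyStruct> &input)
{
    thrust::device_vector<MyStruct> dArray = input;               // host -> device copy
    thrust::sort(dArray.begin(), dArray.end(), CompareByKey());   // comparator is the third argument
    thrust::host_vector<MyStruct> result = dArray;                // device -> host copy
    return result;
}
```

The comparator is passed by value as the last argument; thrust::sort calls it on the device for every pair of elements it compares.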
Jeroen already answered your question, but I wanted to point out that your performance will be much better if you separate the array of structures (AoS) into a structure of arrays (SoA) and then use thrust::sort_by_key() to perform a key-value sort.
In most cases, several memory transactions are required to read a struct from global memory into registers (or shared memory). In contrast, when adjacent threads access adjacent elements of an array of primitive types (float, int, etc.), the memory accesses are coalesced and much more efficient. Note that the built-in vector types like float2 and uint4 carry alignment specifiers (i.e. __align__) that allow them to be coalesced, but they are somewhat exceptional.
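To make the difference concrete, here is a sketch of the same element-wise update written against both layouts. The `Particle` struct, its fields, and the kernel names are invented for illustration and do not come from the original question.

```cuda
// Hypothetical 16-byte record (four floats).
struct Particle { float x, y, z, w; };

// AoS: thread i touches p[i].x, so consecutive threads access addresses
// 16 bytes apart and each transaction also carries the unused y, z, w fields.
__global__ void scale_x_aos(Particle *p, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i].x *= s;
}

// SoA: thread i touches x[i], so consecutive threads access consecutive
// floats and the reads and writes of a warp can be coalesced.
__global__ void scale_x_soa(float *x, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}
```

Both kernels compute the same result for the x field; only the address pattern seen by the memory system differs.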
Converting AoS to SoA will often net you a sizable speedup (e.g. 2x or 3x) so I would strongly urge you to do it before considering any other optimizations. I wish I had some benchmarks on hand to illustrate the point, but IMO AoS really is CUDA public enemy #1.
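As a rough sketch of what the SoA version of the sort could look like, assume each struct has been split into one key array and one value array. The `ItemsSoA` container, its field names, and `sort_items` are placeholders made up for this example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// SoA layout: one device array per field instead of one array of structs.
struct ItemsSoA
{
    thrust::device_vector<unsigned int> keys;    // what used to be myKey
    thrust::device_vector<float>        values;  // hypothetical payload field
};

// Sorts keys in ascending order and applies the same permutation to values,
// so keys[i] and values[i] still belong together afterwards.
void sort_items(ItemsSoA &items)
{
    thrust::sort_by_key(items.keys.begin(), items.keys.end(), items.values.begin());
}
```

Because the keys are a plain array of unsigned int, no custom comparator is needed here.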
Note that you can still use AoS-style programming with SoA-style data layout with thrust::zip_iterator. The zip_iterator “zips” values from several arrays together into tuples of values. There's a brief description of zip_iterator in the introductory slides and the online tutorial. A complete example is also available.
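A small sketch of how zip_iterator might be combined with sort_by_key when each record has more than one payload field. The array names `a` and `b`, their element types, and the function name are assumptions chosen for the example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>

// Sorts keys in ascending order and reorders two payload arrays along with them.
// All three vectors must have the same length.
void sort_two_payloads(thrust::device_vector<unsigned int> &keys,
                       thrust::device_vector<float>        &a,
                       thrust::device_vector<int>          &b)
{
    // Dereferencing the zip_iterator at position i yields the tuple (a[i], b[i]),
    // so sort_by_key moves both payload fields together as a single value.
    thrust::sort_by_key(keys.begin(), keys.end(),
                        thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())));
}
```

The data stays in separate arrays in memory; the tuple view exists only through the iterator.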
[codebox]Error 2 error: no instance of overloaded function “dereference” matches the argument list c:\to52\gpuculler\ext\thrust\thrust\sorting\detail\device\cuda\stable_merge_sort.inl 325
[/codebox]
Then I tried using the raw device pointer directly (the memory is fine, since I used the same raw pointer for another kernel and then copied the data back to the host for checking… everything’s OK). It compiles fine, but I get “Unhandled exception at 0x10007467 (gpuCuller-d.dll) in LibTest.exe: 0xC0000005: Access violation reading location 0x01000718.”
I kept using the AoS scheme for now, but I’ll need to switch to SoA sooner or later. The sooner the better, I guess, but I just wanted to try sorting and get some results first… but :P
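One possible cause of the access violation with the raw device pointer, offered as an assumption since the failing call is not shown: when a Thrust algorithm receives a plain pointer, it cannot tell that the memory lives on the device and treats the range as host memory, so the CPU ends up dereferencing a device address. Wrapping the pointer in thrust::device_ptr tells Thrust where the data is. A minimal sketch, with a hypothetical `Item` struct and `ItemKeyLess` comparator standing in for the real ones:

```cuda
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Hypothetical element type and comparator, stand-ins for the real struct.
struct Item { unsigned int myKey; float payload; };

struct ItemKeyLess
{
    __host__ __device__ bool operator()(const Item &lhs, const Item &rhs) const
    {
        return lhs.myKey < rhs.myKey;
    }
};

// d_items must point to n Items in device memory (e.g. allocated with cudaMalloc).
void sort_raw_device_array(Item *d_items, int n)
{
    // Tag the raw pointer as a device pointer so the sort runs on the GPU.
    thrust::device_ptr<Item> first = thrust::device_pointer_cast(d_items);
    thrust::sort(first, first + n, ItemKeyLess());
}
```

The raw pointer itself is unchanged, so it can still be passed to hand-written kernels before and after the sort.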