Sorting float2 and float arrays

Groove · September 4, 2009, 7:43am

Hello

I’m porting an evolutionary optimization from the CPU to GPU. Everything works fine and very fast so far. But now I’ve a problem when sorting the result arrays.

I’ve 2 arrays. The first array is float2 and has 130.000 Elements, the second one is float and has 10.000 elements

float2 *a // 130K elements

float *b  //	10K elements

Each element of array b belongs to 13 elements of array a.

(Array a contains 13 points of a Bezier based geometry, array b contains the calculatet area covert by this geometry - and I’ve to sort the bezier array by the best area values)

a[0]   - a[12] -> b[0]

a[13] - a[25] -> b[1]

......

Now I’ve to sort both according to the value size of array b

b[0] = 4000; 

a[0].x  = 4.7;

...

a[12].x = 7.8;

b[1] = 3000;

a[13].x  = 10.5;

...

a[25].x = 11.6;

b[2] = 5000;

a[26].x  = 15.3;

...

a[38].x = 16.9;

// ---- sortet

b[0] = 5000; 

a[0].x  = 15.3;

...

a[12].x = 16.9;

b[1] = 4000;

a[13].x  = 4.7;

...

a[25].x = 7.8;

b[2] = 3000;

a[26].x  = 10.5;

...

a[38].x = 11.6;

I hope u understand what I mean.

I looked at the examples of radix and bitonic sort…I also checked gpuqsort. But all algorithmes only sort one list with standard datatypes by value.

My idea was to sort the float array and only save the sortet indecies into an new list. But my method (by reduction) only does 1024 elements. And 130K or at least 10K elements are far to much for shared memory.

Any further ideas?

It would also be possible to save both list in to one float3 array, if this would help.

Thx!

SPWorley · September 4, 2009, 8:01am

Quick and dirty, not the most efficient, but easy.

Use the fact that the radix sort is a stable sort. So two (or more) keys with the same value will keep their relative order.
So attach the key value b to EACH of your 13 a values. Then sort the whole mess.
Your values will now be in order, with the 13 values still sequentially enumerated. So this is a sort of 130,000 key-data pairs, even though your true problem is only a 10K key sort with a fat data payload.

It’s not as efficient as it could be because you’re using none of your special knowlege of the one key / 13 data setup, but it’d work and would be easy.
If the performance isn’t good enough, then you start thinking about customizing more.

Sorting this little data on the GPU isn’t as efficient as it is on the CPU, though. The GPU starts to do best at 1M+.

Groove · September 4, 2009, 8:30am

Quick and dirty, not the most efficient, but easy.

Use the fact that the radix sort is a stable sort. So two (or more) keys with the same value will keep their relative order.

So attach the key value b to EACH of your 13 a values. Then sort the whole mess.

Your values will now be in order, with the 13 values still sequentially enumerated. So this is a sort of 130,000 key-data pairs, even though your true problem is only a 10K key sort with a fat data payload.

It’s not as efficient as it could be because you’re using none of your special knowlege of the one key / 13 data setup, but it’d work and would be easy.

If the performance isn’t good enough, then you start thinking about customizing more.

Sorting this little data on the GPU isn’t as efficient as it is on the CPU, though. The GPU starts to do best at 1M+.

Thx for ur fast response.

I’m not sure how to use the keyvalues, but I’ll try it. It mustn’t be that ultra fast, till now the gpu ist about 1.700 times faster than a Phenom x4, some performance loose won’t harm that much. The main thing is, that it should be faster than converting to cpu, sort there and load it back to the GPU.

I will report here, if it works.

Thx!!

Groove · September 4, 2009, 1:58pm

Ok, after almost 6 hours, I’ve to admit, that porting the radixsort sample of the sdk is to hard for me. Wheter I understand was is happening nor do I get what is important for my use. And copying the whole thing and just replace the given arrays with mine, doesn’t function either. So I tried CUDPPSORT. But as soon as I use CUDPPResult, all my cuda float2 arrays can’t be initialised anymore?! I can’t frigure out what exactly the problem is. But does anyone knows, if cudppsort can take float2 arrays? The documentation is very poor.

Is there any other way to get a simple radixsort, like kind of just one function to call?

Thx again.

JaredHoberock · September 4, 2009, 2:09pm

Use thrust::sort_by_key.

Groove · September 6, 2009, 9:29am

Thx.

Is there a way to get an already allocated and filled device array into thrust::device_vector? All examlples I found so far copy host vectors to the device.

Groove · September 6, 2009, 11:25am

Ah ok, I got it.

You’ve to use:

#include <thrust/sort.h>

float2 *a;

float *b;

cudaMalloc( (void**) &a, sizeof(float2)*numElements);

cudaMalloc( (void**) &b, sizeof(float)*numElements);

thrust::device_ptr<float2> thrust_a_pointer(a);

thrust::device_ptr<float> thrust_b_pointer(b);

.....

thrust::sorting::radix_sort_by_key( thrust_b_pointer, thrust_b_pointer+numElements, thrust_a_pointer);

It’s not that fast, but I does what i need. Thx very much for the hint!

Topic		Replies	Views
Sort on GPU Need some help to use sorts... CUDA Programming and Performance	15	42108	June 19, 2008
sorting 2d array in CUDA CUDA Programming and Performance	5	14109	February 15, 2011
Help with sorting CUDA Programming and Performance	7	2007	June 11, 2010
Question about sorting using CUDA CUDA Programming and Performance	1	3690	June 9, 2010
Thrust, Radix Sort and uint2 Key-Value Pairs How to sort uint2 pairs with thrust? CUDA Programming and Performance	5	2785	January 30, 2012
GPU sort CUDA Programming and Performance	6	3056	September 5, 2008
Sorting in CUDA Is sorting in CUDA worth the trouble? CUDA Programming and Performance	15	8365	September 30, 2009
sorting a small shared array CUDA Programming and Performance	1	1968	October 22, 2009
quick sorting on GPU sorting floating point values. CUDA Programming and Performance	0	5449	April 26, 2011
Sorting key value pairs How to sort {key,valu} pairs CUDA Programming and Performance	4	4197	March 26, 2008

Sorting float2 and float arrays

Related topics