Fast sorting of small arrays: WinWidth*WinHeight arrays of size == 32

FROL · February 4, 2009, 3:34pm

Hi. I have a large number of arrays (exactly WinWidth*WinHeight of them). Each array has a fixed size of 32 elements.

I need to sort them as fast as possible.

I wrote a bubble sort, but it is too slow. Which sorting algorithm would be better in my case?

Btw, here is my bubble sort:

/////////////////////////////////////////////////////////////////////////
////
template
<
  int ARR_SIZE, // array size; should be less than or equal to CTA_SIZE
  class T,      // type of array elements
  template <class> class LESS  // LESS<T>::exec(A,B) returns true if A < B
>
__device__ void BubleSort(T data[])
{
  int idInChunck = threadIdx.x % ARR_SIZE;             // position of this thread inside its array
  int chunckOffset = (threadIdx.x/ARR_SIZE)*ARR_SIZE;  // start of this thread's array in data[]

  for(int i=0;i<ARR_SIZE;i++)
  {
    // For even ARR_SIZE: even i compares pairs (0,1),(2,3),...; odd i compares (1,2),(3,4),...
    int index = (idInChunck+i) % ARR_SIZE + chunckOffset;

    // Only even threads compare and swap; odd threads just wait at the barrier.
    if(threadIdx.x%2 == 0 && index+1 < ARR_SIZE + chunckOffset)
    {
      T A = data[index];
      T B = data[index+1];
      if(!LESS<T>::exec(A,B))
      {
        data[index] = B;
        data[index+1] = A;
      }
    }
    __syncthreads();
  }
}

Thanks.

FROL · February 4, 2009, 3:38pm

I forgot to say that the elements are 8 bytes each, and they are floating-point data.

Paul_Russell · February 4, 2009, 5:19pm

For small arrays you should be able to use a sorting network (see Knuth). Hopefully nvcc will convert min/max expressions into suitable branchless instruction sequences but I haven’t verified this.

Sorting network - Wikipedia: http://en.wikipedia.org/wiki/Sorting_network
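As a rough illustration of the min/max idea, here is a minimal sketch of a branch-free compare-exchange and a 4-input sorting network (5 comparators in 3 layers). The names `HOST_DEVICE`, `compareExchange` and `sortNetwork4` are made up for this example, `double` is assumed as the 8-byte element type, and NaN inputs are not handled. Whether the compiler really emits select instructions instead of branches would still need to be checked in the generated code.

```cpp
// Lets the same source build as plain C++ or as CUDA (assumption: nvcc defines __CUDACC__).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Compare-exchange written as min/max expressions: afterwards a <= b.
// There is no data-dependent control flow here, only two selects.
HOST_DEVICE inline void compareExchange(double &a, double &b)
{
    const double lo = (b < a) ? b : a;
    const double hi = (b < a) ? a : b;
    a = lo;
    b = hi;
}

// 4-input sorting network. The comparator sequence is fixed and does not
// depend on the data, which is what makes it a sorting network.
HOST_DEVICE inline void sortNetwork4(double v[4])
{
    compareExchange(v[0], v[1]);  // layer 1
    compareExchange(v[2], v[3]);
    compareExchange(v[0], v[2]);  // layer 2
    compareExchange(v[1], v[3]);
    compareExchange(v[1], v[2]);  // layer 3
}
```

Comparators within one layer touch disjoint elements, so they can run in parallel. A network for 32 inputs follows the same pattern with a longer comparator list.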

FROL · February 4, 2009, 9:02pm

Thanks, I’ll try that.

RussAtkinson · February 4, 2009, 9:08pm

You want Bitonic sort, especially since it’s already in the CUDA examples.
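For reference, a minimal sketch of a bitonic sort on one small array. This is not the code from the CUDA examples; the names `HOST_DEVICE`, `bitonicStep`, `bitonicSort` and `bitonicSortShared` are hypothetical. It assumes N is a power of two, that `operator<` is the ordering, and, for the device variant, that the block size is a multiple of N with one thread per element (the same layout as the bubble sort in the first post).

```cpp
// Lets the same source build as plain C++ or as CUDA (assumption: nvcc defines __CUDACC__).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// One compare-and-swap of the bitonic network, as seen by element i.
// k = length of the runs being merged, j = distance to the partner element.
template <class T>
HOST_DEVICE inline void bitonicStep(T *data, int i, int j, int k)
{
    const int partner = i ^ j;
    if (partner > i) {                        // the lower index of each pair does the work
        const bool ascending = (i & k) == 0;  // sort direction alternates between runs
        const T a = data[i];
        const T b = data[partner];
        if ((b < a) == ascending) {           // wrong order for this direction
            data[i] = b;                      // (equal keys may also swap, which is harmless)
            data[partner] = a;
        }
    }
}

// Serial reference version: the loop over i plays the role of the threads.
template <int N, class T>
void bitonicSort(T *data)
{
    for (int k = 2; k <= N; k <<= 1)
        for (int j = k >> 1; j > 0; j >>= 1)
            for (int i = 0; i < N; ++i)
                bitonicStep(data, i, j, k);
}

#ifdef __CUDACC__
// Device variant: data[] holds blockDim.x / N arrays back to back,
// and thread t owns element t % N of array t / N.
template <int N, class T>
__device__ void bitonicSortShared(T *data)
{
    T *chunk = data + (threadIdx.x / N) * N;
    const int i = threadIdx.x % N;
    for (int k = 2; k <= N; k <<= 1)
        for (int j = k >> 1; j > 0; j >>= 1) {
            bitonicStep(chunk, i, j, k);
            __syncthreads();  // every thread of the block reaches this barrier
        }
}
#endif
```

Within one (k, j) step the compared pairs are disjoint, which is why the serial loop and the one-thread-per-element version give the same result. For N = 32 there are 15 such steps, compared with the 32 passes of the bubble sort above.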

cbuchner1 · February 5, 2009, 10:24am

Sorting networks for N=32 inputs are quite huge and require a lot of comparators (for N=24 it was already around 129 compare-and-swap operations).

Things get simpler for small arrays: For N=8 inputs I’ve successfully implemented a sorting network in an OpenGL ARBfp1.0 fragment shader. ;)

Christian
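For a sense of scale, the comparator count of one specific construction, the bitonic network, can be computed directly. The sketch below uses the standard formulas for a bitonic sorter on n inputs (n a power of two): n·log2(n)·(log2(n)+1)/4 comparators arranged in log2(n)·(log2(n)+1)/2 layers. The function names are made up for this example, and the bitonic network is not the smallest known network for a given n.

```cpp
// Integer log2 for powers of two (C++11 constexpr, single return statement).
constexpr int log2Int(int n)
{
    return n <= 1 ? 0 : 1 + log2Int(n / 2);
}

// Total compare-and-swap operations in a bitonic sorting network for n inputs.
constexpr int bitonicComparators(int n)
{
    return n * log2Int(n) * (log2Int(n) + 1) / 4;
}

// Number of layers; the n/2 comparators inside one layer are independent of each other.
constexpr int bitonicDepth(int n)
{
    return log2Int(n) * (log2Int(n) + 1) / 2;
}
```

For n = 32 this gives 240 comparators in 15 layers, and for n = 8 it gives 24 comparators in 6 layers. With one thread per element, the cost per array is driven more by the number of layers than by the total comparator count.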

FROL · February 5, 2009, 11:44am

So you think a sorting network would be too large? Maybe you have another idea?
