Bitonic sort

Hello, I’m trying to program a simple bitonic sort for the GTX 580. However, every implementation I can find is recursive, and as far as I know I can’t make recursive calls on the GPU. Does anyone know of a simple bitonic sort implementation for CUDA?
And by simple, I mean simpler than the example provided with the CUDA samples.

Thank you very much.
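For what it’s worth, bitonic sort doesn’t need recursion at all. The recursive split/merge unrolls into two nested loops: one over the merge size `k` and one over the compare distance `j`. Each step is a set of independent compare-and-swaps, one per element. (Recursion in `__device__` functions is actually supported from compute capability 2.0 onward, and the GTX 580 is a 2.0 part, but the loop form maps onto threads much more naturally.) Below is a minimal sketch in plain C++, assuming the array length is a power of two. The function name `bitonic_sort` is my own, and the innermost `tid` loop stands in for the threads of a single block.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Iterative (non-recursive) bitonic sort, ascending order.
// Assumption: a.size() is a power of two (2, 4, 8, ..., 128, ...).
// The innermost `tid` loop plays the role of the threads in one CUDA block.
void bitonic_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j: distance between the two elements of each compare-and-swap.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            // All pairs (tid, tid ^ j) are disjoint, so these iterations are
            // independent and could run as parallel threads.
            for (std::size_t tid = 0; tid < n; ++tid) {
                const std::size_t partner = tid ^ j;
                if (partner > tid) {  // lower index of each pair does the work
                    // Blocks of size k alternate ascending/descending order.
                    const bool ascending = (tid & k) == 0;
                    if ((a[tid] > a[partner]) == ascending) {
                        std::swap(a[tid], a[partner]);
                    }
                }
            }
            // CUDA equivalent: __syncthreads() here, before the next j.
        }
    }
}
```

To turn this into a kernel you would launch a single block, copy the data into `__shared__` memory, replace the `tid` loop with `tid = threadIdx.x`, keep the `__syncthreads()` after every `j` step, and compare on a key field instead of an `int`.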

(Or any other sorting algorithm: this is a very small array, around 100 structures, so I’m even considering just doing an insertion sort at the start of the kernel.)
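At around 100 elements the insertion-sort idea is reasonable too. Here is a minimal sketch, assuming a hypothetical `Item` struct sorted by a `key` field (the struct, its fields, and `insertion_sort` are placeholders, not from the thread). It is plain C++ so it can be checked on the host; in a kernel it would be a `__device__` function executed by one thread.

```cpp
// Hypothetical element type: sorted by `key`, with `id` carried along.
struct Item {
    float key;
    int id;
};

// Straight insertion sort by key, ascending and stable.
// For n = 100 this is at most 4950 comparisons. In a kernel it could be a
// __device__ function that only one thread runs (e.g. if (threadIdx.x == 0)),
// followed by __syncthreads() so the other threads see the sorted array.
void insertion_sort(Item* a, int n) {
    for (int i = 1; i < n; ++i) {
        const Item cur = a[i];  // element to insert into sorted a[0..i-1]
        int j = i - 1;
        while (j >= 0 && a[j].key > cur.key) {  // shift larger keys right
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = cur;
    }
}
```

The trade-off is that one thread does all the work while the rest of the block waits; at this size that is probably acceptable, but it is worth timing against a parallel version.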

This looks interesting: [url]http://www.cs.rutgers.edu/~venugopa/parallel_summer2012/cuda_bitonic.html[/url]

MK

Yeah, it seems pretty easy. Can anyone tell me whether it’s a good implementation?

Thank you!

I tried that code, but for some reason it has a lot of mistakes in it. I managed to get it to compile and run, but it still produces errors and incorrect output… I can’t understand why bitonic sort is such a mystery; you can’t find good code with a good explanation anywhere…
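I can’t say exactly what is wrong with the linked code, but one thing worth checking when sorting ~100 elements is the length: the classic bitonic network assumes a power-of-two size. You can either pad to 128 with a value larger than any real key, or use a variant that tolerates any `n`. Below is a sketch of the second option in plain C++, where the `tid` loop stands in for CUDA threads and `bitonic_sort_any_n` is a made-up name. Every comparator sorts ascending, and the first step of each merge compares an element with its mirror image inside the current block of size `k` instead of flipping direction. Because every comparator moves the smaller value to the lower index, positions at or beyond `n` behave like +infinity padding that never moves, so comparisons that reach past the end can simply be skipped.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bitonic sort variant for any n (e.g. ~100 elements), no padding needed.
// All comparators are ascending; the first step of each merge uses the
// "mirror" partner tid ^ (k - 1) instead of an ascending/descending flag.
// Indices >= n act like +infinity padding that would never move, so pairs
// whose partner is >= n are skipped.
void bitonic_sort_any_n(std::vector<int>& a) {
    const std::size_t n = a.size();
    // k runs up to the next power of two >= n.
    for (std::size_t k = 2; k / 2 < n; k <<= 1) {
        for (std::size_t j = k / 2; j > 0; j >>= 1) {
            for (std::size_t tid = 0; tid < n; ++tid) {  // one "thread" each
                const std::size_t partner =
                    (j == k / 2) ? (tid ^ (k - 1))  // mirror step within block
                                 : (tid ^ j);       // ordinary half-cleaner step
                if (partner > tid && partner < n && a[tid] > a[partner]) {
                    std::swap(a[tid], a[partner]);
                }
            }
            // CUDA equivalent: __syncthreads() here, before the next j.
        }
    }
}
```

When debugging a GPU version, comparing its output against a plain CPU sort on many random inputs of the exact size you use (e.g. 100) usually pins down which stage goes wrong.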