I am new to CUDA and want to sort the elements of an array. I have thought of a way to do it; the details are as follows:
Each element is a structure consisting of a key and a value, like <key,value>. All elements have been placed in one block, stored in shared memory, and each thread owns one element. Can I do this sort in parallel?
Also, I am sorry for my poor English.
Any replies would be appreciated.
Having only one block will result in poor performance. Also, a block can have at most 512 threads, so in your case at most 512 elements could be sorted. It is probably wise to look at the radix sort algorithm in the particles demo in the SDK.
Thanks for your reply!
I understand your suggestion about improving performance, but what I am concerned with is how to do the sort in a parallel way. I will look over the particles demo in the SDK.
Thanks again. :)
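For the single-block case described above (one thread per <key,value> element, everything in shared memory), a bitonic sort is one way to do it in parallel. Here is a minimal sketch, not the SDK's code. The names KeyValue, bitonicSortBlock, sortOneBlock and kMaxThreads are made up for illustration, and it assumes the element count is a power of two no larger than the 512-thread limit mentioned above (newer GPUs allow 1024).

```cuda
#include <cuda_runtime.h>

// Hypothetical element type: a key with an attached value.
struct KeyValue {
    int key;
    int value;
};

// Assumption: the per-block thread limit quoted in this thread.
const unsigned int kMaxThreads = 512;

// Sorts blockDim.x elements by key, ascending, inside a single block.
// Assumes blockDim.x is a power of two and the launch provides
// blockDim.x * sizeof(KeyValue) bytes of dynamic shared memory.
__global__ void bitonicSortBlock(KeyValue *data)
{
    extern __shared__ KeyValue s[];
    const unsigned int tid = threadIdx.x;

    s[tid] = data[tid];                 // each thread loads its own element
    __syncthreads();

    // k is the size of the bitonic sequences being merged,
    // j is the distance between the two elements being compared.
    for (unsigned int k = 2; k <= blockDim.x; k <<= 1) {
        for (unsigned int j = k >> 1; j > 0; j >>= 1) {
            const unsigned int partner = tid ^ j;
            if (partner > tid) {        // only the lower thread of a pair works
                const bool ascending = ((tid & k) == 0);
                const KeyValue a = s[tid];
                const KeyValue b = s[partner];
                const bool outOfOrder = ascending ? (a.key > b.key)
                                                  : (a.key < b.key);
                if (outOfOrder) {
                    s[tid] = b;
                    s[partner] = a;
                }
            }
            __syncthreads();            // finish this step before the next one
        }
    }
    data[tid] = s[tid];
}

// Host helper: copies n elements to the GPU, sorts them in one block,
// and copies them back. Returns false for unsupported n or on a CUDA error.
bool sortOneBlock(KeyValue *host, unsigned int n)
{
    if (n == 0 || n > kMaxThreads || (n & (n - 1)) != 0) return false;
    const size_t bytes = n * sizeof(KeyValue);
    KeyValue *dev = nullptr;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    bitonicSortBlock<<<1, n, bytes>>>(dev);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Calling sortOneBlock(elements, 8) on an array of 8 KeyValue structs sorts it in place by key, and each value travels with its key. Every thread takes part in every compare step, so the whole sort is parallel, but it is still limited to what one block can hold.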
When it comes to array sizes of several million or even hundreds of millions of elements, radix sort and bitonic sort become very slow, because the number of operations (swap and compare) increases dramatically compared to quicksort.
Here is something which might help.
How can that be true? Radix sort doesn't use swap and compare operations, by definition.
Is it possible to modify the bitonic sort example so that it can sort more than 512 elements? I have not been able to do it. I know that bitonic sort with a lot of elements is less efficient than radix sort, but I want to modify the bitonic sort. Thanks. Regards.
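One way to get past the block-size limit is to keep the data in global memory and launch one kernel per compare step, because __syncthreads() only synchronizes the threads of a single block, while consecutive kernel launches on the same stream run one after another. The following is a rough sketch of that idea, not a modified copy of the SDK sample. The names bitonicStep and bitonicSortLarge are made up, it sorts plain int keys for brevity, and it assumes the element count is a power of two.

```cuda
#include <cuda_runtime.h>

// One compare-exchange step of the bitonic network over global memory.
// k is the size of the bitonic sequences being merged,
// j is the distance between the two elements being compared.
__global__ void bitonicStep(int *data, unsigned int n,
                            unsigned int j, unsigned int k)
{
    const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const unsigned int partner = i ^ j;
    if (partner > i) {                  // only the lower thread of a pair works
        const bool ascending = ((i & k) == 0);
        const int a = data[i];
        const int b = data[partner];
        const bool outOfOrder = ascending ? (a > b) : (a < b);
        if (outOfOrder) {
            data[i] = b;
            data[partner] = a;
        }
    }
}

// Host helper: sorts n ints ascending, using as many blocks as needed.
// Returns false if n is not a power of two (or is too large for the
// loop counters below) or if a CUDA call fails.
bool bitonicSortLarge(int *host, unsigned int n)
{
    if (n == 0 || n > (1u << 30) || (n & (n - 1)) != 0) return false;
    const size_t bytes = (size_t)n * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;

    // Each launch is one step; the launch boundary acts as a grid-wide barrier.
    for (unsigned int k = 2; k <= n; k <<= 1)
        for (unsigned int j = k >> 1; j > 0; j >>= 1)
            bitonicStep<<<blocks, threads>>>(dev, n, j, k);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

This does every step in global memory, which is simple but slow. A common refinement is to run the steps whose partner distance j fits inside one block in shared memory and use the global kernel only for the larger distances. Arrays whose length is not a power of two can be padded with a maximum-value key before sorting.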