Hi,
I am a new to CUDA. I’d like to ask about sorting in CUDA.
Is it possible to sort arrays of primitive types in CUDA?
I have an array of an array of floats.
e.g.
1st array: 1.0 2.0 0.5
2nd array: 1.0 1.1 1.0
3rd array: 3.2 2.0 0.1
I want to sort these arrays increasing in the last floats.
result:
1st array: 3.2 2.0 0.1
2nd array: 1.0 2.0 0.5
3rd array: 1.0 1.1 1.0
Would this be possible in CUDA?
If so which algorithm can I use? Any suggestions?
Hope that makes sense,
Thanks!
LSChien
February 12, 2011, 2:07pm
2
Hi LSChien,
Thanks for your reply. It seems that I can make use this function of thrust which uses a comparator to determine the
sorting
template<typename RandomAccessKeyIterator,
typename RandomAccessValueIterator,
typename StrictWeakOrdering>
void stable_sort_by_key(RandomAccessKeyIterator keys_first,
RandomAccessKeyIterator keys_last,
RandomAccessValueIterator values_first,
StrictWeakOrdering comp);
Although I am using CUDA version 2.3 and it seems the requirement is CUDA 3.0 or higher.
Do you know of other library that perform similar function?
Many Thanks,
Crankie
February 14, 2011, 6:48am
4
You can also create a simple index vector that stores the initial position of these array: Eg. ,
---------- 1st Col----2nd Col-----3rd Col----index_vector
1st array:- 1.0 ------ 2.0 ------- 0.5 ----- 0
2nd array:- 1.0 ------ 1.1 ------- 1.0 ----- 1
3rd array:- 3.2 ------ 2.0 ------- 0.1 ----- 2
Now, your 3rd cols and the index_vector forms a key-value pair that you can sort based on key values using CUDA SDK Radix sort example code. Finally
You will have:
3rd Col index_vector
0.1 -------- 2
0.5 -------- 0
1.0 -------- 1
In this, the index_vector gives you the final positions of the array’s 1, 2 and 3, This index_vector can be used in the subsequent code, inplace of disturbing the inital 2D array ( if used ) for storing 1st, 2nd and 3rd arrays.
Thanks! that worked!
I would also like to ask an unrelated question.
Say I have a stack that is allocated on the device… I want to perform the simple calculation below, sequential or otherwise.
d_ehvStack[iteration] = 1;
for (int i = 0; i < n; i++) {
d_ehvStack[iteration] *= d_frontsArray[i];
}
This operation is not allowed in the host because the device memory can’t be accessed.
How do I perform this simple operation without copying back the array to host since this copy back to host is quite expensive.
Would I have to create a kernel function for each such operation that only require one thread? This seems tedious.
Thanks in Advance!
You can also create a simple index vector that stores the initial position of these array: Eg. ,
---------- 1st Col----2nd Col-----3rd Col----index_vector
1st array:- 1.0 ------ 2.0 ------- 0.5 ----- 0
2nd array:- 1.0 ------ 1.1 ------- 1.0 ----- 1
3rd array:- 3.2 ------ 2.0 ------- 0.1 ----- 2
Now, your 3rd cols and the index_vector forms a key-value pair that you can sort based on key values using CUDA SDK Radix sort example code. Finally
You will have:
3rd Col index_vector
0.1 -------- 2
0.5 -------- 0
1.0 -------- 1
In this, the index_vector gives you the final positions of the array’s 1, 2 and 3, This index_vector can be used in the subsequent code, inplace of disturbing the inital 2D array ( if used ) for storing 1st, 2nd and 3rd arrays.
Crankie
February 15, 2011, 12:37pm
6
Take a look at the reduction sample in SDK. That does a add , reduce but your case seems to be a multiply reduce but can be done by little plumbing! :)