Hi all, I have run into problems with the radix sort sample in the CUDA SDK. I remember someone else reporting similar problems. For some element counts (not very large ones), the application fails with this error:
./radixSort Starting…
Using CUDA device [0]: Tesla T10 Processor
Sorting 1048576 32-bit unsigned int keys and values
testradixsort.cpp(277) : cudaSafeCall() Runtime API error : unspecified launch failure.
For some other element counts, the application runs to completion but the output is sorted incorrectly:
Using CUDA device [0]: Tesla T10 Processor
Sorting 1000000 32-bit unsigned int keys and values
radixSort, Throughput = 134.4813 MElements/s, Time = 0.00744 s, Size = 1000000 elements, NumDevsUsed = 1, Workgroup = 256
Unordered key[0]: 226496563 > key[1]: 66060340
Incorrectly sorted value[3] (88271): 4267737121 != 481842
FAILED
My GPU is a Tesla S1070. I believe something is wrong here. Does anyone have suggestions?
Additionally, I want to sort 64-bit keys (unsigned long), and I am not sure whether this version of the radix sort supports the 64-bit unsigned long type.
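For reference, here is a minimal host-side sketch of the 64-bit case that could serve as a CPU baseline for checking GPU results. It is a plain C++ least-significant-digit radix sort, not the SDK's implementation. The names radixSort64 and isSortedAscending are hypothetical, and it assumes each 64-bit key is paired with a 32-bit value, as in the sample.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the CUDA SDK): stable LSD radix sort of
// 64-bit keys with paired 32-bit values, 8 bits per pass, 8 passes in total.
void radixSort64(std::vector<std::uint64_t>& keys,
                 std::vector<std::uint32_t>& values)
{
    assert(keys.size() == values.size());
    const std::size_t n = keys.size();
    std::vector<std::uint64_t> tmpKeys(n);
    std::vector<std::uint32_t> tmpValues(n);

    for (int shift = 0; shift < 64; shift += 8) {
        // count[d + 1] holds how many keys have digit d in this pass.
        std::array<std::size_t, 257> count{};
        for (std::size_t i = 0; i < n; ++i)
            ++count[((keys[i] >> shift) & 0xFF) + 1];

        // Exclusive prefix sum: count[d] becomes the first output slot for digit d.
        for (int d = 0; d < 256; ++d)
            count[d + 1] += count[d];

        // Scatter in input order so that equal digits keep their relative order.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t dst = count[(keys[i] >> shift) & 0xFF]++;
            tmpKeys[dst] = keys[i];
            tmpValues[dst] = values[i];
        }
        keys.swap(tmpKeys);
        values.swap(tmpValues);
    }
}

// Hypothetical helper: returns true if keys are in non-decreasing order,
// the same property that the "Unordered key" message reports as violated.
bool isSortedAscending(const std::vector<std::uint64_t>& keys)
{
    for (std::size_t i = 1; i < keys.size(); ++i)
        if (keys[i - 1] > keys[i])
            return false;
    return true;
}
```

Comparing the GPU output against a baseline like this for the failing sizes (1048576 and 1000000 elements) would at least show whether the problem depends on the element count or on the key width.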