Hello! My sort of doubles in CUDA runs rather slowly. I found links to bitonic sort and Thrust, but the bitonic sort implementation I found does not handle doubles, and Thrust has to be called from the host while my array is already on the device. The array has about 2000 elements. Does anybody know a way to speed up the sorting? Any hint is appreciated.
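One direction that may fit an array this small: the bitonic network itself is not tied to integer keys, so it could be run on doubles by a single thread block in shared memory, directly on the device. Below is a minimal sketch under several assumptions: the names `CAPACITY`, `blockBitonicSort`, `sortKernel` and `sortOnDevice` are hypothetical, the input has at most 2048 elements and no NaNs, the GPU allows 1024 threads per block, and CUDA error checking is left out for brevity.

```cuda
#include <vector>
#include <stdexcept>
#include <cuda_runtime.h>
#include <math_constants.h>   // CUDART_INF (double-precision +infinity)

// Assumed capacity: the smallest power of two >= the ~2000 elements.
constexpr int CAPACITY = 2048;

// Sorts data[0..n) ascending, in place. Must be called by ALL threads of
// one block, with n <= CAPACITY. Assumes the input contains no NaNs.
__device__ void blockBitonicSort(double *data, int n)
{
    __shared__ double s[CAPACITY];          // 2048 * 8 bytes = 16 KB

    // Load the input and pad the tail with +infinity so the padding
    // ends up after every real element.
    for (int i = threadIdx.x; i < CAPACITY; i += blockDim.x)
        s[i] = (i < n) ? data[i] : CUDART_INF;
    __syncthreads();

    // Standard bitonic network: k is the size of the sequences being
    // merged, j is the compare distance within the current merge step.
    for (int k = 2; k <= CAPACITY; k <<= 1) {
        for (int j = k >> 1; j > 0; j >>= 1) {
            for (int i = threadIdx.x; i < CAPACITY; i += blockDim.x) {
                int partner = i ^ j;
                if (partner > i) {          // lower index handles the pair
                    bool ascending = (i & k) == 0;
                    double a = s[i], b = s[partner];
                    if ((a > b) == ascending) {
                        s[i] = b;
                        s[partner] = a;
                    }
                }
            }
            __syncthreads();                // finish this step everywhere
        }
    }

    // Write back only the real elements.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        data[i] = s[i];
}

// Thin wrapper so the device function can be exercised from the host.
__global__ void sortKernel(double *data, int n)
{
    blockBitonicSort(data, n);
}

// Host-side helper for testing only: copies to the device, sorts, copies back.
std::vector<double> sortOnDevice(std::vector<double> host)
{
    int n = static_cast<int>(host.size());
    if (n > CAPACITY) throw std::invalid_argument("too many elements");
    if (n == 0) return host;

    double *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(double));
    cudaMemcpy(dev, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    sortKernel<<<1, 1024>>>(dev, n);        // one block, 1024 threads
    cudaMemcpy(host.data(), dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

In a real kernel, `blockBitonicSort` would be called from the existing device code instead of going through `sortOnDevice`, so the array never leaves the GPU. The network always sorts the full padded 2048 slots, which wastes a little work for 2000 elements but keeps the indexing simple.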
Thanks.