I can implement my own, but getting maximum speed would take some sophisticated fine-tuning.
Are there any ready-made libraries I could use? Thanks.
Thrust has exactly this functionality. It is versatile and fairly fast.
[url]http://docs.nvidia.com/cuda/thrust/index.html#algorithms[/url]
Thanks, that’s cool. It has opened a new window for me.
Seems I am still quite new… Too bad.
Are there any other materials worth reading before moving on to “advanced” topics? Thanks again.