benchmarks for CUDPP 2.0

Hi,

Are there any benchmarks for CUDPP 2.0?

The authors of http://www.cse.chalmers.se/~billeter/pub/pp/ claim to have outperformed CUDPP 1.0 by a noticeable margin on the scan(), compact() and sort() operations.

Cheers!

The CUDPP sort implementation is commented out in CUDPP 2.0. Instead, the library forwards to thrust::sort. I restored the CUDPP sort and benchmarked it alongside the B40C sort:
http://www.moderngpu.com/sort/mgpusort.html

I will have comparative benchmarks for segmented scan in a couple of days, which should be interesting.

sean

thanks!

@sean do you implement 4 bits per pass, compared to the 2 bits per pass in chag:pp (IIRC) (http://www.cse.chalmers.se/~billeter/pub/pp/)?

I implement up to six bits per pass. I benchmark the timing of each pass width, then find the optimal sequence of passes for a full sort of keys of any width from 1 to 32 bits. The six-bit pass is quite a bit faster than the others, though. My algorithm description gets into the math of all that.
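To give a rough idea of what that "optimal path" search can look like, here is a minimal sketch using a small dynamic program. It is not the actual sort code: the names (planPasses, kPassCost, Plan) are hypothetical, and the per-pass costs are made-up placeholders standing in for measured timings.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <vector>

// Widest radix pass considered, matching the "up to six bits" above.
constexpr int kMaxBitsPerPass = 6;

// ASSUMPTION: placeholder cost of one pass of width 1..6 bits, in arbitrary
// units. Index 0 is unused. Real values would come from benchmarking.
constexpr std::array<double, kMaxBitsPerPass + 1> kPassCost = {
    0.0, 1.00, 1.05, 1.12, 1.22, 1.38, 1.50};

struct Plan {
    double cost = 0.0;        // total cost of all passes in the plan
    std::vector<int> passes;  // width of each pass, in bits
};

// best[k] is the cheapest sequence of passes whose widths sum to exactly k.
// Recurrence: best[k] = min over w of best[k - w] + kPassCost[w].
std::vector<Plan> planPasses(int maxKeyBits) {
    assert(maxKeyBits >= 0);
    std::vector<Plan> best(maxKeyBits + 1);
    for (int k = 1; k <= maxKeyBits; ++k) {
        best[k].cost = std::numeric_limits<double>::infinity();
        for (int w = 1; w <= kMaxBitsPerPass && w <= k; ++w) {
            const double c = best[k - w].cost + kPassCost[w];
            if (c < best[k].cost) {
                best[k].cost = c;
                best[k].passes = best[k - w].passes;
                best[k].passes.push_back(w);
            }
        }
    }
    return best;
}
```

Calling planPasses(32) fills in a plan for every key width from 1 to 32 bits in one go. With a cost table like the placeholder one, where wider passes are cheaper per bit, the plans lean on six-bit passes and use narrower ones only to cover the remaining bits.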

quite interesting, thanks sean