benchmarks for CUDPP 2.0


Are there any benchmarks for CUDPP 2.0?

Authors of claim have outperformed CUDPP 1.0 by a noticeable margin on the scan(), compact() and sort() operations.


The CUDPP sort implementation is commented out in CUDPP 2.0. Instead, the library forwards to thrust::sort. I restored the CUDPP sort and benchmarked it and the B40C sort:

I will have comparative benchmarks for segmented scan in a couple of days, which should be interesting.



@sean do you implement 4 bits per pass, compared to the 2 bits per pass in chag:pp (IIRC)?

I implement up to six bits per pass. I benchmark the timing of each pass width, then find the optimal combination of passes for a full sort of keys anywhere between 1 and 32 bits wide. The six-bit pass is quite a bit faster than the others, though. My algorithm description gets into the math of all that.
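To make the "optimal path" idea concrete, here is a minimal sketch in Python of one way to pick pass widths for a given key size. It is not the code from my library: the function name `plan_passes` and the timing table `HYPOTHETICAL_PASS_COST` are made up for illustration, the numbers are not measurements, and the sketch assumes the cost of a pass depends only on its width. With that assumption, choosing the cheapest set of passes is a small dynamic program over the number of key bits.

```python
# Hypothetical per-pass timings (arbitrary units), keyed by bits per pass.
# These are invented for the example; real values would come from benchmarks.
HYPOTHETICAL_PASS_COST = {1: 10, 2: 11, 3: 13, 4: 17, 5: 20, 6: 21}


def plan_passes(pass_cost, key_bits):
    """Return (total_cost, widths) for the cheapest set of radix passes
    whose widths add up to exactly key_bits.

    Assumes each pass costs the same regardless of which bits it covers.
    """
    inf = float("inf")
    best = [0] + [inf] * key_bits      # best[b]: cheapest cost to sort b bits
    choice = [0] * (key_bits + 1)      # choice[b]: width of the last pass used

    for bits in range(1, key_bits + 1):
        for width, cost in pass_cost.items():
            if width <= bits and best[bits - width] + cost < best[bits]:
                best[bits] = best[bits - width] + cost
                choice[bits] = width

    # Walk the recorded choices back to recover the list of pass widths.
    plan = []
    remaining = key_bits
    while remaining > 0:
        plan.append(choice[remaining])
        remaining -= choice[remaining]
    return best[key_bits], plan


if __name__ == "__main__":
    total, widths = plan_passes(HYPOTHETICAL_PASS_COST, 32)
    print(total, sorted(widths, reverse=True))
```

With a table like this, where the six-bit pass is cheapest per bit, the plan for 32-bit keys leans on six-bit passes and uses one narrower pass to cover the leftover bits. Different measured timings would give a different plan.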

quite interesting, thanks sean