Benchmarks for CUDPP 2.0


Are there any benchmarks for CUDPP 2.0?

The authors of … claim to have outperformed CUDPP 1.0 by a noticeable margin on the scan(), compact(), and sort() operations.
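For readers unfamiliar with these primitives, here is a minimal sequential sketch of what scan and compact compute, written in Python rather than CUDA. The names `exclusive_scan` and `compact` are hypothetical and are not the CUDPP API; a GPU library produces the same results in parallel. Compaction is expressed the usual way: an exclusive scan of the keep-flags gives each surviving element its output index.

```python
from typing import Callable, List, Sequence

def exclusive_scan(xs: Sequence[int]) -> List[int]:
    """Exclusive prefix sum: out[0] = 0, out[i] = xs[0] + ... + xs[i-1]."""
    out: List[int] = []
    running = 0
    for x in xs:
        out.append(running)
        running += x
    return out

def compact(xs: Sequence, keep: Callable[[object], bool]) -> list:
    """Stream compaction: keep the elements that satisfy `keep`, in order."""
    flags = [1 if keep(x) else 0 for x in xs]
    offsets = exclusive_scan(flags)             # output slot of each kept element
    total = offsets[-1] + flags[-1] if xs else 0
    out = [None] * total
    for x, flag, off in zip(xs, flags, offsets):
        if flag:
            out[off] = x                        # scatter: one independent write per element
    return out

print(exclusive_scan([3, 1, 4, 1, 5]))              # [0, 3, 4, 8, 9]
print(compact([3, 0, 4, 0, 5], lambda v: v != 0))  # [3, 4, 5]
```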


The CUDPP sort implementation is commented out in CUDPP 2.0; instead, the library forwards to thrust::sort. I restored the CUDPP sort and benchmarked it against the B40C sort.

I will have comparative benchmarks for segmented scan in a couple of days, which should be interesting.
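Segmented scan runs an independent scan inside each segment of a single array. A minimal sequential sketch of the inclusive, sum-based variant follows, assuming segments are marked by a head-flag array (1 at the first element of each segment), which is one common encoding; `segmented_inclusive_scan` is a hypothetical name, not a library function.

```python
from typing import List, Sequence

def segmented_inclusive_scan(values: Sequence[int], head_flags: Sequence[int]) -> List[int]:
    """Inclusive prefix sum that restarts wherever head_flags[i] == 1."""
    if len(values) != len(head_flags):
        raise ValueError("values and head_flags must have the same length")
    out: List[int] = []
    running = 0
    for i, (v, head) in enumerate(zip(values, head_flags)):
        # A head flag (or the very first element) starts a new segment.
        running = v if (head or i == 0) else running + v
        out.append(running)
    return out

# Three segments: [1, 2, 3] [4, 5] [6]
print(segmented_inclusive_scan([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1]))  # [1, 3, 6, 4, 9, 6]
```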



@sean: do you implement 4 bits per pass, compared to the 2 bits per pass in chag:pp (IIRC)?
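For context on the question above: "bits per pass" is the digit width of a least-significant-digit (LSD) radix sort. With b-bit digits, sorting 32-bit keys takes ceil(32 / b) stable counting passes (16 passes at 2 bits, 8 at 4 bits, 6 at 6 bits), each using 2^b buckets, so a wider digit means fewer passes over memory but larger histograms per pass. The sequential Python sketch below only illustrates the technique; it is not CUDPP, B40C or chag:pp code, and `lsd_radix_sort` is a hypothetical name.

```python
from typing import List, Sequence

def lsd_radix_sort(keys: Sequence[int], bits_per_pass: int, key_bits: int = 32) -> List[int]:
    """Sort unsigned integer keys < 2**key_bits with LSD radix sort on b-bit digits."""
    if not 1 <= bits_per_pass <= key_bits:
        raise ValueError("bits_per_pass must be in [1, key_bits]")
    radix = 1 << bits_per_pass
    mask = radix - 1
    out = list(keys)
    for shift in range(0, key_bits, bits_per_pass):   # ceil(key_bits / b) passes
        # 1) Histogram of the current digit.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2) Exclusive scan of the histogram gives each bucket's start offset.
        offsets, total = [0] * radix, 0
        for d in range(radix):
            offsets[d], total = total, total + counts[d]
        # 3) Stable scatter into the buckets (preserves order from earlier passes).
        nxt = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            nxt[offsets[d]] = k
            offsets[d] += 1
        out = nxt
    return out

print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66], bits_per_pass=4))
# [2, 24, 45, 66, 75, 90, 170, 802]
```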

I implement up to six bits per pass. I benchmark the time of each pass width, then find the optimal sequence of passes for fully sorting keys of anywhere from 1 to 32 bits. That said, the six-bit pass is considerably faster than the others. My algorithm description goes into the math of all that.
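One way to read "find the optimal path" is as a small dynamic program, essentially a shortest path over bit counts: given a measured cost per pass for each digit width from 1 to 6 bits, choose the sequence of passes that covers all n key bits at minimum total cost. The sketch below assumes that formulation (it is not taken from the algorithm description mentioned above), lets the final pass be wider than the bits remaining, and uses placeholder costs that are not real measurements; `plan_passes` is a hypothetical name.

```python
from typing import Dict, List, Tuple

def plan_passes(cost_per_pass: Dict[int, float], key_bits: int) -> Tuple[float, List[int]]:
    """Return (total cost, list of pass widths) covering key_bits at minimum cost.

    cost_per_pass maps a digit width in bits to the time of one pass at that
    width. A pass may be wider than the bits left, so e.g. a 6-bit pass can
    finish off the last 2 bits if that happens to be cheaper.
    """
    inf = float("inf")
    best = [0.0] + [inf] * key_bits      # best[n]: min cost to sort n bits
    choice = [0] * (key_bits + 1)        # width of the first pass on the best path for n bits
    for n in range(1, key_bits + 1):
        for width, cost in cost_per_pass.items():
            candidate = cost + best[max(0, n - width)]
            if candidate < best[n]:
                best[n], choice[n] = candidate, width
    if best[key_bits] == inf:
        raise ValueError("no pass widths available")
    plan, n = [], key_bits
    while n > 0:                         # walk back along the chosen widths
        plan.append(choice[n])
        n = max(0, n - choice[n])
    return best[key_bits], plan

# Hypothetical per-pass timings in arbitrary units, NOT real benchmark data.
costs = {1: 1.0, 2: 1.1, 3: 1.2, 4: 1.3, 5: 1.5, 6: 1.6}
total, plan = plan_passes(costs, 32)
print(total, plan)
```

Timing every width once and then solving for each key size separately means a 20-bit sort and a 32-bit sort can end up with different pass mixes, rather than always using the single widest pass.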

Quite interesting, thanks, Sean.