benchmarks for CUDPP 2.0

nyiotis · September 19, 2011, 8:47pm

Hi,

are there any benchmarks for CUDPP 2.0?

The authors of http://www.cse.chalmers.se/~billeter/pub/pp/ claim to have outperformed CUDPP 1.0 by a noticeable margin on the scan(), compact() and sort() operations.

Cheers!

SeanB · September 19, 2011, 10:54pm

The CUDPP sort implementation is commented out in CUDPP 2.0; instead, the library forwards to thrust::sort. I restored the CUDPP sort and benchmarked it against the B40C sort:
http://www.moderngpu.com/sort/mgpusort.html

I will have comparative benchmarks for segmented scan in a couple days which should be interesting.

sean

nyiotis · September 20, 2011, 1:54pm

thanks!

nyiotis · September 25, 2011, 11:35pm

@sean do you implement 4 bits per pass, compared to the 2 bits per pass in chag:pp (IIRC) (http://www.cse.chalmers.se/~billeter/pub/pp/)?

SeanB · September 26, 2011, 3:00am

I implement up to six bits per pass. I benchmark the timings for each bit-pass then find the optimal path to a full sort of keys between 1 and 32 bits. But the six bit pass is quite a bit faster than the others. My algorithm description gets into the math of all that.
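
To illustrate the "optimal path" idea, here is a minimal sketch of that kind of search, written as a small dynamic program. It is not the actual MGPU sort code: the function name `optimal_pass_plan` and the per-pass timings in `HYPOTHETICAL_PASS_COST` are made up for illustration, and real numbers would have to come from benchmarking each pass width on the target GPU.

```python
# Hypothetical cost of one radix pass, keyed by the number of bits it sorts.
# These values are placeholders, NOT measured timings.
HYPOTHETICAL_PASS_COST = {1: 10, 2: 11, 3: 13, 4: 15, 5: 18, 6: 20}


def optimal_pass_plan(key_bits, pass_cost):
    """Return (total_cost, pass_widths) for the cheapest sequence of radix
    passes whose widths add up to exactly key_bits."""
    inf = float("inf")
    # best[b] is the cheapest known cost of sorting b bits.
    best = [0] + [inf] * key_bits
    # choice[b] is the width of the last pass in that cheapest plan.
    choice = [0] * (key_bits + 1)

    for bits in range(1, key_bits + 1):
        for width, cost in pass_cost.items():
            if width <= bits and best[bits - width] + cost < best[bits]:
                best[bits] = best[bits - width] + cost
                choice[bits] = width

    if best[key_bits] == inf:
        raise ValueError("no combination of pass widths covers the key size")

    # Walk the recorded choices backwards to recover the list of passes.
    plan = []
    bits = key_bits
    while bits > 0:
        plan.append(choice[bits])
        bits -= choice[bits]
    return best[key_bits], plan


if __name__ == "__main__":
    for key_bits in (8, 16, 24, 32):
        cost, plan = optimal_pass_plan(key_bits, HYPOTHETICAL_PASS_COST)
        print(key_bits, cost, sorted(plan, reverse=True))
```

With a cost table where the six-bit pass has the lowest cost per bit, the plan leans on six-bit passes and fills the remainder with narrower ones, which matches the observation above that the six-bit pass is the fastest.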

nyiotis · September 26, 2011, 2:06pm

quite interesting, thanks sean
