Use Thrust or CUDPP? Fermi-optimized?

HannesF99 · November 19, 2010, 8:13am

I wonder whether I should use the functions provided by the ‘Thrust’ library or those from CUDPP for the ‘standard’ operations like sorting, scanning, compaction, etc.
Is there a performance difference between (the latest versions of) Thrust and CUDPP, or does the Thrust library use CUDPP internally?
As an example, is the fast sorting method that was recently added in the Thrust 1.3 release (see http://research.nvidia.com/news/thrust-v13-release) also used in CUDPP?
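For comparison with the equivalent CUDPP calls, here is a minimal sketch of the three ‘standard’ operations written with Thrust. It assumes a CUDA toolkit that ships Thrust and a file compiled by nvcc as a `.cu` file; the wrapper names (`gpu_sort`, `gpu_exclusive_scan`, `gpu_compact`) and the `is_positive` predicate are hypothetical, chosen only for illustration. As far as I know, `thrust::sort` on primitive keys with the default ordering is dispatched to Thrust's radix sort.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/copy.h>
#include <vector>
#include <cstddef>
#include <cassert>

// Hypothetical compaction predicate: keep strictly positive values.
struct is_positive {
  __host__ __device__ bool operator()(int x) const { return x > 0; }
};

// Sort keys on the GPU and copy the result back to the host.
std::vector<int> gpu_sort(const std::vector<int>& in) {
  thrust::device_vector<int> d(in.begin(), in.end());  // host -> device
  thrust::sort(d.begin(), d.end());
  std::vector<int> out(d.size());
  thrust::copy(d.begin(), d.end(), out.begin());        // device -> host
  return out;
}

// Exclusive prefix sum (scan) on the GPU.
std::vector<int> gpu_exclusive_scan(const std::vector<int>& in) {
  thrust::device_vector<int> d(in.begin(), in.end());
  thrust::device_vector<int> result(d.size());
  thrust::exclusive_scan(d.begin(), d.end(), result.begin());
  std::vector<int> out(result.size());
  thrust::copy(result.begin(), result.end(), out.begin());
  return out;
}

// Stream compaction: keep only the elements that satisfy the predicate,
// preserving their relative order.
std::vector<int> gpu_compact(const std::vector<int>& in) {
  thrust::device_vector<int> d(in.begin(), in.end());
  thrust::device_vector<int> kept(d.size());
  thrust::device_vector<int>::iterator end =
      thrust::copy_if(d.begin(), d.end(), kept.begin(), is_positive());
  std::vector<int> out(static_cast<std::size_t>(end - kept.begin()));
  thrust::copy(kept.begin(), end, out.begin());
  return out;
}
```

For example, `gpu_compact({3, -1, 2, 0, 5})` keeps `{3, 2, 5}`. Each wrapper copies data to and from the device, so a fair benchmark against CUDPP should keep the data resident on the GPU and time only the algorithm call.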

Besides, I would like to know whether the latest releases of Thrust and CUDPP have already been optimized for the Fermi architecture (compute capability >= 2.0).
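Whether any Fermi-specific code paths matter depends on the GPU the program actually runs on. A small sketch, assuming the CUDA runtime API, that checks whether a device reports compute capability 2.0 or higher; the helper name `is_fermi_or_newer` is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Returns true if CUDA device `dev` reports compute capability 2.0 or
// higher (Fermi or a later architecture). Returns false if the device
// cannot be queried, e.g. for an invalid device index or missing driver.
bool is_fermi_or_newer(int dev) {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
    cudaGetLastError();  // clear the recorded error so later calls are unaffected
    return false;
  }
  return prop.major >= 2;
}
```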

nbell · November 24, 2010, 12:55am

Hi Hannes,

Thrust has its own implementations of those algorithms and does not depend on CUDPP. I haven’t benchmarked Thrust’s scan or reduce against CUDPP lately, but in most cases of interest the performance should be comparable. If you find otherwise, let us know and we’ll do our best to fix it. AFAIK, CUDPP does not yet use the fastest radix sort code, so Thrust will be considerably faster there.

Thrust does incorporate some Fermi-specific optimizations, primarily in the radix sort code. In general, many Thrust algorithms will automatically benefit from Fermi features like the L1 cache and expanded register file, so no changes to the code are necessary.

It’s pretty easy to get started with Thrust, so I’d suggest giving it a try and letting us know what you find.

HannesF99 · February 3, 2011, 3:49pm

Hi,

Some feedback:
We recently tried to port some of the CUDPP functions we use (reduction, scan, compaction; for details see http://www.20203dmedia.eu/materials/papers/FAH-2009-GRAVISMA.pdf) to Thrust. But we found that, on a GTX 285, the Thrust functions were significantly slower than their CUDPP counterparts (I think especially the reduction operation for calculating the maximum occurring eigenvalue in the image). So we are sticking with CUDPP and hope that it will be maintained and optimized for Fermi hardware as well.
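For anyone who wants to reproduce this comparison, here is a minimal sketch of a max-eigenvalue reduction in Thrust together with a simple CUDA-event timer around it. It assumes the per-pixel eigenvalues are already stored contiguously in a `thrust::device_vector<float>`; the helper names are hypothetical, and this is not the code from the paper above.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>
#include <cassert>

// Maximum over a device array of per-pixel eigenvalues.
// -FLT_MAX is the identity element for max over finite floats.
float max_eigenvalue(const thrust::device_vector<float>& eig) {
  return thrust::reduce(eig.begin(), eig.end(), -FLT_MAX,
                        thrust::maximum<float>());
}

// Average time in milliseconds of `reps` calls to max_eigenvalue,
// measured with CUDA events on the default stream.
float time_max_eigenvalue(const thrust::device_vector<float>& eig, int reps) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  (void)max_eigenvalue(eig);  // warm-up call so one-time costs are not measured
  cudaEventRecord(start);
  for (int i = 0; i < reps; ++i) {
    (void)max_eigenvalue(eig);  // each call returns the result to the host
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms / reps;
}
```

Timing the CUDPP version the same way, on the same device-resident data, keeps host-device transfers out of the comparison. `thrust::max_element` is an alternative if the position of the maximum is also needed.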