Use Thrust or CUDPP? Fermi-optimized?

I wonder whether I should use the functions provided by the ‘Thrust’ library or those from CUDPP for the ‘standard’ stuff like sorting, scanning, compaction etc.
Is there a performance difference between (the latest versions of) Thrust and CUDPP, or does Thrust use CUDPP internally?
For example, is the fast sorting method recently added in the Thrust 1.3 release (see http://research.nvidia.com/news/thrust-v13-release) also used in CUDPP?

I would also like to know whether the latest releases of Thrust and CUDPP have already been optimized for the Fermi architecture (compute capability >= 2.0).
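For reference, the ‘standard’ operations above map onto a handful of Thrust calls: `thrust::sort`, `thrust::exclusive_scan` and `thrust::copy_if` (stream compaction). The following is a minimal sketch, assuming the data starts out in a `std::vector` on the host; the helper names and the `is_positive` predicate are hypothetical and only there for illustration.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/copy.h>
#include <cassert>
#include <vector>

// Hypothetical predicate for compaction: keep strictly positive elements.
struct is_positive {
  __host__ __device__ bool operator()(int x) const { return x > 0; }
};

// Copy a device vector back into a std::vector on the host.
static std::vector<int> to_host(const thrust::device_vector<int>& d) {
  thrust::host_vector<int> h = d;
  return std::vector<int>(h.begin(), h.end());
}

// Sorting: ascending sort on the device.
std::vector<int> sort_on_device(const std::vector<int>& in) {
  thrust::device_vector<int> d(in.begin(), in.end());
  thrust::sort(d.begin(), d.end());
  return to_host(d);
}

// Scanning: exclusive prefix sum, out[0] = 0, out[i] = in[0] + ... + in[i-1].
std::vector<int> exclusive_scan_on_device(const std::vector<int>& in) {
  thrust::device_vector<int> d(in.begin(), in.end());
  thrust::device_vector<int> out(d.size());
  thrust::exclusive_scan(d.begin(), d.end(), out.begin());
  return to_host(out);
}

// Compaction: keep only elements satisfying the predicate, preserving order.
std::vector<int> compact_on_device(const std::vector<int>& in) {
  thrust::device_vector<int> d(in.begin(), in.end());
  thrust::device_vector<int> out(d.size());
  thrust::device_vector<int>::iterator end =
      thrust::copy_if(d.begin(), d.end(), out.begin(), is_positive());
  out.resize(end - out.begin());  // shrink to the number of kept elements
  return to_host(out);
}
```

Each helper round-trips through host memory to keep the example short; in a real pipeline the data would normally stay in `device_vector`s between calls.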

Hi Hannes,

Thrust has its own implementations of those algorithms and does not depend on CUDPP. I haven’t benchmarked Thrust’s scan or reduce against CUDPP lately, but in most cases of interest the performance should be comparable. If you find otherwise, let us know and we’ll do our best to fix it. AFAIK CUDPP does not yet use the fastest radix sort code, so Thrust will be considerably faster there.

Thrust does incorporate some Fermi-specific optimizations, primarily in the radix sort code. In general, many Thrust algorithms will automatically benefit from Fermi features like the L1 cache and expanded register file, so no changes to the code are necessary.
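To pick up the radix sort path, no special API is needed; it is reached through the ordinary sort calls. A minimal sketch, assuming (as we understand Thrust's dispatch, not something stated in this thread) that `thrust::sort_by_key` on primitive keys such as `float` with the default ordering is routed to the radix sort on the CUDA backend; the function name `sort_pairs` is hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cassert>
#include <vector>

// Sort (key, value) pairs by key on the device and write the result back
// into the host vectors. Keys are a primitive type with default ordering.
void sort_pairs(std::vector<float>& keys, std::vector<int>& values) {
  assert(keys.size() == values.size());
  thrust::device_vector<float> d_keys(keys.begin(), keys.end());
  thrust::device_vector<int> d_vals(values.begin(), values.end());
  thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_vals.begin());
  thrust::copy(d_keys.begin(), d_keys.end(), keys.begin());
  thrust::copy(d_vals.begin(), d_vals.end(), values.begin());
}
```

The same code runs unchanged on pre-Fermi and Fermi GPUs; any architecture-specific tuning would live inside the library rather than in user code.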

It’s pretty easy to get started with Thrust, so I’d suggest giving it a try and letting us know what you find.

Hi,

Some feedback:
We recently tried to port some of the CUDPP functions we use (reduction, scan, compaction; for details see http://www.20203dmedia.eu/materials/papers/FAH-2009-GRAVISMA.pdf) to Thrust. However, on a GTX 285 the Thrust functions were significantly slower than their CUDPP counterparts (I think especially the reduction operation used to find the maximum eigenvalue occurring in the image). So we are sticking with CUDPP and hope that it will be maintained and optimized for Fermi hardware as well.
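For anyone reproducing this comparison, a maximum reduction in Thrust is a single `thrust::reduce` call with `thrust::maximum`. Because CUDPP works on raw device pointers, a port usually wraps an existing `cudaMalloc` buffer in `thrust::device_ptr` instead of copying it. This is a minimal sketch; the function names and the idea of a flat per-pixel eigenvalue buffer are hypothetical, and it makes no claim about closing the performance gap reported above.

```cuda
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Largest value in a host buffer (e.g. per-pixel eigenvalues), reduced on the device.
float max_value(const std::vector<float>& host_values) {
  thrust::device_vector<float> d(host_values.begin(), host_values.end());
  return thrust::reduce(d.begin(), d.end(),
                        -std::numeric_limits<float>::infinity(),
                        thrust::maximum<float>());
}

// Same reduction on data that already lives in a raw device allocation,
// as it would when replacing a CUDPP call in existing code.
float max_value_raw(const float* d_values, std::size_t n) {
  thrust::device_ptr<const float> p(d_values);
  return thrust::reduce(p, p + n,
                        -std::numeric_limits<float>::infinity(),
                        thrust::maximum<float>());
}
```

`thrust::max_element` is an alternative when the position of the maximum is also needed; for the value alone, `reduce` avoids returning an iterator.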