Thrust v1.2 release A high-level C++ template library for CUDA

JaredHoberock · March 23, 2010, 6:16am

We are pleased to announce the release of Thrust v1.2, an open-source template library for developing CUDA applications. Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing.

Version 1.2 adds several new features, including:

[*]support for multicore CPUs via OpenMP

[*]support for CUDA 3.0 and new GPUs based on the Fermi architecture

[*]support for the Ocelot virtual machine

[*]pseudo-random number generation

[*]key-value reduction

[*]set intersection

[*]and more!
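Because Thrust is modeled after the STL, set intersection follows the interface of the standard std::set_intersection. Two sorted input ranges are passed as iterator pairs, and the common elements are written through an output iterator. The serial standard C++ sketch below only illustrates those semantics. intersect_sorted is a hypothetical wrapper, and the assumption is that thrust::set_intersection applied to device_vector iterators produces the same result on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical wrapper (not part of Thrust): serial illustration of set
// intersection. Keeps the elements that appear in both sorted input ranges.
std::vector<int> intersect_sorted(const std::vector<int>& a,
                                  const std::vector<int>& b)
{
    // As with the STL algorithm, both inputs must already be sorted.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<int> result;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(result));
    return result;
}
```

For inputs {1, 3, 5, 7, 9, 11} and {1, 1, 2, 3, 5, 8, 13} the result is {1, 3, 5}. Duplicated values are kept as many times as they occur in both ranges.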

As the following code example shows, Thrust programs are concise and readable.

#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <iostream>
#include <string>

// This example computes a run-length code for an array
// of characters using a key-value (or segmented) reduction
int main(void)
{
  // input data on the host
  std::string data = "aaabbbbbcddeeeeeeeeeff";

  size_t N = data.size();

  // copy input data to the device
  thrust::device_vector<char> input(data.begin(), data.end());

  // allocate storage for output data and run lengths
  thrust::device_vector<char> output(N);
  thrust::device_vector<int>  lengths(N);

  // compute run lengths
  size_t num_runs =
    thrust::reduce_by_key(input.begin(), input.end(),          // input key sequence
                          thrust::constant_iterator<int>(1),   // input value sequence
                          output.begin(),                      // output key sequence
                          lengths.begin()                      // output value sequence
                          ).first - output.begin();            // compute the output size

  // output is now  [a,b,c,d,e,f]
  // lengths is now [3,5,1,2,9,2]

  // print each run as its character followed by its length
  for (size_t i = 0; i < num_runs; i++)
    std::cout << static_cast<char>(output[i]) << static_cast<int>(lengths[i]) << " ";
  std::cout << std::endl;

  return 0;
}
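For readers new to key-value (segmented) reduction: reduce_by_key treats each run of consecutive equal keys as one segment and reduces the values in that segment, so pairing the characters with a constant sequence of 1s counts the length of each run. The function below is a serial sketch of that computation in standard C++. The name run_length_encode is hypothetical and not part of Thrust; it should yield the same output and lengths that the comments in the example describe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Thrust): serial equivalent of
// reduce_by_key(keys, constant 1s, ...). Consecutive equal keys form one
// segment, and summing a value of 1 per element yields the run length.
std::pair<std::vector<char>, std::vector<int>>
run_length_encode(const std::string& data)
{
    std::vector<char> output;   // one key per run
    std::vector<int>  lengths;  // reduced value (sum of 1s) per run

    for (std::size_t i = 0; i < data.size(); ++i)
    {
        if (output.empty() || data[i] != output.back())
        {
            // a different key starts a new segment
            output.push_back(data[i]);
            lengths.push_back(1);
        }
        else
        {
            // same key as the previous element: accumulate into this segment
            lengths.back() += 1;
        }
    }

    assert(output.size() == lengths.size());
    return {output, lengths};
}
```

The GPU version computes the same segments in parallel; the serial loop is only meant to show what "key-value reduction" means.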

Get started with Thrust today! First download Thrust v1.2 and then follow the online quick-start guide. Refer to the online documentation for a complete list of features. Many concrete examples and a set of introductory slides are also available.

Thrust is open-source software distributed under the OSI-approved Apache License v2.0.

Acknowledgments

[*]Thanks to Gregory Diamos for contributing a CUDA implementation of set_intersection

[*]Thanks to Ryuta Suzuki & Gregory Diamos for rigorously running Thrust’s unit tests and examples against Ocelot

[*]Thanks to Tom Bradley for contributing an implementation of normal_distribution

[*]Thanks to Joseph Rhoads for contributing the example summary_statistics

eelsen · March 23, 2010, 5:50pm

Nice work guys!

Cyril_Crassin · March 23, 2010, 8:50pm

Hi, congratulations on this work. I really love the design of Thrust and I use it extensively for prototyping.
However, I have found some performance problems in Thrust 1.1. Stream compaction, for instance, appears to be slower than in CUDPP in most cases, and this seems to be due to on-the-fly allocation of temporary buffers. Is this problem fixed in the new release?

Thanks.

nbell · March 24, 2010, 11:51pm

Scan performance is about 25% faster than v1.1, so the stream compaction functions will be noticeably faster. In v1.3 we’ll optimize functions like copy_if to use much less temporary memory and fewer read/write passes (it currently does a complete scan, which is unnecessary). So if v1.2 isn’t fast enough for you then v1.3 should be. If you find specific instances where the performance is lacking let us know and we’ll work on it for v1.3.
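To make "a complete scan" concrete: a common way to implement stream compaction such as copy_if on a parallel machine is to (1) write a 0/1 flag per element, (2) run an exclusive scan over the flags to get each kept element's output position, and (3) scatter the kept elements to those positions. The serial C++ sketch below shows those passes and the two temporary arrays they need. The function name compact_by_scan is hypothetical, and this illustrates the general technique, not Thrust's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Thrust): scan-based stream compaction that
// keeps the elements of input satisfying pred. Each step corresponds to a
// separate read/write pass over the data in a parallel implementation.
template <typename T, typename Predicate>
std::vector<T> compact_by_scan(const std::vector<T>& input, Predicate pred)
{
    const std::size_t n = input.size();

    // Pass 1: temporary array of 0/1 flags, one per input element.
    std::vector<std::size_t> flags(n);
    for (std::size_t i = 0; i < n; ++i)
        flags[i] = pred(input[i]) ? 1 : 0;

    // Pass 2: exclusive scan of the flags gives each kept element its output index.
    std::vector<std::size_t> positions(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        positions[i] = running;
        running += flags[i];
    }
    assert(running <= n);  // the output never exceeds the input size

    // Pass 3: scatter the kept elements; running now holds the output size.
    std::vector<T> output(running);
    for (std::size_t i = 0; i < n; ++i)
        if (flags[i])
            output[positions[i]] = input[i];

    return output;
}
```

The flags and positions arrays are the temporary storage, and each pass reads or writes n elements, which is why fusing these steps can reduce both memory use and traffic.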

This release was mainly about adding features and making things as robust as possible. For example, you can use large data types or complex functors without worrying whether Thrust algorithms will run out of registers or shared memory. There shouldn’t be any artificial limits on input sizes either, so you can scan/sort/reduce huge data sets.

yyfn · March 25, 2010, 2:16am

Thank you! But why do you use OpenMP rather than OpenCL?

Cyril_Crassin · March 25, 2010, 2:44am

That’s great. I rewrote Thrust’s copy_if so that I can pass it a pre-allocated temporary buffer, and that gives me a very good performance improvement.

Do you have a schedule for the release of Thrust 1.3?
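The pre-allocated buffer approach described above avoids paying for a temporary allocation on every call. A minimal sketch of that design, assuming a hypothetical Compactor class (not a Thrust API) that keeps its scratch storage between calls and grows it only when a larger input arrives:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Thrust): owns a scratch buffer that is reused
// across calls, so repeated compactions do not allocate temporary storage each time.
class Compactor
{
public:
    // Copies the elements of input that satisfy pred to the front of output
    // (which the caller may also reuse) and returns how many were kept.
    template <typename T, typename Predicate>
    std::size_t compact(const std::vector<T>& input, std::vector<T>& output, Predicate pred)
    {
        const std::size_t n = input.size();
        if (scratch_.size() < n)
            scratch_.resize(n);  // grow only when a larger input arrives; never shrink

        // Record each kept element's output position; the value n marks a dropped element.
        std::size_t count = 0;
        for (std::size_t i = 0; i < n; ++i)
            scratch_[i] = pred(input[i]) ? count++ : n;

        if (output.size() < count)
            output.resize(count);
        for (std::size_t i = 0; i < n; ++i)
            if (scratch_[i] != n)
                output[scratch_[i]] = input[i];

        assert(count <= n);
        return count;
    }

    // Current size of the reusable scratch buffer, in elements.
    std::size_t scratch_size() const { return scratch_.size(); }

private:
    std::vector<std::size_t> scratch_;  // temporary storage reused across calls
};
```

Only the first call (or a call with a larger input) allocates; later calls on inputs of the same size or smaller reuse both the scratch buffer and the caller's output vector.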

nbell · March 27, 2010, 11:46pm

There’s an entry in our FAQ that answers that:

The primary barrier to OpenCL support is the lack of an OpenCL compiler and runtime with support for C++ templates (e.g. something similar to nvcc and the CUDA Runtime).

These features are necessary to achieve close coupling of the host and device codes.

We can support OpenMP because standard C++ compilers allow us to combine OpenMP and C++ templates. To our knowledge, no compiler provides us with a way to do the same with OpenCL. User-defined types and operators would also be difficult to support using a “driver” API (like OpenCL or the CUDA driver API). For example, if someone writes the following code,

struct MyStruct

{

   int x; float y; char z;

};

thrust::device_vector<MyStruct> v(10);

then the definition of ‘MyStruct’ needs to be available to all the kernels that operate on that type. With nvcc and the CUDA Runtime API this happens naturally, just as it would in a normal C/C++ program. However, with a driver API we’d have to know to insert the definition of ‘MyStruct’ into any code snippet we wanted to execute on the device.

While there are workarounds to this problem, none of them allow us to present the same “pure” interface of Thrust for CUDA and OpenMP.
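To illustrate the problem with a driver-style API: device code is compiled at run time from a source string, so the definition of a user type such as MyStruct has to be pasted textually into every kernel source that touches it. The C++ sketch below uses a hypothetical helper, make_kernel_source, and a hypothetical OpenCL C kernel body. It only builds the string that would be handed to a run-time compiler (for OpenCL, clCreateProgramWithSource) and does not call any GPU API.

```cpp
#include <cassert>
#include <string>

// With a driver-style API, the host compiler's knowledge of MyStruct is not
// visible to device code, so its definition must travel as text.
const std::string kMyStructSource =
    "struct MyStruct\n"
    "{\n"
    "    int x; float y; char z;\n"
    "};\n";

// Hypothetical kernel body that reads MyStruct values; it would fail to compile
// on the device unless the struct definition is placed in front of it.
const std::string kExtractXKernel =
    "__kernel void extract_x(__global const struct MyStruct* in, __global int* out)\n"
    "{\n"
    "    out[get_global_id(0)] = in[get_global_id(0)].x;\n"
    "}\n";

// Hypothetical helper: prepend every user type definition the kernel depends on.
std::string make_kernel_source(const std::string& type_definitions,
                               const std::string& kernel_body)
{
    assert(!kernel_body.empty());
    return type_definitions + "\n" + kernel_body;
}
```

A library built this way would need to track every user type reachable from every functor and splice it into each generated kernel, which is the bookkeeping that nvcc and the CUDA Runtime make unnecessary.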

nbell · March 27, 2010, 11:49pm

We don’t have a pre-defined schedule for the next release, but if I had to guess I would say 4-6 months from now. However, it’s likely that we’ll improve copy_if before we make the next release, so if you’re willing to use a (potentially unstable) development version of Thrust, then you can get it sooner.

Anyway, I’ll post an update here when we’ve improved copy_if performance.

cbuchner1 · April 5, 2010, 4:07pm

Is there any advantage in using the CUDA 3.0 toolkit over staying with the 2.3 toolkit?
When do you think Thrust will drop support for older toolkits, such as 2.3?

Christian

JaredHoberock · April 5, 2010, 7:18pm

nvcc 3.0 supports C++ much better than any previous compiler release. To see what I mean, you can grep through the Thrust source for “WAR” to see the places where we’ve had to work around a bug.

We intend to drop support for nvcc 2.3 in Thrust v1.3 [1], and as soon as possible in the development version.

[1] http://code.google.com/p/thrust/issues/detail?id=107

yyfn · December 14, 2010, 1:55am

Thank you for your response, I get it now. I use the CUDA runtime API rather than the driver API and am not familiar with OpenCL, so my suggestion was uninformed.
