Thrust performance?

How does Thrust's performance compare to standard CUDA C/C++ for things like multiplication, reduction, etc.? Also, what's the performance hit when converting a Thrust vector back to a standard float/double array?

Some of the stuff I need to do works on standard float/double arrays, such as cuFFT, which means I need to convert to Thrust and back a few times.

Well, converting from a vector to anything else by copying its contents will likely be dreadfully slow.

As for Thrust itself, it’s actually pretty fast. Much of the Thrust CUDA back-end is written in terms of CUB. I know the CUB developers wrote a single-pass prefix sum and that got back-ported into the Thrust implementations.

So yes, Thrust is relatively fast. It’s a general-purpose library though. It doesn’t attempt to solve a specific problem but it’s like the STL in that it’s usually fast enough you don’t have to worry about it until you have to worry about it.

Keep in mind, a huge portion of the performance in Thrust is found in your usage of their fancy iterators.
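To make the fancy-iterator point concrete, here is a minimal sketch of a sum of squares that never materializes the squared values in memory. The names `Square` and `sum_of_squares` are made up for illustration; `thrust::make_transform_iterator` and `thrust::reduce` are the actual Thrust calls.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/reduce.h>

// Hypothetical functor: squares its argument on host or device.
struct Square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

// Sum of squares in a single reduction pass. The transform iterator applies
// Square lazily as elements are read, so no temporary vector is allocated.
float sum_of_squares(const thrust::device_vector<float>& v) {
    auto first = thrust::make_transform_iterator(v.begin(), Square{});
    auto last  = thrust::make_transform_iterator(v.end(), Square{});
    return thrust::reduce(first, last, 0.0f, thrust::plus<float>());
}
```

The alternative of calling `thrust::transform` into a temporary vector and then `thrust::reduce` would read and write the data an extra time, which is usually where the lost performance goes.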

I see, so to get the best performance it's still recommended to write CUDA C, especially since I might sometimes need to convert the vector from Thrust to a generic array.

Yes, Thrust and all of C++ as a whole are significantly slower than C.

C is the fastest language ever.

RAII only gets in your way. Exceptions are dumb. Templates literally make code slower even though it’s a compile-time construct. Move semantics are confusing and dumb and make C++ slower as well. Fancy iterator composition is slow. Well-tested algorithms that’ve worked on multitudes of different GPUs for many different users are not as good as some prefix sum you wrote one Saturday morning.

I realize now that the sarcasm of my post is likely not going to be picked up on.

s00wjh, you’re currently experiencing what’s called, “C programmer syndrome”. Basically, it’s a thing people who know C but not C++ do where they think that because C++ allows you to do things like hide implementation details that it’s a slower language.

We all go through it at a point in time. I did when I was first learning C++ after only knowing C.

Also keep in mind, you do know Thrust vectors can give you a pointer to the underlying data, right?
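For anyone who hasn't seen it, a rough sketch of going both ways without copying anything. The kernel `scale` and the helpers `scale_in_place` and `sum_raw` are hypothetical names for illustration; `thrust::raw_pointer_cast` and `thrust::device_pointer_cast` are the real Thrust functions.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// Plain CUDA kernel that knows nothing about Thrust.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Thrust -> raw pointer: hand the vector's storage to a hand-written kernel.
void scale_in_place(thrust::device_vector<float>& v, float factor) {
    int n = static_cast<int>(v.size());
    if (n == 0) return;
    float* raw = thrust::raw_pointer_cast(v.data());  // no copy, just the address
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(raw, n, factor);
    cudaDeviceSynchronize();  // wait so callers can use the result right away
}

// Raw pointer -> Thrust: wrap existing device memory and run an algorithm on it.
float sum_raw(float* d_data, int n) {
    thrust::device_ptr<float> p = thrust::device_pointer_cast(d_data);
    return thrust::reduce(p, p + n, 0.0f);
}
```

Both directions are pointer conversions only, so the "conversion" cost is effectively nothing as long as the data stays on the device.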

I have not used Thrust myself, but I know that people who use Thrust typically love the programmer productivity it enables. I have seen multiple cases where domain experts (e.g. scientists) new to CUDA whipped up a GPU-accelerated application in a matter of weeks by using Thrust and were thrilled with the performance gains vis-a-vis the previous CPU-only versions.

I am not sure where C vs C++ plays into the scenario at hand, because modern CUDA is a language in the C++ family (CUDA switched from C to C++ quite early in its existence, around 2008 if memory serves).

That said, the performance of well-written C++ code and C code is often similar (especially when restricted pointers are available in C++ as a vendor extension, as they are in CUDA). But since C++ tends to hide performance-relevant overhead that would be plainly visible in equivalent C code, in practice C++ programs more often experience performance issues. Modern compilers do their level best to mitigate that.

I agree that it’s easy to hide overhead in C++. That is an unfortunate side-effect of the types of abstractions that C++ promotes.

I push for Thrust because in most cases, programming is shockingly harder than we think it is. Thrust’s code is well-tested and established which is valuable. A strongly-tested library backing your code is only a good thing.

I wanted to emphasize that Thrust is a general-purpose library so you can’t always express your problem in terms of it but when you can, the implementations are largely satisfactory.

Looks like we are in “violent agreement” :-) Using established libraries is the way to go for the vast majority of programmers, with the possible exception of experts in the field a particular library is addressing.

The importance of libraries was recognized in the CUDA team from the very beginning, and it seems that this approach has not changed, although engineering resources at NVIDIA seem to be seriously stretched trying to move all libraries forward simultaneously. Experts will always be able to identify room for improvement in any of the libraries associated with CUDA, but NVIDIA has been quite good at adopting third-party improvements where offered under a suitable license.

Some of the stuff I need to do works on standard float/double arrays, such as cuFFT, which means I need to convert to Thrust and back a few times.

Thrust can operate on vectors in CPU or device memory. cuFFT works on device memory, so if your data already lives in a device vector you don't need to move anything, just convert pointers.
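As a sketch of what "just convert pointers" looks like with cuFFT, assuming a 1D single-precision complex-to-complex transform (the helper name `fft_in_place` is made up; `cufftPlan1d`, `cufftExecC2C`, and `cufftDestroy` are the real cuFFT calls, and the program needs to be linked with `-lcufft`):

```cuda
#include <cufft.h>
#include <thrust/device_vector.h>

// Runs an in-place forward FFT directly on the storage of a Thrust device
// vector. Returns false if planning or execution fails.
bool fft_in_place(thrust::device_vector<cufftComplex>& signal) {
    int n = static_cast<int>(signal.size());
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) return false;

    // No copy: cuFFT reads and writes the same device memory Thrust owns.
    cufftComplex* raw = thrust::raw_pointer_cast(signal.data());
    cufftResult result = cufftExecC2C(plan, raw, raw, CUFFT_FORWARD);

    cufftDestroy(plan);
    return result == CUFFT_SUCCESS;
}
```

After the call, the same `device_vector` can go straight back into Thrust algorithms. In real code you would probably create the plan once and reuse it rather than rebuilding it on every call.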