C++ support for STL containers in device code and memory

Does support for C++ in CUDA 3.0 mean that one will be able to use standard STL containers such as vector, map, multimap etc. in device code and device memory?

It is supported by a library called Thrust - see this link:
http://gpgpu.org/2009/05/31/thrust

Thrust doesn't support STL stuff in device code; it wraps the device-side operations in STL-like sweetness on the host side.

I don't think anybody knows yet to what extent the STL will work on Fermi.

Thanks!

Why would someone want to use STL-like constructs on a GPU, when memory accesses have to be coalesced and so on?
Wouldn't it be slow?

For the same reason that you write code in C (or C++, or Fortran etc.) and not in assembler.

Can you give some examples of how the STL would be used on a GPU, please?

Right now, I use thrust wherever possible in my code - as noted above, it's basically std::vector for the GPU (albeit controlled entirely from the host side). It's within the realm of possibility that, for the particular program I have, carefully written CUDA code could outperform thrust. However, I doubt that the performance difference would be substantial (thrust is written by experts at NVIDIA, so the only way I can be 'better' is by being less general), and I am certain that it would take me a long, long time to implement. The places where I'm using thrust are known to be far, far from the 'Amdahl path', so it's pointless to spend days (if not weeks) of effort doing something thrust has pre-packaged.
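
For concreteness, here's roughly what that looks like in practice - a minimal sketch assuming a working Thrust installation (the data and sizes are invented for illustration):

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>

    int main() {
        // Fill a host-side vector with some made-up data.
        thrust::host_vector<int> h(1 << 20);
        for (size_t i = 0; i < h.size(); ++i)
            h[i] = (h.size() - i) % 1000;

        // Copying into a device_vector allocates GPU memory and transfers
        // the data - it behaves like std::vector, but lives on the device.
        thrust::device_vector<int> d = h;

        // These calls launch CUDA kernels under the hood; no kernel is
        // ever written by hand here.
        thrust::sort(d.begin(), d.end());
        int sum = thrust::reduce(d.begin(), d.end(), 0);

        return (sum > 0) ? 0 : 1;
    }

The point is exactly the one above: you get a sort and a reduction written by people who tune these kernels for a living, in a dozen lines of host code.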

When STL implementations become available for the GPU, people will make similar tradeoffs. You are correct that some portions of the STL are likely to give horrible performance on the GPU - especially if they encourage people to allocate and deallocate memory all over the place. That can be a problem on the CPU too - it’s just that it’s less obvious there (since you don’t have thousands of threads competing for a lock).
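
To make the allocation concern concrete, here's a sketch of the anti-pattern - device-side malloc/free exist from compute capability 2.0 (Fermi) onward, and in this (made-up) kernel every thread contends for the device heap:

    #include <cuda_runtime.h>

    // Anti-pattern: every thread allocates and frees on the device heap.
    __global__ void per_thread_alloc() {
        int* p = (int*)malloc(64 * sizeof(int));
        if (p != 0) {
            p[0] = threadIdx.x;  // touch the allocation so it isn't optimized away
            free(p);
        }
    }

    int main() {
        // 32768 threads all hammering the heap allocator at once.
        per_thread_alloc<<<128, 256>>>();
        cudaDeviceSynchronize();
        return 0;
    }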

This Thrust looks good. I guess the Thrust idea is the right one: the STL-style containers live on the host and the processing is dispatched to the GPU, so the GPU is not devastated by STL memory accesses.

STL memory accesses don't have to be any slower than coding by hand. For example, a std::vector is just an ordinary array wrapped with some nicer semantics and automatic resizing. In fact, if you take the address of the first element of a vector, you have a pointer to the underlying native array (i.e., &myVector[0]). Using other types of containers is not something you'd usually want to do on a GPU, but they wouldn't be any slower than hand-coded pointer chasing anyway, and you can convert them to vectors with less code, and fewer ways to screw up, than by rolling your own linked-list implementation and array pointer. You could also have, say, a list<vector<T> > and have each block work on a single vector of that list - coalesced accesses, yet a relatively complex data structure, in very little code.
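
A small sketch of both tricks (host-side C++ against the CUDA runtime API; the names and sizes are made up):

    #include <list>
    #include <vector>
    #include <cuda_runtime.h>

    int main() {
        // A std::vector's storage is one contiguous native array.
        std::vector<float> myVector(1024, 1.0f);

        // &myVector[0] points at that array, so it can be handed straight
        // to the CUDA runtime for a host-to-device copy.
        float* d_data = 0;
        cudaMalloc((void**)&d_data, myVector.size() * sizeof(float));
        cudaMemcpy(d_data, &myVector[0], myVector.size() * sizeof(float),
                   cudaMemcpyHostToDevice);

        // A relatively complex structure built from simple pieces: each
        // vector in the list is contiguous, so each block could work on
        // one of them with coalesced accesses.
        std::list<std::vector<float> > chunks;
        chunks.push_back(std::vector<float>(256, 2.0f));
        chunks.push_back(std::vector<float>(256, 3.0f));

        cudaFree(d_data);
        return 0;
    }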

STL containers generally aren’t any slower than whatever one would come up with otherwise (save for some more elaborate implementations of trees or such) and the wrapping makes them easier to use and compatible with more interfaces.

Actually, if used appropriately, they might be faster than a lot of structures coded in C, and they would fit perfectly into CUDA 3.2. I wonder if there are plans to support the C++ standard library - the stdlib is kind of part of C++, after all. :)