The Power of C++11 in CUDA 7

Originally published at: https://developer.nvidia.com/blog/power-cpp11-cuda-7/

Today I’m excited to announce the official release of CUDA 7, the latest release of the popular CUDA Toolkit. Download the CUDA Toolkit version 7 now from CUDA Zone! CUDA 7 has a huge number of improvements and new features, including C++11 support, the new cuSOLVER library, and support for Runtime Compilation. In a previous…

This looks really great! However, I am using Fedora, so I cannot try it out myself yet :-( Do you know when the Fedora edition will be available?

Hi Kenneth. The .run file installer *might* work with Fedora 20, though we haven't tested it. It's worth a try. We're working on getting the Fedora 21 installer available soon (only 21 will be *officially* supported with CUDA 7 -- please see the release notes).

Hi Mark, thanks, nice and clear explanations. I have found a typo in the first code snippet, when calling count_if, there should be text instead of data.

Good catch; I fixed it. Thanks!

Hi Mark, which parts of the C++11 STL can be used on the device with CUDA 7? NB: I was thinking of defining classes that use std::vector, etc., built on a unified memory managed class, as in one of your previous posts, and then writing __host__ __device__ functions that use those classes. I have tried, but it does not seem to work. Does one need to use something like Thrust, or am I mistaken? Thank you.

CUDA 7 adds support for C++11 language features in device code, but not for the Standard Template Library, I'm afraid. You can use a thrust::device_vector. You could indeed write your own vector class that uses managed memory, but existing STL headers won't "just work", because every function called on the device must be annotated with "__host__ __device__".

That makes sense. Thanks for the clarification. I had started adding '__host__ __device__' behind an #ifdef/#ifndef, but, clearly, I ran into problems with the STL methods. To get something like the implicit methods of the STL, the closest and best option seems to be Thrust. I had not looked at it before; it is clearly very good, and probably close to what a C++ library adapted for device memory would be.
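
For anyone who wants to try the "write your own class" route discussed above, here is a minimal sketch of the idea, not a definitive implementation. The names ManagedArray, HD, and sum are hypothetical and come from neither CUDA nor Thrust. The sketch assumes a fixed-size array of trivially constructible elements, and it wraps the "__host__ __device__" annotation in a macro guarded by __CUDACC__, so the same header also builds with a plain host compiler (where it falls back to malloc instead of managed memory).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// HD expands to the CUDA annotations only when nvcc is compiling the file.
#ifdef __CUDACC__
#include <cuda_runtime.h>
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical fixed-size array (illustration only, not a CUDA or Thrust type).
// Assumption: T is trivially constructible, because elements are not constructed.
template <typename T>
class ManagedArray {
public:
    explicit ManagedArray(std::size_t n) : size_(n), data_(nullptr) {
        if (n == 0) return;
#ifdef __CUDACC__
        // Managed memory is reachable from both host and device code.
        if (cudaMallocManaged(&data_, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
#else
        // Host-only fallback so the sketch builds without CUDA.
        data_ = static_cast<T*>(std::malloc(n * sizeof(T)));
        if (!data_) throw std::bad_alloc();
#endif
    }

    ~ManagedArray() {
#ifdef __CUDACC__
        cudaFree(data_);
#else
        std::free(data_);
#endif
    }

    // Non-copyable: the object owns its allocation.
    ManagedArray(const ManagedArray&) = delete;
    ManagedArray& operator=(const ManagedArray&) = delete;

    // Accessors are annotated so they can be called from either side.
    HD std::size_t size() const { return size_; }
    HD T& operator[](std::size_t i) { return data_[i]; }
    HD const T& operator[](std::size_t i) const { return data_[i]; }
    HD T* data() { return data_; }
    HD const T* data() const { return data_; }

private:
    std::size_t size_;
    T* data_;
};

// A free function usable on host or device; it takes a raw pointer and a
// length, which is also what one would pass to a kernel.
template <typename T>
HD T sum(const T* p, std::size_t n) {
    T total = T();
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}
```

Because the class is non-copyable, a kernel would receive a.data() and a.size() rather than the object itself. This is exactly the annotation work that unmodified STL headers lack, which is why Thrust remains the more practical choice for most code.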