CUDA 7 Release Candidate Feature Overview: C++11, New Libraries, and More

Originally published at:

It’s almost time for the next major release of the CUDA Toolkit, so I’m excited to tell you about the CUDA 7 Release Candidate, now available to all CUDA Registered Developers. The CUDA Toolkit version 7 expands the capabilities and improves the performance of the Tesla Accelerated Computing Platform and of accelerated computing on NVIDIA…

do you support any c++14 features yet? like auto-deduced return types?

I've already used some C++11 features in device code before CUDA 7 (CUDA 6.5 with VS2010 just worked). Does that mean it simply wasn't officially supported until CUDA 7? Can I pass lambdas to __global__ functions now?

You can use a lambda in device code (as a functor or otherwise) as long as its definition is in device code. You can't (yet) pass a lambda from host code to device code (i.e., as a kernel argument).
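The restriction above can be sketched as follows. This is an illustrative example, not from the post: the kernel name and data are hypothetical, and it assumes nvcc with the CUDA 7 `--std=c++11` option. The lambda is defined inside the kernel body, so it is device code by construction.

```cuda
// Hypothetical sketch: a lambda defined *inside* device code works in CUDA 7.
__global__ void scale_kernel(float *data, int n, float factor)
{
    // Lambda defined in device code, capturing `factor` by value.
    auto scale = [factor](float x) { return x * factor; };

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scale(data[i]);   // used as an ordinary functor
}
```

What you cannot do in CUDA 7 is define the lambda on the host and pass it as a kernel argument.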

I'm curious which C++11 features you were able to use in device code in the past. There was an undocumented option (--std=c++11) in CUDA 6.5, but not before that. However, nvcc does use the EDG C++ front end, so some features that require only front-end support may have worked if the version of EDG in use supported them.

In any case, CUDA 7 is the first version with official support. Note that not everything in C++11 is supported on the device at this stage: it's mostly language features, not standard library features such as std::thread or the STL containers. We plan to provide more detailed information in a future blog post.
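A quick sketch of what "language features, not library features" means in practice. The function below is illustrative (the names are made up), and assumes compilation with nvcc and `--std=c++11` under CUDA 7:

```cuda
// Hypothetical sketch of C++11 *language* features in device code (CUDA 7).
__device__ int sum_of_squares()
{
    int vals[] = {1, 2, 3, 4};   // plain array: range-based for works on device
    auto total = 0;              // auto type deduction
    for (auto v : vals)          // range-based for loop
        total += v * v;

    static_assert(sizeof(int) >= 4, "expected at least 32-bit int");  // compile-time check
    return total;                // 1 + 4 + 9 + 16 = 30
}
```

By contrast, something like `std::vector` or `std::thread` inside that function would not compile for the device.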

All the features I mentioned work fine in CUDA 6.5 with VS2010, with no special compiler flags needed, except for things like range-based for loops and variadic templates, which the host compiler (VC10) itself doesn't support.

Mark: Happy days! Looks like I can use libc++ on OS X. This means that I no longer have to maintain other dependencies because of libstdc++ dependency!

CUDA 7 does not officially support C++14 features, and from my quick tests, features like auto-deduced return types, generic lambdas, etc., are not working yet.

are the 3D FFT improvements only on the K20 or other GPUs as well?

This is because MSVC enables C++11 support without any flags or options specified. But CUDA 6.5 does not officially support C++11.

The cuFFT improvements are not limited to K20 (I fixed the confusing wording in the post). Also, they are not limited to 3D FFTs! I've added a graph showing speedups for 1D FFTs.
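For readers who haven't used cuFFT's 1D path, a minimal in-place complex-to-complex transform looks roughly like this. The function name is illustrative, and error checking is omitted for brevity:

```cuda
// Minimal sketch: one 1D complex-to-complex FFT with cuFFT, in place.
#include <cufft.h>

void run_forward_fft(cufftComplex *d_signal, int n)
{
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);                    // single 1D transform of length n
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);  // in-place forward transform
    cufftDestroy(plan);
}
```

The speedups shown in the post apply to plans like this without any code changes; recompiling against the CUDA 7 cuFFT library is enough.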

Once again: that's a fantastic feature set in this release! We are really looking forward to constexpr support on the device side (and to throwing out a huge amount of self-written auto/lambda machinery).
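As a rough illustration of what device-side constexpr enables, here is a hedged sketch: the function and kernel names are hypothetical, it assumes nvcc with `--std=c++11` under CUDA 7, and the exact annotation requirements (`__host__ __device__` on the constexpr function) may vary by toolkit version:

```cuda
// Hypothetical sketch: constexpr evaluated at compile time, usable in device code.
__host__ __device__ constexpr int factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

__global__ void permute_kernel(float *out)
{
    // factorial(4) is a compile-time constant, so it can size a local array.
    float scratch[factorial(4)];   // 24 elements
    for (int i = 0; i < factorial(4); ++i)
        scratch[i] = 0.0f;
    out[threadIdx.x] = scratch[threadIdx.x % factorial(4)];
}
```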

One unrelated quick question: was the support for the PGI compiler on the host-side added (#439486 -> #1449951)?

Yes, the PGI C++ compiler is supported as a host compiler for nvcc in CUDA 7.0 on Linux.
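For reference, nvcc selects the host compiler with the `-ccbin` option, so an invocation would look roughly like the following. This is an assumption-laden sketch: it presumes `pgc++` is on the PATH and is a PGI version supported by CUDA 7 on Linux.

```shell
# Hypothetical: use the PGI C++ compiler as nvcc's host compiler (Linux).
nvcc -ccbin pgc++ --std=c++11 -o app app.cu
```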

Great! Do you know which pgi version(s)? I could not find anything in the header files nor the announcements.

cuSOLVER is great news for the signal processing community! Is it possible to run cuSOLVER functions in streams in order to use them in batch mode (to compute many medium-size matrices)? It would be interesting to compare cuSOLVER with the batched solver sample code available on the registered developer website.

Does someone happen to know whether this new release of cuFFT does support callbacks (cufftXTSetCallback etc.) on Windows?

No, cufftXTSetCallback is not supported on Windows in this release.

Yes, cuSOLVER supports CUDA streams. Also, cuSolver contains some batched operations: batched sparse QR and batched refactorization. The cuSOLVER PDF documentation included with the CUDA Toolkit v7 RC download provides full details.
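Attaching a stream to a cuSOLVER handle looks roughly like this. This is a sketch under stated assumptions (dense API, CUDA 7 RC headers), with error checking omitted; consult the cuSOLVER documentation for the batched sparse QR entry points:

```cuda
// Sketch: route cuSOLVER dense calls onto a user-supplied CUDA stream.
#include <cusolverDn.h>

void solve_on_stream(cudaStream_t stream)
{
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);
    cusolverDnSetStream(handle, stream);  // subsequent calls launch on `stream`

    // ... cusolverDn factorization / solve calls here ...

    cusolverDnDestroy(handle);
}
```

Creating one handle-plus-stream pair per matrix lets medium-size solves overlap, which is the poor man's batching until you use the dedicated batched routines.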

Well, it's a pity. Nevertheless, thanks a lot for your reply!

Apparently I am too blind to find the setting in Nsight 7.0 that enables the C++11 standard. Can you please tell me where to enable this option? I am using Nsight to compile/link the project.

Will CUDA 7 support 32-bit Windows?

The release notes document is rather unclear (it says the CUDA Toolkit will be 64-bit only)...