CUDA 9 Features Revealed: Volta, Cooperative Groups and More

Originally published at: https://developer.nvidia.com/blog/cuda-9-features-revealed/

Figure 1: CUDA 9 provides a preview API for programming Tesla V100 Tensor Cores, providing a huge boost to mixed-precision matrix arithmetic for deep learning.

At the 2017 GPU Technology Conference NVIDIA announced CUDA 9, the latest version of CUDA’s powerful parallel computing platform and programming model. CUDA 9 is now available as a free…
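For readers curious what that preview API looks like, here is a minimal sketch using the CUDA 9 WMMA interface (mma.h, nvcuda::wmma namespace), assuming a single 16x16x16 tile with half-precision inputs and a float accumulator; the kernel name wmma_tile is made up for illustration.

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Minimal sketch: one warp computes C = A*B for a single 16x16x16 tile.
// Requires compilation for sm_70 (Volta).
__global__ void wmma_tile(const half *a, const half *b, float *c)
{
    // Per-warp fragments for A, B, and the accumulator.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::col_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                  // start accumulator at zero
    wmma::load_matrix_sync(a_frag, a, 16);                // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);   // Tensor Core multiply-accumulate
    wmma::store_matrix_sync(c, acc_frag, 16, wmma::mem_col_major);
}
```

Note that store_matrix_sync also accepts wmma::mem_row_major, which is relevant to the row-major question raised further down in this thread.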

Will the GTX 1050 Ti laptop GPU be supported?

What exactly is new in terms of assembly programming?
New libraries and software aren't exactly what you would call "new".

Also, could you drop that idea of threads you have promoted for so long?
Initially it was used for marketing purposes, to claim that the GPU is capable of running 32x more threads than it actually can. I see no point in making things worse and worse just for marketing lies. It would be easier for developers to adjust their code to the actual operations performed by the GPU, that is, array operations.

CUDA programs execute thousands of parallel threads. The threads are not as heavyweight as CPU threads, and they are created and run in parallel, but they are still threads, free to branch and take different execution paths. They are not limited to just array operations. Volta and CUDA 9 make this even more flexible.
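To make that concrete, here is a minimal sketch (the kernel name branchy and the even/odd split are made up for illustration) in which each thread follows its own branch based on its data:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread inspects its own element and takes a different path;
// the hardware handles any divergence.
__global__ void branchy(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] % 2 == 0)      // even elements: square them
        out[i] = in[i] * in[i];
    else                     // odd elements: negate them
        out[i] = -in[i];
}

int main()
{
    const int n = 8;
    int h_in[n] = {0, 1, 2, 3, 4, 5, 6, 7}, h_out[n];
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    branchy<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);
    printf("\n");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```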

Yes.

I didn't know there were SIMT deniers.

How come? If one divided a CPU cache into 64-byte private areas and ran 1 thread, one could claim that there are 512 threads per L1 cache. Such "parallel" threads would still actually be 1 thread. If they can't branch independently, and branches are emulated by having the rest of them wait until the branch is executed, they are not threads in the actual meaning of the word.

I've always thought the following blog post was a thoughtful, unbiased, well-explained analysis by someone who gets it. http://yosefk.com/blog/simd...

Is there a possibility that NVIDIA will publish the following data:
- instruction format (encoding) and timing for the new architecture,
- whether a register can be used as a temporary up until the moment it is overwritten by the result of an operation,
- instruction tagging for pseudo-threads and its effect (theoretical, if any),
- a description of register file write and read operations, as well as addressing and throughput for register groups,
- the instruction-set extension for tensor operations, if any.

@Mark_Harris I enjoyed reading that blog post, thanks for sharing.

Thanks for the share, Mark; the SIMT vs SIMD article is excellent.

Looks like the recorded sessions won't be available until June 8th for people who did not attend GTC 2017.

Hello Mark!

When I read this blog, two things caught my eye:

1) In your screenshots you use a Titan card, but as usual you talk extensively about Tesla (okay, only Tesla exists for now with the Volta architecture, but in many other NVIDIA papers you "unintentionally" talk about Tesla or Quadro, and very rarely about Titan).

Since this blog is about CUDA 9, can you publish a complete list of all GPUs compatible with CUDA 9?

2) "These new meta packages provide simple and clean installation of CUDA
libraries for deep learning and scientific computing"

My question:

I'm trying to clearly understand NVIDIA's positioning of the Titan line of cards:

[+] Titans are simply the best choice for "any" CUDA developer: they have been (since their introduction) the fastest CUDA FP32 hardware available [the Titan Xp even beats the GP100], they are versatile (they work in any machine, are actively cooled, and support both WDDM and TCC mode), they offer nearly the largest VRAM, they have a standard-positioned power plug, and their price-per-CUDA-core ratio is around 6x better than any other "professional" card.

[+] I think that is how they were introduced: the perfect CUDA card for high-end workstations plus compute.

Now, since most of the differentiation between all the CUDA-oriented cards lives in the driver (allowing TCC mode or not, optimized for OpenGL/CAD or not, allowing virtualization or not, allowing 10-bit output or not, allowing remote/GRID or not, server-only, ...):

[-] Why do Titan cards use consumer/gaming drivers, with the famous "unintentional virtualization bug", its Windows code 43 result, and incompatibility with the Quadro/Tesla drivers?

[-] I think it's time for NVIDIA to create a specific "Titan for compute" driver, focusing on TCC mode plus virtualization support: the perfect companion for any CUDA developer.

Also, this new driver would then indeed "provide simple and clean installation of CUDA libraries for deep learning and scientific computing", without requiring installation of a huge driver package.

fred

Here's a complete list of CUDA-compatible GPUs: https://developer.nvidia.co...

Thanks for this list. However, I guess that:
- Volta requires CUDA 9 to work;
- older chips (let's say from Kepler to Pascal) will all work? I guess that some features will not be enabled? So a simple driver update will bring CUDA 9 to all cards?

For question 2:
I'm "dreaming" of the day NVIDIA offers a clean, efficient, and global driver for compute and deep learning, so there is no more "artificial" separation (from a driver perspective) between Titan, Quadro, and Tesla. It would be up to NVIDIA to clearly differentiate each product (e.g. Titan for fast and affordable CUDA development without compromise, specifically in the new virtualization area; Tesla for datacenters, supercomputers, and scientific compute/FP64; Quadro for engineering and design). Also, we should be able to freely mix any of those cards in the same system with a unified driver.
Basically, buy and use the most efficient card for each usage, without any driver limitation like today...

fred

Will we one day teach AI how to code and let the software developers die out?

Have you planned to make the matrices row-major? Or any direct conversion?

Can you clarify your question?

Existing CUDA codes should run on Volta. To build new codes for Volta, you need CUDA 9. CUDA 9 will support older GPUs as well. Your "question 2" is not a question.
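For example (a hedged sketch; the file name my_kernel.cu and the particular architecture list are made up for illustration), a single CUDA 9 nvcc invocation can embed code for older GPUs alongside Volta in one fat binary:

```sh
# Targets Kepler (sm_35), Pascal (sm_60), and Volta (sm_70) in one binary.
nvcc -gencode arch=compute_35,code=sm_35 \
     -gencode arch=compute_60,code=sm_60 \
     -gencode arch=compute_70,code=sm_70 \
     my_kernel.cu -o my_kernel
```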