CUDA 9 Features Revealed: Volta, Cooperative Groups and More

Hello. Currently I'm having problems with cuBLAS: the library works with column-major matrices, but most of my programs define matrices in row-major order. cuBLAS would be much friendlier if it could take the data directly either way and work internally in whichever layout is most convenient. Thank you so much.

The cuBLAS API allows you to specify a transpose of your matrices. In many cases, this is handled inside the computational kernels without additional copies, so it is efficient. See the documentation for cublasOperation_t: http://docs.nvidia.com/cuda...
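For row-major data there is also a well-known trick that avoids any transpose at all: a row-major M×N matrix has exactly the same memory layout as a column-major N×M matrix, so C = A·B over row-major matrices is the same computation as Cᵀ = Bᵀ·Aᵀ over their column-major views. A minimal sketch (untested, names are illustrative) of a row-major SGEMM wrapper built on this identity:

```cpp
#include <cublas_v2.h>

// Compute C = A * B for row-major float matrices on the device:
// A is m x k, B is k x n, C is m x n.
// Each row-major matrix is handed to cuBLAS as the column-major
// view of its transpose, so we ask for C^T = B^T * A^T by swapping
// the operands and passing dimensions (n, m, k) with no transpose ops.
cublasStatus_t rowMajorSgemm(cublasHandle_t handle,
                             int m, int n, int k,
                             const float *dA, const float *dB, float *dC)
{
    const float alpha = 1.0f, beta = 0.0f;
    // Leading dimensions are the row lengths of the row-major matrices.
    return cublasSgemm(handle,
                       CUBLAS_OP_N, CUBLAS_OP_N,
                       n, m, k,
                       &alpha,
                       dB, n,   // B viewed as a column-major n x k matrix
                       dA, k,   // A viewed as a column-major k x m matrix
                       &beta,
                       dC, n);  // C viewed as a column-major n x m matrix
}
```

Because the layouts coincide, no data is copied or transposed; the result lands in dC already in row-major order.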

When can we expect a (RC) release? It's been over a month! Can't wait to try the new features.

Do cooperative groups allow synchronization across multiple SMs?

I believe __syncthreads only synchronizes threads within a single thread block (and hence a single SM). I was wondering whether this limitation is lifted with cooperative groups.

Yes, the example in the post shows how you will be able to call this_grid() to get a group referring to all threads running on the GPU (on all SMs). This can then be synchronized as shown. This functionality requires Pascal or later GPUs. In CUDA 9 you will be limited to only synchronizing ALL threads, not a subset of thread blocks. Hopefully we can generalize that and make it more flexible in a future release.
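As a concrete illustration, here is a minimal sketch (assuming CUDA 9 and a Pascal-or-later GPU) of a two-phase kernel where the second phase reads values written by *other* thread blocks, which is only safe because grid.sync() orders the phases across the whole grid:

```cpp
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void shiftKernel(int *in, int *out, int n)
{
    cg::grid_group grid = cg::this_grid();

    // Phase 1: each thread fills its slots via a grid-stride loop.
    for (int i = grid.thread_rank(); i < n; i += grid.size())
        in[i] = i * 2;

    grid.sync();  // wait for ALL thread blocks, on all SMs

    // Phase 2: cross-block reads are now safe.
    for (int i = grid.thread_rank(); i < n; i += grid.size())
        out[i] = in[(i + 1) % n];
}
```

Note that such a kernel must be launched with cudaLaunchCooperativeKernel rather than the <<<...>>> syntax, and the launch fails if the grid cannot be co-resident on the device at once.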

Thanks, that looks like what I was looking for. Previously we had to resort to tricks like the lock-free global spinlocks ( http://eprints.cs.vt.edu/ar... ) - hopefully this new technique will be efficient

The link to your talk on GTC On-Demand is not working; could you fix it, please?

Great article! Thanks Mark!

You can find it here: http://on-demand-gtc.gputec...

Direct link: http://on-demand.gputechcon...

Fixed. Thanks.

Re-release the old Kepler GTX 780 in 6 GB and 8 GB GDDR5 variants at $100: make a market-aggressive, good-quality product at a low price (sold below production cost, like Sony's console strategy). It would be competitive with AMD's better products in the lower price range. Consider working with AMD on CUDA; in my personal opinion it is a better fit than OpenCL. The GTX 1030's price is too high (the GT 730 4GB has a better quality/price ratio). Isn't it more profitable to mass-produce the much easier-to-manufacture older technology than to keep redesigning the new product line? P.S. This is just my personal opinion; I am not an expert, so it would be more prudent to ask an economics specialist.

It already exists; it's called autocoding, or automatic coding, I don't remember... I think it's "DeepCoder" from Microsoft...

Question 2 is back. Thanks SO MUCH to AMD, which is kicking NVIDIA's politics in the ass. So now the new Titan driver "magically" provides "some" (unspecified) features of the Quadro drivers. Viva competition!

http://www.nvidia.com/downl...
https://www.reddit.com/r/nv...

What's next:
- provide drivers that support 100% virtualization
- add full 10-bit support
- enable switching from the "gaming" driver to the "pro" driver without rebooting
- create a REAL distinction between Titan and Quadro


I tested CUDA 9 against CUDA 8 today and got the expected speedup from better FFT performance. But I also got a speedup in code that serially executes groups of CUDA kernels, and I don't quite understand why it is faster, since the individual groups of kernels (so-called modules) are only marginally faster on CUDA 9. The only exception is the already-mentioned group of kernels that uses FFTs, which is around 19% faster. Any thoughts?

What sort of speedup are you getting? Looking forward to getting our ocean FFTs upgraded.

Approximately 20%.

May I refer you to "A very comprehensive and precise spec"
http://www.commitstrip.com/...

What about Cooperative Groups?
They are quite useful in some applications.

How is it possible to synchronize ALL threads when the total grid size is larger than the maximum number of threads that can actually be resident on the GPU? In that case, CUDA would need to save the contexts (local variables) of all resident threads, run the code up to g.sync() for the other parts of the grid, and then restore the saved contexts of the first grid parts to run the code after g.sync(). So how and where would the contexts of resident threads be saved?
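For reference, my understanding is that grid synchronization does not swap thread contexts at all: the cooperative launch API instead requires the whole grid to be co-resident, and cudaLaunchCooperativeKernel fails if the grid cannot fit on the device at once. The usual pattern is to size the grid with the occupancy API and cover larger problems with a grid-stride loop inside the kernel. A sketch (untested; myKernel, d_data, and n are hypothetical placeholders):

```cpp
// Size the grid to exactly fill the device, so every block is
// resident simultaneously and grid.sync() never needs to swap
// contexts. Larger problems are handled by grid-stride loops
// inside myKernel (a placeholder for your cooperative kernel).
int device, numSms, blocksPerSm;
const int blockSize = 256;
cudaGetDevice(&device);
cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, device);
cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm,
                                              myKernel, blockSize, 0);

dim3 grid(numSms * blocksPerSm), block(blockSize);
void *args[] = { &d_data, &n };  // d_data, n: hypothetical kernel args
cudaLaunchCooperativeKernel((void *)myKernel, grid, block, args);
```

In other words, the grid-stride loop, not context saving, is what lets a co-resident grid process more elements than it has threads.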