Great question. The answer is that it is not possible to synchronize all threads when the grid is larger than the maximum number of threads that can be resident on the device at once (full occupancy). Therefore, you must determine a grid size that fits using the CUDA occupancy API and then launch the grid using cudaLaunchCooperativeKernel() (or cudaLaunchCooperativeKernelMultiDevice()), which returns an error if your grid size fails the occupancy check.
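For concreteness, here is a minimal sketch of that pattern (the kernel name, block size, and payload are placeholders, not from the original post): size the grid with the occupancy API, then launch cooperatively. Grid-wide sync requires compiling with relocatable device code (-rdc=true) on a GPU of compute capability 6.0 or newer.

```cuda
#include <cooperative_groups.h>
#include <cstdio>
namespace cg = cooperative_groups;

__global__ void kernel(int *data) {
    cg::grid_group grid = cg::this_grid();
    // ... phase 1 work ...
    grid.sync();  // synchronize every thread in the entire grid
    // ... phase 2 work ...
}

int main() {
    int numThreads = 256;        // placeholder block size
    int numBlocksPerSm = 0;
    // Ask the occupancy API how many blocks of this kernel fit per SM
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocksPerSm, kernel,
                                                  numThreads, 0);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int numBlocks = numBlocksPerSm * prop.multiProcessorCount;

    int *data;
    cudaMalloc(&data, numBlocks * numThreads * sizeof(int));

    void *args[] = { &data };
    // Returns cudaErrorCooperativeLaunchTooLarge if the grid cannot be
    // co-resident on the device
    cudaError_t err = cudaLaunchCooperativeKernel((void *)kernel,
                                                  dim3(numBlocks),
                                                  dim3(numThreads),
                                                  args, 0, 0);
    printf("launch: %s\n", cudaGetErrorString(err));
    cudaFree(data);
    return 0;
}
```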
I am looking forward to more examples of cooperative thread groups. Hopefully there will be examples of how to deal with differences between GPU generations. I still have Kepler K40s, but I would like to understand the differences so I can run on Pascal or Volta cards once I can upgrade ($$$). Some good examples of how to take advantage of Tensor Cores directly would also be interesting. Possibly a dumb question: I have some sparse matrix projects that use cuSPARSE, etc. I wonder whether they would benefit from Tensor Cores, or do those really prefer dense matrices?
LU factorization on GPUs?
Is the ballot function applicable to a cooperative thread group?
Yes. See https://docs.nvidia.com/cud...
You can create a thread_block_tile of any power-of-two size up to 32 (via cg::tiled_partition), and it exposes a .ballot() method (as well as shfl routines, any/all, etc.).
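A short sketch of what that looks like (the kernel and the counting logic are illustrative, not from this thread):

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void count_positive(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cg::thread_block block = cg::this_thread_block();
    // Partition the block into tiles of 16 threads
    // (any power-of-two size up to 32 works)
    cg::thread_block_tile<16> tile = cg::tiled_partition<16>(block);

    int pred = (i < n && in[i] > 0);
    // Bitmask with one bit per thread in this tile whose predicate is true
    unsigned mask = tile.ballot(pred);
    if (tile.thread_rank() == 0)
        atomicAdd(out, __popc(mask));  // tile leader accumulates the count
}
```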