CUDA 8 Features Revealed

I really hope that the new unified memory model will make it into consumer GeForce cards.

GTC 2016 slide deck for CUDA 8 and Beyond.

Hmm, the cooperative thread groups feature looks tasty, and the slides say it works on Kepler and later.

In my warp-synchronous programming I am hitting limits on matrix size: I am restricted to 32x32 matrices because I rely on the warp’s implicit synchronization. It seems CUDA 8 will let me go bigger, as long as I keep my group sizes within the thread block size.
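As a rough illustration of moving past the 32x32 warp limit, here is a minimal sketch of a 64x64 matrix transpose that uses a whole thread block with an explicit barrier instead of relying on implicit warp lockstep. It assumes the Cooperative Groups API as it eventually shipped in CUDA 9 and later (`cg::this_thread_block()`, `thread_rank()`, `size()`, `sync()`), not anything confirmed for CUDA 8; the matrix size `N`, the kernel `transposeInBlock`, and the helper `runTransposeCheck` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical matrix size: larger than the 32x32 one warp can cover.
constexpr unsigned int N = 64;

// Transposes one N x N matrix using a whole thread block instead of one warp.
__global__ void transposeInBlock(const float* in, float* out) {
    __shared__ float tile[N][N + 1];  // +1 column of padding avoids bank conflicts
    cg::thread_block block = cg::this_thread_block();

    // Each thread strides over the matrix, so any block size works.
    for (unsigned int i = block.thread_rank(); i < N * N; i += block.size()) {
        tile[i / N][i % N] = in[i];
    }

    // Explicit block-wide barrier: replaces the implicit lockstep
    // that warp-synchronous code relied on.
    block.sync();

    // out[r][c] = in[c][r], with r = i / N and c = i % N.
    for (unsigned int i = block.thread_rank(); i < N * N; i += block.size()) {
        out[i] = tile[i % N][i / N];
    }
}

// Hypothetical helper: runs the kernel once and checks the result on the host.
bool runTransposeCheck() {
    float *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, N * N * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, N * N * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return false;
    }
    for (unsigned int i = 0; i < N * N; ++i) in[i] = static_cast<float>(i);

    // One block of 32x32 = 1024 threads, within the per-block limit.
    transposeInBlock<<<1, dim3(32, 32)>>>(in, out);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    for (unsigned int r = 0; ok && r < N; ++r)
        for (unsigned int c = 0; c < N; ++c)
            if (out[r * N + c] != in[c * N + r]) { ok = false; break; }

    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main() {
    std::printf(runTransposeCheck() ? "ok\n" : "mismatch\n");
    return 0;
}
```

The strided loops decouple the matrix size from the thread count, so the same kernel works for any block shape as long as the tile fits in shared memory.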


One of my big questions about these features is whether we will get any control over paging with the new Unified Memory. The main CUDA project I work on uses lots of images (often more than fit on the GPU), so I wrote my own ‘paging’ model, but having the system handle it automatically would generally be better.

My concern would be making sure that a single image doesn’t end up straddling two pages. My ideal scenario would be to define the page size myself, but I imagine that is set at the driver level and out of our control.
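A minimal sketch of letting Unified Memory handle the paging, assuming the `cudaMemPrefetchAsync` hint that CUDA 8 introduced: each image gets its own slot aligned to a hypothetical 2 MiB granularity (`kGranularity`), so no image straddles a slot boundary, and each image is prefetched to the GPU before use and back to the host afterwards. The driver still chooses the real migration page size, so the alignment is only a best-effort layout choice; the image size and count and the names `invert`, `roundUp`, and `processImages` are hypothetical. Prefetching to a GPU requires a nonzero `cudaDevAttrConcurrentManagedAccess`, so the sketch skips the hints where that attribute is zero. The call uses the CUDA 8-era `(ptr, bytes, device, stream)` signature; check the runtime API reference for your toolkit version.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical sizes for illustration only.
constexpr size_t kImageBytes  = 1920 * 1080;  // one 8-bit grayscale image
constexpr size_t kNumImages   = 8;
constexpr size_t kGranularity = 2u << 20;     // assumed 2 MiB slot alignment

__global__ void invert(unsigned char* img, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) img[i] = static_cast<unsigned char>(255 - img[i]);
}

static size_t roundUp(size_t x, size_t a) { return (x + a - 1) / a * a; }

bool processImages() {
    int dev = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return false;
    int canPrefetch = 0;  // prefetching to a GPU needs concurrent managed access
    cudaDeviceGetAttribute(&canPrefetch, cudaDevAttrConcurrentManagedAccess, dev);

    // Each image starts at a granularity-aligned slot, so no image
    // straddles a slot boundary. Extra kGranularity bytes allow aligning the base.
    const size_t stride = roundUp(kImageBytes, kGranularity);
    unsigned char* pool = nullptr;
    if (cudaMallocManaged(&pool, stride * kNumImages + kGranularity) != cudaSuccess)
        return false;
    unsigned char* base = reinterpret_cast<unsigned char*>(
        roundUp(reinterpret_cast<uintptr_t>(pool), kGranularity));

    for (size_t k = 0; k < kNumImages; ++k)
        for (size_t i = 0; i < kImageBytes; ++i)
            base[k * stride + i] = static_cast<unsigned char>((k + i) % 256);

    cudaStream_t s;
    cudaStreamCreate(&s);
    for (size_t k = 0; k < kNumImages; ++k) {
        unsigned char* img = base + k * stride;
        if (canPrefetch) cudaMemPrefetchAsync(img, kImageBytes, dev, s);  // migrate in before use
        invert<<<(kImageBytes + 255) / 256, 256, 0, s>>>(img, kImageBytes);
        if (canPrefetch) cudaMemPrefetchAsync(img, kImageBytes, cudaCpuDeviceId, s);  // evict to host
    }
    bool ok = cudaStreamSynchronize(s) == cudaSuccess &&
              cudaGetLastError() == cudaSuccess;

    for (size_t k = 0; ok && k < kNumImages; ++k)
        for (size_t i = 0; i < kImageBytes; ++i)
            if (base[k * stride + i] != static_cast<unsigned char>(255 - (k + i) % 256)) {
                ok = false;
                break;
            }

    cudaStreamDestroy(s);
    cudaFree(pool);
    return ok;
}

int main() {
    std::printf(processImages() ? "ok\n" : "failed\n");
    return 0;
}
```

Where concurrent managed access is unavailable (for example on pre-Pascal GPUs), the runtime migrates managed data around kernel launches on its own, so the sketch still runs but without the explicit per-image eviction.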

Does anyone know whether DX12 and Vulkan interop will be included in CUDA 8?