CUDA and Kepler GK110 execution hierarchy

The classical CUDA model says that an SMX runs blocks of threads, and the GPU runs grids of thread blocks.
With new Kepler features such as Hyper-Q, does this mean that an SMX can now also run grids of thread blocks?

No, the execution hierarchy is still the same: an SMX runs thread blocks, and the GPU runs grids. GK110 does add two features, though:

  • Hyper-Q provides multiple hardware work queues on the device (32 on GK110), which allows the hardware to schedule kernels from independent CUDA streams more effectively.

  • Dynamic Parallelism allows CUDA threads to launch additional kernels. It appears that this feature is implemented by allowing executing blocks to be suspended and replaced with other blocks. The grid of thread blocks for the new kernel is not confined to the SMX that launched it.
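
To make the two features concrete, here is a minimal sketch (not a benchmark, and not taken from NVIDIA's samples). The kernel names `parent` and `child`, the grid sizes, and the counter layout are all made up for illustration. The host launches one `parent` grid per stream, which is the kind of independent work Hyper-Q can schedule concurrently, and each `parent` block launches its own `child` grid from the device. Note that every launch still names a grid of blocks; nothing in the code assigns a grid to a particular SMX.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel (hypothetical): every thread bumps the counter that belongs
// to the parent block that launched this grid. Its blocks may be placed on
// any SMX, not only the one where the parent block runs.
__global__ void child(int *counters, int parentBlock)
{
    atomicAdd(&counters[parentBlock], 1);
}

// Parent kernel (hypothetical): one thread per block launches a child grid.
// This device-side launch is Dynamic Parallelism (compute capability >= 3.5).
__global__ void parent(int *counters)
{
    if (threadIdx.x == 0) {
        child<<<4, 32>>>(counters, blockIdx.x);  // 4 blocks x 32 threads
    }
}

int main()
{
    const int kStreams = 4;       // independent streams (illustrative choice)
    const int kParentBlocks = 2;  // blocks per parent grid
    const int kTotal = kStreams * kParentBlocks;

    int *d_counters = nullptr;
    cudaMalloc(&d_counters, kTotal * sizeof(int));
    cudaMemset(d_counters, 0, kTotal * sizeof(int));

    cudaStream_t streams[kStreams];
    for (int i = 0; i < kStreams; ++i) {
        cudaStreamCreate(&streams[i]);
    }

    // One parent grid per stream. The grids are independent, so the hardware
    // is free (but not obliged) to run them concurrently.
    for (int i = 0; i < kStreams; ++i) {
        parent<<<kParentBlocks, 64, 0, streams[i]>>>(d_counters + i * kParentBlocks);
    }

    // A parent grid is not complete until all of its child grids are.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_counters[kTotal];
    cudaMemcpy(h_counters, d_counters, sizeof(h_counters), cudaMemcpyDeviceToHost);
    for (int i = 0; i < kTotal; ++i) {
        printf("counter[%d] = %d\n", i, h_counters[i]);
    }

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamDestroy(streams[i]);
    }
    cudaFree(d_counters);
    return 0;
}
```

Device-side launches need relocatable device code and the device runtime library, so a build line would look something like `nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt` (the file name is a placeholder; adjust `-arch` to your device and toolkit). If every launch succeeds, each counter should end up at 128, one increment per child thread.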