What happens if the kernel's blocks cannot fill the 4 SMs to 100%?

chicagotyper · November 18, 2014, 6:32am

Hi, I posted a question last Saturday, but no one answered. Maybe it was not the right day or the right question, so I am asking again.

My card is GTX 750 Ti:

Can thread blocks from different kernels (in different streams) run concurrently on one SMM, or can they only run concurrently on different SMMs?

What happens if the kernel’s blocks cannot fill the 4 SMs to 100%? Can the GPU use the remaining resources for the next kernel in the same stream or in another stream?

little_jimmy · November 18, 2014, 10:08am

“Can thread blocks from different kernels (in different streams) run concurrently on one SMM?”

generally, yes; it depends, of course, on the resource requirements of each block (registers, shared memory, threads)

the programming guide lists, per compute capability, the maximum number of kernels that a device can execute concurrently, and the maximum number of resident blocks per sm
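A minimal sketch of how to read some of the relevant limits from the device itself, assuming the card of interest is device 0. Note that `concurrentKernels` is only a yes/no flag; the actual maximum number of concurrent kernels and the maximum number of resident blocks per SM still come from the programming guide's compute capability table. The SM count is also worth reading rather than assuming: as far as I know the GTX 750 Ti has 5 SMMs, not 4.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the card of interest is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Per-SM limits that decide how many blocks can be resident at once.
    std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("SM count:                     %d\n", prop.multiProcessorCount);
    std::printf("concurrent kernels supported: %s\n", prop.concurrentKernels ? "yes" : "no");
    std::printf("max resident threads per SM:  %d\n", prop.maxThreadsPerMultiProcessor);
    std::printf("registers per SM:             %d\n", prop.regsPerMultiprocessor);
    std::printf("shared memory per SM (bytes): %zu\n", prop.sharedMemPerMultiprocessor);
    return 0;
}
```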

“What happens if the kernel’s blocks cannot fill the 4 SMs to 100%?”

point 1:

100% utilization does not necessarily require 100% occupancy
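To make "occupancy" concrete, here is a rough sketch that asks the runtime how many blocks of a kernel can be resident on one SM and converts that into a theoretical occupancy figure. The kernel `scale` and the block sizes are made up for illustration. The number it prints is only the theoretical ratio of resident threads to the SM's maximum; it says nothing about how busy the SM actually is, which is something a profiler has to measure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so the occupancy API has something to inspect.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;  // assumes device 0

    const int blockSizes[] = {64, 128, 256, 512};  // illustrative choices
    for (int k = 0; k < 4; ++k) {
        int blockSize = blockSizes[k];
        int blocksPerSm = 0;
        // Maximum resident blocks per SM for this kernel, with no dynamic shared memory.
        cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSm, scale, blockSize, 0);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "occupancy query failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // Theoretical occupancy: resident threads divided by the SM's thread limit.
        double occupancy = double(blocksPerSm) * blockSize / prop.maxThreadsPerMultiProcessor;
        std::printf("block size %3d: %2d blocks/SM, theoretical occupancy %.0f%%\n",
                    blockSize, blocksPerSm, occupancy * 100.0);
    }
    return 0;
}
```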

point 2:

if you have multiple streams, and the device has enough free resources, it is likely to place blocks from multiple kernels on the same sm, or sms
again, this may imply greater occupancy, and may or may not imply greater utilization
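One way to check this on your own card is to have each block record which SM it ran on, using the PTX special register `%smid`. The sketch below is only an experiment under assumptions: the kernel `recordSm`, the block counts, and the spin length are invented for illustration, and the spin loop exists only to keep both kernels alive long enough to overlap. The scheduler gives no guarantee about placement, so the output will vary between runs and devices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the id of the SM the calling thread is running on (PTX register %smid).
__device__ unsigned int currentSm() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical kernel: each block records its SM, then spins for a while
// so that kernels launched in different streams overlap in time.
__global__ void recordSm(unsigned int *smOfBlock, long long spinCycles) {
    if (threadIdx.x == 0) smOfBlock[blockIdx.x] = currentSm();
    long long start = clock64();
    while (clock64() - start < spinCycles) { }
}

int main() {
    const int blocks = 8, threads = 64;  // small on purpose: one kernel cannot fill the GPU
    unsigned int *d[2];
    unsigned int h[2][blocks];
    cudaStream_t s[2];

    for (int k = 0; k < 2; ++k) {
        cudaStreamCreate(&s[k]);
        cudaMalloc(&d[k], blocks * sizeof(unsigned int));
    }

    // One kernel per stream, so the device is allowed to run them concurrently.
    for (int k = 0; k < 2; ++k)
        recordSm<<<blocks, threads, 0, s[k]>>>(d[k], 10000000LL);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int k = 0; k < 2; ++k) {
        cudaMemcpy(h[k], d[k], blocks * sizeof(unsigned int), cudaMemcpyDeviceToHost);
        std::printf("kernel %d blocks ran on SMs:", k);
        for (int b = 0; b < blocks; ++b) std::printf(" %u", h[k][b]);
        std::printf("\n");
        cudaFree(d[k]);
        cudaStreamDestroy(s[k]);
    }
    return 0;
}
```

If the same SM id shows up in both lines, blocks from the two kernels were placed on the same SM at some point; whether they were resident at the same moment is better confirmed with a profiler timeline.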

but, i am merely a cuda peasant; perhaps a cuda elite can confirm

chicagotyper · November 18, 2014, 11:51am

thank you very much for your answer!