Fermi is said to support up to 16 concurrent kernels, each launched in its own stream.
What happens if we launch more than 16 kernels (say 100), each in its own stream, and there are not enough resources to run them all concurrently? Will the GPU still execute all of them, just without full concurrency, or will it discard (not execute) the excess ones?
What is the basic hardware unit that executes a kernel launched in a stream? Is it a streaming multiprocessor?
When a kernel launched in a stream executes, does it fully utilize a streaming multiprocessor (SM)?
If the number of streams running concurrently is less than the number of SMs, will the kernels from those streams be scheduled on different SMs?
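To make the scenario concrete, here is a minimal sketch of the launch pattern I have in mind: one small kernel per stream, with the stream count as a parameter. The kernel `addOne`, the helper `launchInStreams`, and the buffer sizes are placeholders I made up for illustration, not taken from any real code. The helper only counts how many kernels produced the expected result; it says nothing about whether they actually overlapped on the device.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel: each thread increments one element.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Launches one small kernel per stream (numStreams streams in total).
// Returns the number of streams whose buffer holds the expected values,
// or -1 if any CUDA call fails. Cleanup on the error paths is omitted
// to keep the sketch short.
int launchInStreams(int numStreams, int n)
{
    std::vector<cudaStream_t> streams(numStreams);
    std::vector<float *> buffers(numStreams);
    const size_t bytes = n * sizeof(float);

    // One stream and one zero-initialized device buffer per kernel.
    for (int s = 0; s < numStreams; ++s) {
        if (cudaStreamCreate(&streams[s]) != cudaSuccess) return -1;
        if (cudaMalloc(&buffers[s], bytes) != cudaSuccess) return -1;
        if (cudaMemsetAsync(buffers[s], 0, bytes, streams[s]) != cudaSuccess) return -1;
    }

    // Issue every kernel into its own stream without waiting in between.
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < numStreams; ++s) {
        addOne<<<blocks, threads, 0, streams[s]>>>(buffers[s], n);
    }
    if (cudaGetLastError() != cudaSuccess) return -1;

    // Wait for all streams to finish.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    // Copy each buffer back and check that every element was incremented once.
    int completed = 0;
    std::vector<float> host(n);
    for (int s = 0; s < numStreams; ++s) {
        if (cudaMemcpy(host.data(), buffers[s], bytes,
                       cudaMemcpyDeviceToHost) != cudaSuccess) return -1;
        bool ok = true;
        for (int i = 0; i < n; ++i) {
            if (host[i] != 1.0f) { ok = false; break; }
        }
        if (ok) ++completed;
    }

    // Release the per-stream resources.
    for (int s = 0; s < numStreams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return completed;
}
```

Calling `launchInStreams(100, 1024)` from `main` corresponds to the "100 streams" case in my question, and comparing the return value with the stream count would show whether any kernel was skipped. Checking how many kernels really ran at the same time would presumably need a profiler timeline rather than this kind of result check.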