Fermi allows us to run multiple kernels concurrently. The unified shader architecture also allows different shader types (vertex/geometry/fragment) to run at the same time, so in a sense the two features seem similar. To me, it looks as if NVIDIA only exposed to programmers, with Fermi, a capability that has existed since the first unified shader architectures. If this assumption is right, what kept NVIDIA from exposing it on the architectures released before Fermi? What was the bottleneck, and how was it solved?
I am also curious about how blocks are scheduled on these architectures.
My understanding is that there is a single scheduling queue for blocks launched from different kernels, and the scheduler issues blocks in the order they were received. I also think the scheduler keeps sending blocks to SMs as long as there are free resources. Assuming there are no dependencies between the kernels (e.g., on Fermi each kernel is launched in a different stream), can blocks from different kernels run on the same SM at the same time?
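To make the model I have in mind concrete, here is a toy Python sketch of it: one global FIFO holding every block in launch order, in-order issue, and dispatch to an SM only while it has enough free resources. This models my assumptions, not NVIDIA's actual hardware scheduler. `capacity` and the per-kernel `cost` are hypothetical stand-ins for register, shared-memory and thread limits, and the "least-loaded SM first" placement rule is also my own assumption.

```python
from collections import deque


def dispatch(kernels, num_sms, capacity):
    """Toy model (hypothetical) of a single-queue, in-order block dispatcher.

    kernels:  list of (name, num_blocks, cost) in launch order; cost is the
              assumed amount of SM resources one block of that kernel uses.
    num_sms:  number of SMs in the modeled GPU.
    capacity: assumed resource budget of each SM.
    Returns (residents, pending): residents[i] lists the kernel name of every
    block placed on SM i; pending holds (name, cost) blocks still queued.
    """
    # One global FIFO: all blocks of the first kernel, then the next kernel, ...
    queue = deque((name, cost) for name, blocks, cost in kernels
                  for _ in range(blocks))
    free = [capacity] * num_sms
    residents = [[] for _ in range(num_sms)]
    while queue:
        name, cost = queue[0]
        # SMs that still have room for the block at the head of the queue.
        candidates = [i for i in range(num_sms) if free[i] >= cost]
        if not candidates:
            break  # In-order issue: the head block stalls everything behind it.
        # Least-loaded SM first; max() returns the lowest index on ties.
        sm = max(candidates, key=lambda i: free[i])
        free[sm] -= cost
        residents[sm].append(name)
        queue.popleft()
    return residents, list(queue)
```

For a case like the kernel1/kernel2 example, `dispatch([("kernel1", 3, 1), ("kernel2", 4, 1)], num_sms=2, capacity=4)` puts kernel1, kernel1, kernel2, kernel2 on SM 0 and kernel1, kernel2, kernel2 on SM 1, so under this model co-residency would follow directly from in-order issue plus free resources. The model also shows head-of-line blocking: a large block at the head of the queue keeps a smaller block behind it waiting even if that smaller block would fit.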
For instance, let's assume kernel1 is launched before kernel2 and kernel1 does not have enough blocks to fill the machine. In that case, can a single SM run the last blocks of kernel1 and the first blocks of kernel2 at the same time?
Or is this impossible, i.e., blocks from different kernels cannot reside on the same SM simultaneously? Perhaps the scheduler only allows blocks from a single kernel/shader to run on an SM at any given time?
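The two possibilities can be contrasted with a second toy sketch. Every block is assumed to take one of `slots` hypothetical per-SM slots, and blocks are placed in launch order. With `exclusive=True`, an SM only accepts blocks of the kernel already resident on it (the "one kernel per SM" policy I am asking about); with `exclusive=False`, any free slot can be used. Neither policy is claimed to be what Fermi actually does.

```python
def place(kernels, num_sms, slots, exclusive):
    """Place blocks (one slot each, hypothetical) in launch order onto SMs.

    kernels:   list of (name, num_blocks) in launch order.
    exclusive: if True, an SM only accepts blocks of the kernel that already
               occupies it; if False, blocks of any kernel may share an SM.
    Returns (per-SM list of kernel names, number of blocks left waiting).
    """
    sms = [[] for _ in range(num_sms)]
    order = [name for name, blocks in kernels for _ in range(blocks)]
    for n, name in enumerate(order):
        # SMs with a free slot that the policy allows this block to use.
        ok = [i for i, resident in enumerate(sms)
              if len(resident) < slots
              and (not exclusive or not resident or resident[0] == name)]
        if not ok:
            # In-order issue: stop at the first block that cannot be placed.
            return sms, len(order) - n
        # Least-loaded allowed SM; min() returns the lowest index on ties.
        sm = min(ok, key=lambda i: len(sms[i]))
        sms[sm].append(name)
    return sms, 0
```

With kernel1 = 5 blocks, kernel2 = 4 blocks, 2 SMs and 4 slots each, the shared policy fills both SMs with a mix of kernel1 and kernel2 blocks and leaves 1 block waiting. The exclusive policy leaves 3 slots idle while all 4 kernel2 blocks wait for kernel1's tail to drain. That idle tail is exactly the difference the two questions above would distinguish.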