Hello, I have some questions about concurrent kernel execution (CKE) on Fermi.
According to the Fermi architecture:
- At most 8 resident blocks per SM
- At most 16 concurrent kernels per GPU
- Kernels only run in parallel once the first kernel no longer occupies all SMs
I hope these assumptions are correct. Here are my questions:
My GPU is a GTS 450, which has 4 SMs.
Suppose I launch 2 kernels in different streams, the first kernel with 8 blocks and the second kernel with 4 blocks.
I assume the first kernel doesn't occupy all the resources (maybe 4 blocks fill 1 SM), so the second kernel can launch on the device concurrently because there are still free resources on the GPU.
So my question is: how are the blocks issued to each SM?
Situation 1: the first kernel's blocks 1 to 4 run on SM1 and blocks 5 to 8 on SM2, while the second kernel runs on SM3 and SM4.
Situation 2: something like round-robin (RR) scheduling, where the first kernel's blocks 1 and 5 go to SM1, blocks 2 and 6 to SM2, blocks 3 and 7 to SM3, and blocks 4 and 8 to SM4, and the second kernel's block 1 goes to SM1, block 2 to SM2, and so on.
Or is it neither of the cases above?
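One way to check this empirically, rather than relying on documentation, is to have every block record the ID of the SM it runs on. The sketch below is only an illustration under several assumptions: it reads the PTX special register %smid, the kernel name record_sm, the helper blocks_on_sm, the 64 threads per block, and the spin length are all made up for this example, and a Fermi card such as the GTS 450 needs an older toolkit that still accepts -arch=sm_21 (CUDA 8.0 or earlier). The mapping it prints is whatever the hardware scheduler chose on that run and is not guaranteed to be stable.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal error check; only valid inside main() because it returns 1 on failure.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e_ = (call);                                        \
        if (e_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Read the ID of the SM the calling thread runs on (PTX special register %smid).
__device__ unsigned int smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Each block records its SM, then spins so that blocks of both kernels stay resident together.
__global__ void record_sm(unsigned int *sm_of_block, long long spin_cycles) {
    if (threadIdx.x == 0) sm_of_block[blockIdx.x] = smid();
    long long start = clock64();
    while (clock64() - start < spin_cycles) { }
}

// Host helper (hypothetical name): how many of the n recorded blocks ran on SM `id`.
int blocks_on_sm(const unsigned int *sm_of_block, int n, unsigned int id) {
    int count = 0;
    for (int i = 0; i < n; ++i) count += (sm_of_block[i] == id);
    return count;
}

int main() {
    const int blocksA = 8, blocksB = 4;   // block counts from the question
    const int threads = 64;               // assumption: small blocks
    const long long spin = 100000000LL;   // assumption: long enough for the kernels to overlap

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("%s: %d SMs, concurrentKernels=%d\n",
           prop.name, prop.multiProcessorCount, prop.concurrentKernels);

    unsigned int *dA, *dB;
    CHECK(cudaMalloc(&dA, blocksA * sizeof(unsigned int)));
    CHECK(cudaMalloc(&dB, blocksB * sizeof(unsigned int)));
    cudaStream_t sA, sB;
    CHECK(cudaStreamCreate(&sA));
    CHECK(cudaStreamCreate(&sB));

    // Two kernels in different non-default streams, as in the question.
    record_sm<<<blocksA, threads, 0, sA>>>(dA, spin);
    record_sm<<<blocksB, threads, 0, sB>>>(dB, spin);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    unsigned int hA[blocksA], hB[blocksB];
    CHECK(cudaMemcpy(hA, dA, sizeof(hA), cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(hB, dB, sizeof(hB), cudaMemcpyDeviceToHost));

    unsigned int maxId = 0;
    for (int i = 0; i < blocksA; ++i) {
        printf("kernel A block %d -> SM %u\n", i, hA[i]);
        if (hA[i] > maxId) maxId = hA[i];
    }
    for (int i = 0; i < blocksB; ++i) {
        printf("kernel B block %d -> SM %u\n", i, hB[i]);
        if (hB[i] > maxId) maxId = hB[i];
    }
    // Per-SM summary; SM IDs are not guaranteed to be contiguous.
    for (unsigned int id = 0; id <= maxId; ++id)
        printf("SM %u: %d block(s) of A, %d block(s) of B\n",
               id, blocks_on_sm(hA, blocksA, id), blocks_on_sm(hB, blocksB, id));

    CHECK(cudaStreamDestroy(sA));
    CHECK(cudaStreamDestroy(sB));
    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    return 0;
}
```

If the output looks like Situation 1, each SM lists blocks of only one kernel; if it looks like Situation 2, every SM lists blocks of both. Running it several times helps show whether the placement is consistent.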
This question is about context switching on the GPU.
The Fermi white paper (page 18) says that, like CPUs, GPUs support multitasking through the use of context switching, where each program receives a time slice of the processor's resources.
Now I have 2 kernels in different streams, but the first kernel has 1024 blocks, so it can easily occupy all SMs. After a time slice, will the GPU switch out the first kernel and start executing the second kernel (kernel-level context switch)?
Or will it switch out the resident blocks of the first kernel and switch in blocks of the second kernel, even though the first kernel's blocks have not completed (block-level context switch)?
This question is about CKE on the GPU.
Again with 2 kernels executing concurrently on the GPU: suppose the first kernel has 8 blocks and the second has 16 blocks, and blocks 1 to 8 of the first kernel and blocks 1 to 4 of the second kernel are currently executing on the SMs.
If some blocks of the first kernel complete, will the remaining blocks of the second kernel be issued immediately? Or will they wait until all blocks currently on the SMs have completed?
The other case: if the first kernel occupies all the SMs and some of its blocks complete, freeing resources for blocks of the second kernel, will blocks of the second kernel be issued immediately?
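The scenario above can be probed with per-block timestamps. This is a rough sketch under stated assumptions, not a definitive test: the names BlockLog, timed_block, and started_while_a_running are hypothetical, the 16 KB of dynamic shared memory per block is chosen on the assumption that an SM has 48 KB of shared memory so about 3 blocks fit per SM (12 slots on 4 SMs, i.e. 8 blocks of the first kernel plus 4 of the second), and clock64() is assumed to be a free-running per-SM counter, so timestamps are only compared between blocks that ran on the same SM. Blocks of the second kernel spin longer than any block of the first, so a slot freed early can only come from the first kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal error check; only valid inside main() because it returns 1 on failure.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e_ = (call);                                        \
        if (e_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Per-block record: SM ID plus start/end cycle counts read on that SM.
struct BlockLog { unsigned int sm; long long start; long long end; };

__device__ unsigned int smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Block b spins for base + step * b cycles, then thread 0 logs where and when it ran.
__global__ void timed_block(BlockLog *log, long long base, long long step) {
    long long t0 = clock64();
    long long spin = base + step * blockIdx.x;
    while (clock64() - t0 < spin) { }
    if (threadIdx.x == 0) {
        log[blockIdx.x].sm = smid();
        log[blockIdx.x].start = t0;
        log[blockIdx.x].end = clock64();
    }
}

// Count B blocks whose start precedes the end of at least one A block on the same SM.
int started_while_a_running(const BlockLog *a, int na, const BlockLog *b, int nb) {
    int count = 0;
    for (int j = 0; j < nb; ++j) {
        bool overlap = false;
        for (int i = 0; i < na; ++i)
            if (a[i].sm == b[j].sm && b[j].start < a[i].end) overlap = true;
        count += overlap;
    }
    return count;
}

int main() {
    const int blocksA = 8, blocksB = 16;   // block counts from the question
    const int threads = 64;                // assumption
    const size_t smem = 16 * 1024;         // assumption: limits residency to about 3 blocks per SM
    const long long base = 50000000LL;     // assumption: shortest A block, in cycles
    const long long step = 10000000LL;     // assumption: A blocks finish at different times

    BlockLog *dA, *dB;
    CHECK(cudaMalloc(&dA, blocksA * sizeof(BlockLog)));
    CHECK(cudaMalloc(&dB, blocksB * sizeof(BlockLog)));
    cudaStream_t sA, sB;
    CHECK(cudaStreamCreate(&sA));
    CHECK(cudaStreamCreate(&sB));

    timed_block<<<blocksA, threads, smem, sA>>>(dA, base, step);
    // Every B block spins longer than the longest A block.
    timed_block<<<blocksB, threads, smem, sB>>>(dB, base + blocksA * step, 0);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    BlockLog hA[blocksA], hB[blocksB];
    CHECK(cudaMemcpy(hA, dA, sizeof(hA), cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(hB, dB, sizeof(hB), cudaMemcpyDeviceToHost));

    for (int i = 0; i < blocksA; ++i)
        printf("A block %2d  SM %u  start %lld  end %lld\n", i, hA[i].sm, hA[i].start, hA[i].end);
    for (int i = 0; i < blocksB; ++i)
        printf("B block %2d  SM %u  start %lld  end %lld\n", i, hB[i].sm, hB[i].start, hB[i].end);
    printf("B blocks started while an A block was still running on the same SM: %d of %d\n",
           started_while_a_running(hA, blocksA, hB, blocksB), blocksB);

    CHECK(cudaStreamDestroy(sA));
    CHECK(cudaStreamDestroy(sB));
    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    return 0;
}
```

If the reported count is larger than the number of blocks of the second kernel that could be resident at launch (4 under the assumptions above), that would suggest new blocks are issued as soon as individual blocks of the first kernel finish. If no block of the second kernel starts before the first kernel has drained, the kernels were effectively serialized on that run.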
Thank you all.