Concurrent Kernel Execution and Context Switching Problem

Hello, I have some questions about concurrent kernel execution (CKE) on Fermi.

According to the Fermi architecture:

  1. At most 8 resident blocks per SM
  2. At most 16 concurrent kernels per GPU
  3. Kernels only run in parallel once the first kernel no longer occupies all SMs
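Whether 8 blocks actually fill an SM depends on each kernel's thread, register and shared-memory usage, not only on the 8-block cap. For example, a Fermi SM holds at most 1536 threads, so with 256-thread blocks only 1536 / 256 = 6 blocks can be resident even though the cap is 8. A minimal sketch for checking this on your own card, assuming a CUDA toolkit between 6.5 and 8.0 (as far as I know, the occupancy API appeared in 6.5 and Fermi support was dropped in 9.0). `exampleKernel`, `blocksToFillDevice` and the block size of 256 are hypothetical placeholders. Note that `concurrentKernels` only reports whether concurrency is supported; it does not report the 16-kernel limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for "the first kernel"; substitute your own.
__global__ void exampleKernel(float* out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = 0.0f;
}

// Host helper: blocks needed before every SM holds its maximum number of resident blocks.
int blocksToFillDevice(int blocksPerSm, int smCount)
{
    return blocksPerSm * smCount;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    std::printf("%s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    // concurrentKernels is 1 if kernels from different streams may overlap, else 0.
    std::printf("SMs: %d, concurrentKernels: %d\n",
                prop.multiProcessorCount, prop.concurrentKernels);

    const int blockSize = 256;  // assumption: use the block size you actually launch with
    int blocksPerSm = 0;
    // Resident blocks per SM for this kernel, limited by threads, registers and shared memory.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, exampleKernel, blockSize, 0);
    std::printf("resident blocks per SM at %d threads/block: %d\n", blockSize, blocksPerSm);
    std::printf("blocks needed to fill all SMs: %d\n",
                blocksToFillDevice(blocksPerSm, prop.multiProcessorCount));
    return 0;
}
```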

Assuming these points are correct, here are my questions:

  1. My GPU is a GTS-450, which has 4 SMs.
    Suppose I launch 2 kernels in different streams: the first kernel with 8 blocks and the second with 4 blocks.
    I assume the first kernel doesn't occupy all the resources (maybe 4 blocks fill 1 SM), so the second kernel can run on the device concurrently because there are still free resources on the GPU.

    So my question is: how are the blocks distributed across the SMs?
    Situation 1: blocks 1~4 of the first kernel on SM1, blocks 5~8 on SM2, and the second kernel on SM3 and SM4.
    Situation 2: round-robin, like RR scheduling: first kernel blocks 1 and 5 on SM1, blocks 2 and 6 on SM2, blocks 3 and 7 on SM3, blocks 4 and 8 on SM4; then second kernel block 1 on SM1, block 2 on SM2, etc.

    Or is it neither of the cases above?

  2. This question is about context switching on the GPU.
    The Fermi white paper (page 18) says that, like CPUs, GPUs support multitasking through context switching, where each program receives a time slice of the processor's resources.
    Now I have 2 kernels in different streams, but the first kernel has 1024 blocks, so it can easily occupy all SMs. After a time slice, will the GPU switch away from the first kernel and start executing the second kernel (kernel-level context switch)?
    Or will it swap out all of the first kernel's blocks and switch to blocks of the second kernel even though the first kernel's blocks have not completed (block-level context switch)?

  3. This question is about CKE on the GPU.
    Again with 2 kernels executing concurrently: the first kernel has 8 blocks and the second has 16, and blocks 1~8 of the first kernel and blocks 1~4 of the second kernel are currently running on the SMs.
    When some blocks of the first kernel complete, will blocks of the second kernel be issued immediately, or does the GPU wait until all blocks currently on the SMs have completed?

    The other case: the first kernel occupies all the SMs, and some of its blocks complete, freeing resources for blocks of the second kernel. Will blocks of the second kernel be issued immediately?
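The placement asked about in questions 1 and 3 can be observed directly rather than guessed. A minimal sketch, assuming a Fermi-or-later GPU: each block reads the `%smid` PTX special register through inline assembly and records it, and a `clock64()` busy-wait keeps blocks resident long enough for the two streams to overlap. `recordSm`, `smId`, `tallyBlocksPerSm` and the spin length are hypothetical. The PTX documentation notes that `%smid` is not guaranteed to be contiguous and is only a snapshot at the moment it is read, so treat the output as an observation of one run, not a rule.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the ID of the SM the calling thread is running on (PTX %smid).
__device__ unsigned int smId()
{
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical probe kernel: thread 0 of each block records its SM,
// then every thread spins so blocks from both streams stay resident together.
__global__ void recordSm(unsigned int* smOfBlock, long long spinCycles)
{
    if (threadIdx.x == 0) smOfBlock[blockIdx.x] = smId();
    long long start = clock64();
    while (clock64() - start < spinCycles) { }
}

// Host helper: count how many recorded blocks landed on each SM ID.
// IDs outside [0, numSms) are ignored, since %smid need not be contiguous.
void tallyBlocksPerSm(const unsigned int* smOfBlock, int numBlocks,
                      int* counts, int numSms)
{
    for (int s = 0; s < numSms; ++s) counts[s] = 0;
    for (int b = 0; b < numBlocks; ++b)
        if (smOfBlock[b] < (unsigned int)numSms) ++counts[smOfBlock[b]];
}

int main()
{
    const int blocksA = 8, blocksB = 4;   // the two kernels from question 1
    const int threads = 64;
    const long long spin = 10000000LL;    // assumption: a few ms at ~1.5 GHz

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    const int numSms = prop.multiProcessorCount < 64 ? prop.multiProcessorCount : 64;

    unsigned int *dA, *dB;
    cudaMalloc(&dA, blocksA * sizeof(unsigned int));
    cudaMalloc(&dB, blocksB * sizeof(unsigned int));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    recordSm<<<blocksA, threads, 0, s1>>>(dA, spin);
    recordSm<<<blocksB, threads, 0, s2>>>(dB, spin);
    cudaDeviceSynchronize();

    unsigned int hA[blocksA], hB[blocksB];
    cudaMemcpy(hA, dA, sizeof(hA), cudaMemcpyDeviceToHost);
    cudaMemcpy(hB, dB, sizeof(hB), cudaMemcpyDeviceToHost);
    for (int b = 0; b < blocksA; ++b) std::printf("kernel A block %d -> SM %u\n", b, hA[b]);
    for (int b = 0; b < blocksB; ++b) std::printf("kernel B block %d -> SM %u\n", b, hB[b]);

    int counts[64];  // assumption: the device has at most 64 SMs
    tallyBlocksPerSm(hA, blocksA, counts, numSms);
    for (int s = 0; s < numSms; ++s) std::printf("SM %d: %d blocks of kernel A\n", s, counts[s]);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```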

Thank you all.

  1. Assignment of blocks to SMs is undefined, and it’s not really something you can predict.

  2. There’s no context switching in that manner. Once a kernel starts executing, only the blocks from that kernel are dispatched until all of the blocks from that kernel have been dispatched.

  3. It will start issuing blocks of the second kernel immediately, assuming your launches are able to overlap.
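Whether two launches actually overlap can be checked by timing them. A minimal sketch, assuming a Fermi-or-later GPU and the legacy default stream (i.e. not compiled with `--default-stream per-thread`): the same hypothetical busy-wait kernel is launched twice with only 2 blocks each, once with both launches in one stream (serialized) and once in two streams. If the second kernel's blocks are issued while the first is still running, the two-stream time should be noticeably shorter. `spinKernel`, `timeTwoLaunches`, `speedup` and all sizes are illustrative, not from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: each thread spins for roughly `cycles` SM clocks.
__global__ void spinKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Launch spinKernel twice (into stream a, then stream b) and return elapsed ms.
// Events are recorded in the legacy default stream, which waits for work in
// blocking streams created by cudaStreamCreate, so both launches are covered.
float timeTwoLaunches(cudaStream_t a, cudaStream_t b, int blocks, long long cycles)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    spinKernel<<<blocks, 64, 0, a>>>(cycles);
    spinKernel<<<blocks, 64, 0, b>>>(cycles);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Host helper: a ratio above 1 means the two-stream run finished faster.
double speedup(double serialMs, double concurrentMs)
{
    return concurrentMs > 0.0 ? serialMs / concurrentMs : 0.0;
}

int main()
{
    const int blocks = 2;                 // few blocks, so one kernel cannot fill the GPU
    const long long cycles = 10000000LL;  // assumption: a few ms at ~1.5 GHz

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    timeTwoLaunches(s1, s1, blocks, 1000);  // warm-up, result discarded

    float serialMs = timeTwoLaunches(s1, s1, blocks, cycles);      // same stream
    float concurrentMs = timeTwoLaunches(s1, s2, blocks, cycles);  // two streams
    std::printf("same stream: %.2f ms, two streams: %.2f ms, speedup %.2fx\n",
                serialMs, concurrentMs, speedup(serialMs, concurrentMs));

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```

Keeping each kernel at 2 blocks matters here: with enough blocks to fill all SMs, the first kernel would dispatch all of its blocks before the second starts, and the two timings would be close.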

Dear tmurray, 

I understand now. Thanks a lot!