Hi, I'm trying to figure out the rules the block scheduler uses for concurrent streams. I have 2 concurrent streams (stream0, stream1) and 2 kernels (kernel0, kernel1); kernel0 runs on stream0 while kernel1 runs on stream1. After launching the kernels in different orders, the priority with which the block scheduler services each kernel seems to differ.
Observing the SM id (via asm("mov.u32 %0, %%smid;" : "=r"(smid));), when launching in the following order, the block scheduler satisfies kernel1 first, looping over the SMs in even order and then in odd order:
kernel0<<<1Dimension, XX, 0, stream0>>>();
kernel1<<<1Dimension, XX, 0, stream1>>>();
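For reference, here is a minimal, self-contained version of my test (kernel bodies and launch configuration are placeholders; each block records its %smid into managed memory so I can inspect which SM it ran on):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Both kernels just record which SM each block landed on.
__global__ void kernel0(unsigned int *smids)
{
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    smids[blockIdx.x] = smid;
}

__global__ void kernel1(unsigned int *smids)
{
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    smids[blockIdx.x] = smid;
}

int main()
{
    const int blocks = 80;  // V100 has 80 SMs
    unsigned int *smids0 = nullptr, *smids1 = nullptr;
    cudaMallocManaged(&smids0, blocks * sizeof(unsigned int));
    cudaMallocManaged(&smids1, blocks * sizeof(unsigned int));

    cudaStream_t stream0, stream1;
    cudaStreamCreate(&stream0);
    cudaStreamCreate(&stream1);

    // Launch order under test: kernel0 first, then kernel1.
    kernel0<<<blocks, 32, 0, stream0>>>(smids0);
    kernel1<<<blocks, 32, 0, stream1>>>(smids1);
    cudaDeviceSynchronize();

    for (int i = 0; i < blocks; ++i)
        printf("block %d: kernel0 on SM %u, kernel1 on SM %u\n",
               i, smids0[i], smids1[i]);

    cudaStreamDestroy(stream0);
    cudaStreamDestroy(stream1);
    cudaFree(smids0);
    cudaFree(smids1);
    return 0;
}
```

(Note: in CUDA inline PTX the register must be written %%smid, since % escapes operands.)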
However, if the launch order is reversed, the block scheduler satisfies kernel0 first. By the way, my environment is Linux, a Tesla V100 (compute capability 7.0), with CUDA 10.2.
Are there any known rules for the scheduling priority of kernels in concurrent streams? Or is it out of our control? And if it can be controlled, for example, to make kernel1 satisfied first, how should I code it?
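One thing I am aware of is that CUDA exposes stream priorities, so I wondered whether something like the sketch below would make kernel1's blocks get dispatched first (this is just my guess; I'm not sure whether stream priority affects initial block scheduling order or only which pending kernel gets resources as they free up):

```cuda
#include <cuda_runtime.h>

int main()
{
    // Query the priority range supported by this device.
    // Numerically lower values mean higher priority.
    int leastPriority, greatestPriority;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t stream0, stream1;
    // Give stream1 (the one I want satisfied first) the highest priority.
    cudaStreamCreateWithPriority(&stream0, cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&stream1, cudaStreamNonBlocking, greatestPriority);

    // ... launch kernel0 on stream0 and kernel1 on stream1 as before ...

    cudaStreamDestroy(stream0);
    cudaStreamDestroy(stream1);
    return 0;
}
```

Would this be the right mechanism, or is the block dispatch order for already-launched concurrent kernels simply unspecified?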