questions about warp scheduling

walkershaw · April 16, 2016, 12:26pm

Hi all,
On a Maxwell streaming multiprocessor (SM), each warp scheduler picks one or two instructions from a ready warp to issue every cycle. My questions are:

1. What would make a warp stall? An L2 cache miss, a function call, or anything else?
2. Is scheduling preemptive or not? I mean, if nothing makes the warp stall, will the scheduler let it run to the end and only then issue instructions from other warps?
3. There are four schedulers in every Maxwell SM. Is there any way to find out which warps are handled by the same scheduler? For example, if I have 8 warps numbered 0 to 7, each scheduler should handle 2 warps. I want to know which warp shares a scheduler with warp 0.
Looking forward to any reply, thanks.

scottgray · April 17, 2016, 12:04am

Every instruction has either a fixed or a variable latency. If an instruction depends on the result of an earlier one, it stalls until the required number of clocks has elapsed. Other warps with ready instructions execute during these stalls. More on that here:
https://github.com/NervanaSystems/maxas/wiki/Control-Codes
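A rough way to see this dependency stall for yourself is to time a chain of dependent FMAs with only one warp resident, so no other warp can fill the gaps, and compare it with four independent chains. This is only a sketch: the names `dependent_chain`, `independent_chains`, `run_chain` and `cycles_per_op` are made up for illustration, and the compiler is free to move the loop relative to the `clock64()` reads, so check the generated SASS before trusting any number.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Average cycles per operation (plain host-side arithmetic).
inline double cycles_per_op(long long cycles, int ops) {
    assert(ops > 0);
    return static_cast<double>(cycles) / ops;
}

// Each FMA needs the result of the previous one, so it cannot issue early.
__global__ void dependent_chain(float seed, int iters, long long *cycles, float *sink) {
    float x = seed;
    long long start = clock64();
    for (int i = 0; i < iters; ++i)
        x = fmaf(x, 1.0001f, 0.5f);
    long long stop = clock64();
    *cycles = stop - start;
    *sink = x;  // keep the result live so the loop is not removed
}

// Four chains that do not depend on each other; iters * 4 FMAs in total.
__global__ void independent_chains(float seed, int iters, long long *cycles, float *sink) {
    float a = seed, b = seed + 1.0f, c = seed + 2.0f, d = seed + 3.0f;
    long long start = clock64();
    for (int i = 0; i < iters; ++i) {
        a = fmaf(a, 1.0001f, 0.5f);
        b = fmaf(b, 1.0001f, 0.5f);
        c = fmaf(c, 1.0001f, 0.5f);
        d = fmaf(d, 1.0001f, 0.5f);
    }
    long long stop = clock64();
    *cycles = stop - start;
    *sink = a + b + c + d;
}

using ChainKernel = void (*)(float, int, long long *, float *);

// Launches one thread (a single resident warp) and returns the elapsed
// cycle count reported by the kernel, or -1 on any CUDA error.
long long run_chain(ChainKernel kernel, int iters) {
    long long *d_cycles = nullptr;
    float *d_sink = nullptr;
    long long h_cycles = -1;
    if (cudaMalloc(&d_cycles, sizeof(long long)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_sink, sizeof(float)) != cudaSuccess) {
        cudaFree(d_cycles);
        return -1;
    }
    kernel<<<1, 1>>>(1.0f, iters, d_cycles, d_sink);
    if (cudaMemcpy(&h_cycles, d_cycles, sizeof(h_cycles),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_cycles = -1;
    cudaFree(d_cycles);
    cudaFree(d_sink);
    return h_cycles;
}
```

From a host `main`, compare `cycles_per_op(run_chain(dependent_chain, n), n)` with `cycles_per_op(run_chain(independent_chains, n), 4 * n)`. If dependent instructions really stall, the first figure should come out noticeably higher; both include loop overhead, so treat them as relative rather than exact latencies.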

Even if a warp has no stalls, the scheduler will periodically let other warps run. I haven’t really investigated this behavior in much detail, but you can assume that which warp gets scheduled on any given cycle is somewhat random. This paper talks a little bit about warp scheduling schemes if you’re interested (4.1 Scheduling: Barrel vs. Switch-on-Stall):
http://ieeexplore.ieee.org.sci-hub.io/xpl/login.jsp?tp=&arnumber=7095803&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7095803#

I’m pretty sure the special register %warpid maps to the scheduler in use:

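unsigned warpid;
// "%%" is the documented escape for a literal '%' in CUDA inline PTX
asm volatile("mov.u32 %0, %%warpid;" : "=r"(warpid));
// guess: the low two bits select one of the four schedulers
unsigned scheduler_id = warpid & 3;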
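To try the snippet above on real hardware, something like the following records `%smid` and `%warpid` once per warp, together with the `warpid & 3` guess. Treat it as a sketch: the mapping to a scheduler is the assumption stated above, not documented behavior, and `WarpInfo`, `guessed_scheduler`, `record_warp_ids` and `collect_warp_info` are names invented for this example.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Assumption from the post above: the low two bits of %warpid pick one of
// the four Maxwell schedulers. Not documented by NVIDIA.
__host__ __device__ inline unsigned guessed_scheduler(unsigned warpid) {
    return warpid & 3u;
}

struct WarpInfo {
    unsigned smid;       // SM the warp was running on when sampled
    unsigned warpid;     // hardware warp slot, not threadIdx.x / 32
    unsigned scheduler;  // guessed_scheduler(warpid)
};

__global__ void record_warp_ids(WarpInfo *out) {
    unsigned warpid, smid;
    // volatile: the PTX ISA describes %warpid as a value that may change
    // during a thread's lifetime, so the read should not be cached or moved.
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(warpid));
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    if (threadIdx.x % warpSize == 0) {
        unsigned w = threadIdx.x / warpSize;  // logical warp index in the block
        out[w].smid = smid;
        out[w].warpid = warpid;
        out[w].scheduler = guessed_scheduler(warpid);
    }
}

// Launches a single block of `warps` warps (1..32) and returns one record
// per logical warp; returns an empty vector on any CUDA error.
std::vector<WarpInfo> collect_warp_info(int warps) {
    assert(warps >= 1 && warps <= 32);
    std::vector<WarpInfo> host(warps);
    WarpInfo *dev = nullptr;
    if (cudaMalloc(&dev, warps * sizeof(WarpInfo)) != cudaSuccess) return {};
    record_warp_ids<<<1, warps * 32>>>(dev);
    cudaError_t err = cudaMemcpy(host.data(), dev, warps * sizeof(WarpInfo),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling `collect_warp_info(8)` from `main` and printing the records shows which logical warps report the same `scheduler` value. Under the guess, hardware warp ids 0 and 4 would land on the same scheduler, but the hardware id a given warp receives is up to the GPU and need not match its index within the block.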

walkershaw · April 17, 2016, 7:23am

Thanks a lot. Just one more question. As you indicated, warps are distributed across the schedulers in a round-robin fashion. How do you know this? Is there any documentation or other source? Thanks again.

scottgray · April 17, 2016, 7:37pm

Partly common sense, but mostly from lots of experience writing microbenchmarks.

walkershaw · April 18, 2016, 5:14am

Thanks man :)

LongY · December 5, 2016, 6:42am

This link https://github.com/hyqneuron/asfermi/wiki/S2R_Test may provide some insight into how the warp scheduler works on the Fermi architecture.