How to know the scheduling information about the kernel?

Hi, I am wondering is there any way to know how each thread is scheduled on GPU. For example, in RTX 3070, there are 46 SMs and 4 warp scheduler in each SM. When the kernel runs, each thread block is scheduled to a specific SM and each 32 threads is scheduled to a specific warp scheduler. I want to know which SM and which warp scheduler a thread is scheduled to. Is that possible?


In theory, the warp index I get should be a number like 0/1/2/3 in A100.

You can find out with %smid and %warpid (those are available from PTX, which can be called from inline assembly; there should be ready-to-use functions out there). BTW: It is only possible to know (that is what you asked for), and not possible to control.

1 Like

Thanks, %smid works right, but it seems that %warpid gives the thread warp id in a thread block. The PTX 8.3 document says that

A predefined, read-only special register that returns the thread’s warp identifier. The warp identifier provides a unique warp number within a CTA but not across CTAs within a grid. The warp identifier will be the same for all threads within a single warp.

Also, I write a experimental code


#include <stdio.h>
#include <stdint.h>

static __device__ __inline__ uint32_t __mysmid(){
  uint32_t smid;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
  return smid;}

static __device__ __inline__ uint32_t __mywarpid(){
  uint32_t warpid;
  asm volatile("mov.u32 %0, %%warpid;" : "=r"(warpid));
  return warpid;}

static __device__ __inline__ uint32_t __mylaneid(){
  uint32_t laneid;
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(laneid));
  return laneid;}


__global__ void mykernel(){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  if (idx % 32 == 0)
    printf("I am thread %d, my SM ID is %d, my warp ID is %d, and my warp lane is %d\n", idx, __mysmid(), __mywarpid(), __mylaneid());
}

int main(){

  mykernel<<<1,256>>>();
  cudaDeviceSynchronize();
  return 0;
}

And the result is

I am thread 192, my SM ID is 0, my warp ID is 6, and my warp lane is 0
I am thread 224, my SM ID is 0, my warp ID is 7, and my warp lane is 0
I am thread 128, my SM ID is 0, my warp ID is 4, and my warp lane is 0
I am thread 64, my SM ID is 0, my warp ID is 2, and my warp lane is 0
I am thread 160, my SM ID is 0, my warp ID is 5, and my warp lane is 0
I am thread 0, my SM ID is 0, my warp ID is 0, and my warp lane is 0
I am thread 96, my SM ID is 0, my warp ID is 3, and my warp lane is 0
I am thread 32, my SM ID is 0, my warp ID is 1, and my warp lane is 0

It looks that the warp ID related to the thread sequential in a block, instead of the warp hardware it is scheduled to.

I don’t think it is possible to determine warp scheduler using any method like this. At least, I’m not aware of any.

I would belive, they were identical in your example by accident. You could load two blocks per SM. Then they should be scheduled to different %warpids.

The description of %warpid is

A predefined, read-only special register that returns the thread’s warp identifier. The warp identifier provides a unique warp number within a CTA but not across CTAs within a grid. The warp identifier will be the same for all threads within a single warp.

Note that %warpid is volatile and returns the location of a thread at the moment when read, but its value may change during execution, e.g., due to rescheduling of threads following preemption. For this reason, %ctaid and %tid should be used to compute a virtual warp index if such a value is needed in kernel code; %warpid is intended mainly to enable profiling and diagnostic code to sample and log information such as work place mapping and load distribution.

It is actually refered to “the id of the warp among max resident warps in the SM”. Image that a SM have specific number of slots (in 3070, the number is 48, which is the max resident warps in a SM). A warp is scheduled to a slot, and the slot number is the %warpid. If I load two blocks per SM, they will be scheduled to different %warpid, because warpid in the same SM must be unique.

Let’s construct a example where two blocks are scheduled to the same SM. As 3070 have 46 warps, I set the kernel configuration to <<<47, 32>>>, and I got

I am thread 608, my SM ID is 38, my warp ID is 0, and my warp lane is 0
I am thread 960, my SM ID is 15, my warp ID is 0, and my warp lane is 0
I am thread 640, my SM ID is 40, my warp ID is 0, and my warp lane is 0
I am thread 416, my SM ID is 26, my warp ID is 0, and my warp lane is 0
I am thread 992, my SM ID is 17, my warp ID is 0, and my warp lane is 0
I am thread 1344, my SM ID is 39, my warp ID is 0, and my warp lane is 0
I am thread 448, my SM ID is 28, my warp ID is 0, and my warp lane is 0
I am thread 224, my SM ID is 14, my warp ID is 0, and my warp lane is 0
I am thread 1376, my SM ID is 41, my warp ID is 0, and my warp lane is 0
I am thread 1152, my SM ID is 27, my warp ID is 0, and my warp lane is 0
I am thread 256, my SM ID is 16, my warp ID is 0, and my warp lane is 0
I am thread 1184, my SM ID is 29, my warp ID is 0, and my warp lane is 0
I am thread 672, my SM ID is 42, my warp ID is 0, and my warp lane is 0
I am thread 1408, my SM ID is 43, my warp ID is 0, and my warp lane is 0
I am thread 1024, my SM ID is 19, my warp ID is 0, and my warp lane is 0
I am thread 800, my SM ID is 5, my warp ID is 0, and my warp lane is 0
I am thread 288, my SM ID is 18, my warp ID is 0, and my warp lane is 0
I am thread 704, my SM ID is 44, my warp ID is 0, and my warp lane is 0
I am thread 1440, my SM ID is 45, my warp ID is 0, and my warp lane is 0
I am thread 768, my SM ID is 3, my warp ID is 0, and my warp lane is 0
I am thread 1056, my SM ID is 21, my warp ID is 0, and my warp lane is 0
I am thread 64, my SM ID is 4, my warp ID is 0, and my warp lane is 0
I am thread 576, my SM ID is 36, my warp ID is 0, and my warp lane is 0
I am thread 1312, my SM ID is 37, my warp ID is 0, and my warp lane is 0
I am thread 320, my SM ID is 20, my warp ID is 0, and my warp lane is 0
I am thread 32, my SM ID is 2, my warp ID is 0, and my warp lane is 0
I am thread 928, my SM ID is 13, my warp ID is 0, and my warp lane is 0
I am thread 544, my SM ID is 34, my warp ID is 0, and my warp lane is 0
I am thread 1280, my SM ID is 35, my warp ID is 0, and my warp lane is 0
I am thread 896, my SM ID is 11, my warp ID is 0, and my warp lane is 0
I am thread 192, my SM ID is 12, my warp ID is 0, and my warp lane is 0
I am thread 160, my SM ID is 10, my warp ID is 0, and my warp lane is 0
I am thread 480, my SM ID is 30, my warp ID is 0, and my warp lane is 0
I am thread 1216, my SM ID is 31, my warp ID is 0, and my warp lane is 0
I am thread 512, my SM ID is 32, my warp ID is 0, and my warp lane is 0
I am thread 1248, my SM ID is 33, my warp ID is 0, and my warp lane is 0
I am thread 832, my SM ID is 7, my warp ID is 0, and my warp lane is 0
I am thread 384, my SM ID is 24, my warp ID is 0, and my warp lane is 0
I am thread 1472, my SM ID is 0, my warp ID is 1, and my warp lane is 0
I am thread 0, my SM ID is 0, my warp ID is 0, and my warp lane is 0
I am thread 96, my SM ID is 6, my warp ID is 0, and my warp lane is 0
I am thread 352, my SM ID is 22, my warp ID is 0, and my warp lane is 0
I am thread 1120, my SM ID is 25, my warp ID is 0, and my warp lane is 0
I am thread 1088, my SM ID is 23, my warp ID is 0, and my warp lane is 0
I am thread 864, my SM ID is 9, my warp ID is 0, and my warp lane is 0
I am thread 128, my SM ID is 8, my warp ID is 0, and my warp lane is 0
I am thread 736, my SM ID is 1, my warp ID is 0, and my warp lane is 0

That is too long, we just need to pay attention to

I am thread 1472, my SM ID is 0, my warp ID is 1, and my warp lane is 0
I am thread 0, my SM ID is 0, my warp ID is 0, and my warp lane is 0

Note that each block only have 1 warp and we print 2 different blocks. Two blocks scheduled to the same SM have different warp ID. The warp ID is given by the SM to mark all active warps.

Hi, I have read your post (What is the sequence of SM launching? - #14 by Curefab), and I don’t understand how %warpid is related to hardware SM sub-partition (SMSP) by

int smp = (%warpid & 0xC) >> 2;

Is this information documented somewhere?

It is not. There is no specified or documented relationship between %warpid and SMSP.

1 Like