What is the relationship between block_idx and sm_idx

Shaquille · September 29, 2023, 1:30pm

Hello, NVIDIA experts,
I couldn't find any reference description of the mapping between block_idx (thread block) and sm_idx (SM) in NVIDIA's public documentation.
I would like to know what the mapping formula is. Would anyone be willing to explain it?

Robert_Crovella · September 29, 2023, 1:55pm

There is no relationship or mapping formula prescribed by CUDA. That is why you can’t find any information about it.

For many typical CUDA programs, the relationship should be irrelevant anyway. You can study nearly all the CUDA sample codes and they don’t need the SM ID for any reason.

Shaquille · September 29, 2023, 2:42pm

I'm very confused by this, because I couldn't find any reference description in NVIDIA's public documentation.
However, after reading the paper "Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU", I concluded that there must be a fixed mapping formula. The paper is published by NVIDIA, and the idea has been merged into CUTLASS.
The paper also shows this figure:

Evidently the authors assume a fixed relationship between CTAs and SMs, but the paper still doesn't state what that relationship is, and I cannot find the mapping formula in it.

So, I’m very confused.

striker159 · September 29, 2023, 3:14pm

Using an atomic counter, one can logically reorder blocks such that blocks which are scheduled first get a smaller logical block ID.
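
A minimal sketch of that idea, under some assumptions: the names g_ticket, logical_order_kernel and logical_ids_are_permutation are made up for illustration, and the counter has to be reset to zero before every launch. Each block takes a ticket from a global atomic counter as it starts running, so a block that reaches the atomic earlier gets a smaller logical ID regardless of its blockIdx.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical global ticket counter; must be reset to 0 before every launch.
__device__ unsigned int g_ticket;

__global__ void logical_order_kernel(unsigned int *logical_of_block)
{
    __shared__ unsigned int logical_id;
    if (threadIdx.x == 0) {
        // One thread per block takes the next ticket when the block starts running.
        logical_id = atomicAdd(&g_ticket, 1u);
    }
    __syncthreads();  // make logical_id visible to the whole block

    // Real work would be indexed by logical_id instead of blockIdx.x.
    if (threadIdx.x == 0) {
        logical_of_block[blockIdx.x] = logical_id;
    }
}

// Launches num_blocks blocks and checks that the logical IDs form a
// permutation of 0 .. num_blocks-1 (each ticket handed out exactly once).
bool logical_ids_are_permutation(int num_blocks)
{
    assert(num_blocks > 0);
    unsigned int zero = 0;
    unsigned int *d_out = nullptr;
    if (cudaMemcpyToSymbol(g_ticket, &zero, sizeof(zero)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, num_blocks * sizeof(unsigned int)) != cudaSuccess) return false;

    logical_order_kernel<<<num_blocks, 64>>>(d_out);

    std::vector<unsigned int> ids(num_blocks);
    cudaError_t err = cudaMemcpy(ids.data(), d_out,
                                 num_blocks * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    std::vector<bool> seen(num_blocks, false);
    for (unsigned int id : ids) {
        if (id >= (unsigned int)num_blocks || seen[id]) return false;
        seen[id] = true;
    }
    return true;
}
```

Calling logical_ids_are_permutation(256) from main should return true on a working device. The logical ID reflects the order in which blocks reached the atomic, not which SM they landed on.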

Robert_Crovella · September 29, 2023, 9:17pm

If you want to do this level of tuning, a threadblock can query its SM ID and organize its work that way. For example, if you are launching a kernel with enough threadblocks to fill a GPU, then each threadblock can query its SM ID and choose to do a portion of the work based on SM ID rather than on the built-in threadblock variables.

This, for example, would allow you to arrange work so that multiple threadblocks benefit from the same L1 cache. If you have multiple threadblocks per SM, you still need to sort out which is which, perhaps using a per-SM atomic like what striker159 suggested.

You can query the sm id like this.
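
For readers without the link, a small sketch of reading the PTX special register %smid through inline assembly might look like the following. The helper names read_smid, record_smid_kernel and record_block_smids are illustrative assumptions. The PTX documentation describes %smid as ranging from 0 to %nsmid-1, says the numbering is not guaranteed to be contiguous, and notes that the value can change during execution (for example after preemption), so it is best treated as a hint.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Reads the SM identifier of the calling thread at the moment of the read.
// "%%" is the escaped form of "%" inside an inline asm string with operands.
__device__ __forceinline__ unsigned int read_smid()
{
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

__global__ void record_smid_kernel(unsigned int *smid_of_block)
{
    if (threadIdx.x == 0) {
        smid_of_block[blockIdx.x] = read_smid();
    }
}

// Returns the SM ID observed by each block, or an empty vector on error.
std::vector<unsigned int> record_block_smids(int num_blocks)
{
    assert(num_blocks > 0);
    unsigned int *d_out = nullptr;
    if (cudaMalloc(&d_out, num_blocks * sizeof(unsigned int)) != cudaSuccess) return {};

    record_smid_kernel<<<num_blocks, 32>>>(d_out);

    std::vector<unsigned int> smids(num_blocks);
    cudaError_t err = cudaMemcpy(smids.data(), d_out,
                                 num_blocks * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return {};
    return smids;
}
```

Printing the returned vector for a few launches is a quick way to see that the block-to-SM assignment follows no documented formula and may differ between GPUs.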

Here is an example of code that does SM-specialization and allows for multiple blocks per SM.
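
As a rough illustration of the per-SM atomic idea (a sketch under stated assumptions, not the linked example): each block reads its SM ID and then takes a slot from a counter belonging to that SM, so the pair (SM ID, slot) identifies the block. MAX_SMS is an assumed upper bound on %smid values, and all names here are hypothetical.

```cuda
#include <cassert>
#include <set>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

#define MAX_SMS 1024  // assumed upper bound on %smid values (hypothetical)

__device__ __forceinline__ unsigned int read_smid()
{
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

__global__ void sm_slot_kernel(unsigned int *per_sm_count,
                               unsigned int *smid_of_block,
                               unsigned int *slot_of_block)
{
    __shared__ unsigned int smid, slot;
    if (threadIdx.x == 0) {
        smid = read_smid();
        // Arrival order of this block on its SM; 0xffffffff marks an
        // out-of-range SM ID (not expected with the assumed bound).
        slot = (smid < MAX_SMS) ? atomicAdd(&per_sm_count[smid], 1u) : 0xffffffffu;
        smid_of_block[blockIdx.x] = smid;
        slot_of_block[blockIdx.x] = slot;
    }
    __syncthreads();
    // Real work would be selected from (smid, slot) instead of blockIdx.x.
}

// Launches the kernel and checks that every block got a unique (smid, slot) pair.
bool sm_slots_are_unique(int num_blocks)
{
    assert(num_blocks > 0);
    unsigned int *d_count = nullptr, *d_smid = nullptr, *d_slot = nullptr;
    size_t n = num_blocks * sizeof(unsigned int);
    bool ok = cudaMalloc(&d_count, MAX_SMS * sizeof(unsigned int)) == cudaSuccess
           && cudaMalloc(&d_smid, n) == cudaSuccess
           && cudaMalloc(&d_slot, n) == cudaSuccess
           && cudaMemset(d_count, 0, MAX_SMS * sizeof(unsigned int)) == cudaSuccess;

    std::vector<unsigned int> smid(num_blocks), slot(num_blocks);
    if (ok) {
        sm_slot_kernel<<<num_blocks, 64>>>(d_count, d_smid, d_slot);
        ok = cudaMemcpy(smid.data(), d_smid, n, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(slot.data(), d_slot, n, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_count); cudaFree(d_smid); cudaFree(d_slot);
    if (!ok) return false;

    std::set<std::pair<unsigned int, unsigned int>> seen;
    for (int i = 0; i < num_blocks; ++i) {
        if (smid[i] >= MAX_SMS) return false;
        if (!seen.insert({smid[i], slot[i]}).second) return false;
    }
    return true;
}
```

The slot counts arrivals over the whole kernel, not the blocks resident at one instant, so this fits best when the grid is sized to fill the GPU exactly once, as in the scenario described above.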

There are a variety of other code examples that do various things based on the SM ID. Here are some examples: 1 2

I haven’t read the paper you indicated.

Shaquille · September 30, 2023, 3:35am

Thanks for your kind explanation.
I used the PTX in my application: asm("mov.u32 %0, %smid;" : "=r"(ret) ); and then disassembled it. The result looks like this:

        /*0040*/                   S2R R7, SR_VIRTUALSMID ;                   /* 0x0000000000077919 */
                                                                              /* 0x000e620000004300 */
        /*0050*/                   SHF.L.U32 R2, R5, 0x1, RZ ;                /* 0x0000000105027819 */
                                                                              /* 0x001fca00000006ff */
        /*0060*/                   IMAD.WIDE.U32 R2, R2, R3, c[0x0][0x160] ;  /* 0x0000580002027625 */
                                                                              /* 0x000fca00078e0003 */
        /*0070*/                   STG.E [R2.64], R5 ;                        /* 0x0000000502007986 */
                                                                              /* 0x000fe8000c101904 */
        /*0080*/                   STG.E [R2.64+0x4], R7 ;                    /* 0x0000040702007986 */

I noticed “SR_VIRTUALSMID” in the disassembly, so I think this is only a virtual SM ID rather than a physical SM ID. Is my guess right?
