Collecting busy SM IDs

mahmood.nt · March 11, 2024, 10:50am

Hi,
In the following code, I would like to collect the IDs of the SMs that executed the kernel. The kernel is a simple addition, and I have written this code:

__device__ uint get_smid(void) {
     uint ret;
     asm("mov.u32 %0, %smid;" : "=r"(ret) );
     return ret;
}
__global__ void simpleAdd(float *v, int n, vector<int> &smVector)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    int sm = get_smid();
    smVector.push_back(sm);
    v[i] = v[i] + 1;
  }
}
...
int main()
{
  ...
  simpleAdd<<<numBlocks, blockSize>>>(deviceVector, n, smVector);
  ...
}

But compilation fails with an error saying that calling a host function from a device kernel is not allowed. I also tried accessing the vector elements with the [] operator instead of push_back, but I get the same error.

__device__ uint get_smid(void) {
     uint ret;
     asm("mov.u32 %0, %smid;" : "=r"(ret) );
     return ret;
}
__global__ void simpleAdd(float *v, int n, vector<int> &smVector)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    int sm = get_smid();
    smVector[sm]++;
    v[i] = v[i] + 1;
  }
}
...
int main()
{
  ...
  smVector.resize(68);
  simpleAdd<<<numBlocks, blockSize>>>(deviceVector, n, smVector);
  ...
}

Any idea on how to achieve that?

striker159 · March 11, 2024, 12:24pm

If this is std::vector, it cannot work: its member functions are host-only, so they cannot be called from device code. If it is your own vector class, declare its member functions as __host__ __device__.
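A minimal sketch of the "own vector" option, assuming a hypothetical DeviceIntVector type (not from the original post) that wraps managed memory and is passed to the kernel by value instead of by reference. The names, the capacity handling, and the one-entry-per-block choice are illustrative assumptions, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical non-owning, fixed-capacity vector usable in device code.
// The pointers refer to managed memory, so host and device can both use them.
struct DeviceIntVector {
    int* data;      // storage for the elements
    int* count;     // number of push_back calls made so far
    int  capacity;  // maximum number of elements that fit in data

    __host__ __device__ int& operator[](int i) const { return data[i]; }

    // Device-only append: atomicAdd hands each caller a unique slot.
    __device__ bool push_back(int value) const {
        int slot = atomicAdd(count, 1);
        if (slot >= capacity) return false;  // full, the value is dropped
        data[slot] = value;
        return true;
    }
};

__device__ unsigned int get_smid() {
    unsigned int ret;
    // "%%" emits a literal '%' in inline PTX; volatile keeps the read in place.
    asm volatile("mov.u32 %0, %%smid;" : "=r"(ret));
    return ret;
}

__global__ void simpleAdd(float* v, int n, DeviceIntVector smVector) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] = v[i] + 1.0f;
    }
    // One entry per block is enough: all threads of a block run on the same SM.
    if (threadIdx.x == 0) {
        smVector.push_back(static_cast<int>(get_smid()));
    }
}

int main() {
    const int n = 1 << 12;
    const int blockSize = 256;
    const int numBlocks = (n + blockSize - 1) / blockSize;

    float* v;
    cudaMallocManaged(&v, n * sizeof(float));
    for (int i = 0; i < n; ++i) v[i] = 0.0f;

    DeviceIntVector smVector;
    smVector.capacity = numBlocks;
    cudaMallocManaged(&smVector.data, numBlocks * sizeof(int));
    cudaMallocManaged(&smVector.count, sizeof(int));
    *smVector.count = 0;

    simpleAdd<<<numBlocks, blockSize>>>(v, n, smVector);
    cudaDeviceSynchronize();  // wait before reading the results on the host

    // count may exceed capacity if pushes were dropped, so clamp it.
    int used = *smVector.count < smVector.capacity ? *smVector.count
                                                   : smVector.capacity;
    for (int i = 0; i < used; ++i) {
        printf("entry %d: SM %d\n", i, smVector[i]);
    }

    cudaFree(smVector.count);
    cudaFree(smVector.data);
    cudaFree(v);
    return 0;
}
```

The order of the entries depends on block scheduling, so it can differ from run to run.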

Also note that %smid ranges from 0 to %nsmid-1, and the PTX documentation says:

"The SM identifier numbering is not guaranteed to be contiguous, so %nsmid may be larger than the physical number of SMs in the device."

So you would need to use %nsmid as the buffer size.
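As a rough sketch of sizing the buffer from %nsmid instead of a hard-coded 68: the helper kernel queryNsmid, the counter array blocksPerSm, and the use of device 0 are assumptions made for illustration, and error checking is left out. The counters are updated with atomicAdd because a plain ++ from several blocks on the same SM would be a data race.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int get_smid() {
    unsigned int ret;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(ret));
    return ret;
}

__device__ unsigned int get_nsmid() {
    unsigned int ret;
    asm volatile("mov.u32 %0, %%nsmid;" : "=r"(ret));
    return ret;
}

// Hypothetical helper: a single thread reports %nsmid to the host.
__global__ void queryNsmid(unsigned int* out) {
    *out = get_nsmid();
}

// One atomic increment per block, indexed by the SM the block runs on.
__global__ void countBlocksPerSm(unsigned int* blocksPerSm) {
    if (threadIdx.x == 0) {
        atomicAdd(&blocksPerSm[get_smid()], 1u);
    }
}

int main() {
    unsigned int* nsmid;
    cudaMallocManaged(&nsmid, sizeof(unsigned int));
    queryNsmid<<<1, 1>>>(nsmid);
    cudaDeviceSynchronize();

    // For comparison: the SM count reported by the runtime (device 0 assumed).
    int smCount = 0;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);
    printf("%%nsmid = %u, multiProcessorCount = %d\n", *nsmid, smCount);

    // Size the buffer by %nsmid so every possible %smid is a valid index.
    unsigned int* blocksPerSm;
    cudaMallocManaged(&blocksPerSm, *nsmid * sizeof(unsigned int));
    cudaMemset(blocksPerSm, 0, *nsmid * sizeof(unsigned int));

    countBlocksPerSm<<<4096, 128>>>(blocksPerSm);
    cudaDeviceSynchronize();

    for (unsigned int sm = 0; sm < *nsmid; ++sm) {
        if (blocksPerSm[sm] > 0) {
            printf("SM %u: %u blocks\n", sm, blocksPerSm[sm]);
        }
    }

    cudaFree(blocksPerSm);
    cudaFree(nsmid);
    return 0;
}
```

If %nsmid is larger than the runtime's SM count on a given device, the extra slots simply stay at zero.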

mahmood.nt · March 12, 2024, 10:09am

Apart from the original question, which I still have, using %nsmid gives a different result than %smid. For example, for a short array on the device, %nsmid shows SM_68 while %smid shows SM_0.

striker159 · March 12, 2024, 11:10am

That is no surprise, is it? %nsmid is simply an upper bound for %smid. They are not equivalent.

This code prints a histogram of the SM IDs that were used.

#include <iostream>
#include <map>

__device__ 
int get_smid(void) {
    int ret;
    asm("mov.u32 %0, %smid;" : "=r"(ret) );
    return ret;
}

__global__ 
void kernel(int* smidPerBlock){
    if(threadIdx.x == 0){
        smidPerBlock[blockIdx.x] = get_smid();
    }
}

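int main(){
    int numBlocks = 4096;
    // Managed memory is accessible from both the kernel and the host loop below.
    int* smidPerBlock; cudaMallocManaged(&smidPerBlock, sizeof(int) * numBlocks);
    kernel<<<numBlocks, 128>>>(smidPerBlock);
    cudaDeviceSynchronize(); // wait for the kernel before reading the results

    // Count how many blocks ran on each SM ID.
    std::map<int,int> histogram;
    for(int i = 0; i < numBlocks; i++){
        histogram[smidPerBlock[i]]++;
    }
    // Each output line is "<SM ID> <number of blocks>".
    for(const auto& pair : histogram){
        std::cout << pair.first << " " << pair.second << "\n";
    }
    cudaFree(smidPerBlock);
}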

mahmood.nt · March 12, 2024, 12:59pm

OK, thank you very much.
I also found an answer to the first question, with an example, on this page:

https://gist.github.com/allanmac/4751080

system · March 26, 2024, 12:59pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
