Threads, warps, and blocks

Hi everyone!
I’m not very experienced with CUDA (only C++), and I have an evaluation program to run.

I was asked to add code that reports the number of blocks per SM, the number of warps per block, and finally the total number of threads.

Is this possible? If so, how? I thought these could only be determined by reading the code.
And is it possible while doing graphics-related work?

This looks like homework/classwork so I’m not going to give you an exact answer. But for some of the things you want to determine, CUDA provides special variables that can be queried and may be useful.

Here’s a simple example that may give you some ideas:

#include <stdio.h>
#include <stdint.h>

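// Read the PTX special register %smid: the ID of the SM this thread is running on.
// The value is volatile and can change if the thread is rescheduled.
static __device__ __inline__ uint32_t __mysmid(){
  uint32_t smid;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
  return smid;
}

// Read %warpid: the warp identifier (unique within a block, not across blocks).
// Like %smid, it reflects the moment it is read and may change later.
static __device__ __inline__ uint32_t __mywarpid(){
  uint32_t warpid;
  asm volatile("mov.u32 %0, %%warpid;" : "=r"(warpid));
  return warpid;
}

// Read %laneid: this thread's lane within its warp (0 to warpSize-1).
static __device__ __inline__ uint32_t __mylaneid(){
  uint32_t laneid;
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(laneid));
  return laneid;
}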


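__global__ void mykernel(){

  // Global thread index across the whole (1D) grid
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  // The three IDs are uint32_t, so print them with %u
  printf("I am thread %d, my SM ID is %u, my warp ID is %u, and my warp lane is %u\n", idx, __mysmid(), __mywarpid(), __mylaneid());
}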

int main(){

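  // Launch 4 blocks of 4 threads each (16 threads in total)
  mykernel<<<4,4>>>();
  // Wait for the kernel to finish so its printf output is flushed
  cudaDeviceSynchronize();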
  return 0;
}
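
Without working out the whole exercise, here is a rough sketch of the host-side view. The launch configuration already fixes the warps per block (threads per block divided by the warp size, rounded up) and the total thread count, and the runtime can report how many blocks of a given kernel could be resident on one SM at once. The names `dummy_kernel` and `warps_per_block` are made up for illustration, and device 0 and the 4 x 96 launch shape are assumptions, not anything from the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel; it exists only so the occupancy query has a kernel to inspect.
__global__ void dummy_kernel() {}

// Warps needed to hold one block: threads per block / warp size, rounded up.
static int warps_per_block(int threads_per_block, int warp_size) {
  return (threads_per_block + warp_size - 1) / warp_size;
}

int main() {
  const int blocks = 4;              // assumed grid size
  const int threads_per_block = 96;  // assumed block size

  // Query device 0 (an assumption) for its warp size and SM count.
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
    printf("could not query device 0\n");
    return 1;
  }

  // Upper bound on how many blocks of dummy_kernel can be resident on one SM
  // at this block size with no dynamic shared memory. This is a limit, not a
  // measurement of what a particular launch actually did.
  int max_blocks_per_sm = 0;
  if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
          &max_blocks_per_sm, dummy_kernel, threads_per_block, 0) != cudaSuccess) {
    printf("occupancy query failed\n");
    return 1;
  }

  printf("SMs on device:            %d\n", prop.multiProcessorCount);
  printf("warp size:                %d\n", prop.warpSize);
  printf("warps per block:          %d\n", warps_per_block(threads_per_block, prop.warpSize));
  printf("max resident blocks / SM: %d\n", max_blocks_per_sm);
  printf("total threads launched:   %d\n", blocks * threads_per_block);
  return 0;
}
```

To see how blocks were actually distributed in a given run, you would still need something like the SM ID readout in the kernel above, since the occupancy figure only describes what the hardware allows.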

You may also want to think about using atomics. You can use an atomicAdd of a global variable (+1 per thread) to count how many threads actually executed, in total.
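
A minimal sketch of that atomic counting idea might look like the following. The names `thread_count`, `count_kernel`, and `count_threads` are hypothetical, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Counter in device global memory; every thread that runs adds 1 to it.
__device__ unsigned int thread_count = 0;

__global__ void count_kernel() {
  // atomicAdd makes concurrent increments from different threads safe.
  atomicAdd(&thread_count, 1u);
}

// Hypothetical host helper: resets the counter, launches the kernel,
// and returns how many threads incremented the counter.
// Error checking is omitted for brevity.
unsigned int count_threads(int blocks, int threads_per_block) {
  unsigned int zero = 0;
  unsigned int result = 0;
  cudaMemcpyToSymbol(thread_count, &zero, sizeof(zero));
  count_kernel<<<blocks, threads_per_block>>>();
  cudaDeviceSynchronize();
  cudaMemcpyFromSymbol(&result, thread_count, sizeof(result));
  return result;
}

int main() {
  // Same launch shape as the example above: 4 blocks of 4 threads.
  printf("threads executed: %u\n", count_threads(4, 4));
  return 0;
}
```

With the 4 x 4 launch, the counter should come back as 16 if every thread ran. The same pattern could be extended with one counter per SM, indexed by the SM ID, to count blocks or threads per SM.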