Meaning of metrics in the Nsight Compute occupancy calculator

I want to understand some metrics in the Nsight Compute occupancy calculator.

I created just one block with 12 threads (my_kernel<<<1, 12>>>), but “Active Threads per Multiprocessor” shows 255 active threads! What is the difference between these two terms? Is it the same as the difference between a hardware thread and a software thread?

Hi, @lzxdjb

These are two different things.

The 12 threads come from your application's launch configuration. Active Threads per Multiprocessor (256) is calculated as 32 * Active Warps per Multiprocessor (8).

In other words, up to 256 threads could be active per multiprocessor, but your application did not launch that many.