How can I get the exact number of SMs the GPU is using when profiling with nvprof?
Also, how can I calculate how many warps a kernel is divided into?
Building on that, can I see the total number of warps across the different kernels in a single GPU context?
A detailed explanation would be appreciated.