It seems that stall metrics are available at the SMSP level only. Is that right? How can I measure them at the SM level?
For example, IPC is measurable at both the SMSP and SM levels. Does that mean I have to analyze IPC and stall metrics together at the SMSP level? Since an SMSP is 1/4 of an SM and a device has many SMs, e.g. 68, I wonder how much smsp analysis is comprehensive. Any thought on that?
Instructions are issued and warps are scheduled at the SMSP (warp scheduler) level, so calculating at the SMSP level is the most accurate method of analysis. You can calculate the average value at the SM level, but in general it will give you no additional useful information.
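To make that relationship concrete, here is a minimal Python sketch with made-up counter values for the four SMSPs of one SM. The variable names are hypothetical and are not Nsight Compute metric names. Treating the SM's active cycles as the maximum over its SMSPs is a simplifying assumption.

```python
# Hypothetical per-SMSP counters for a single SM (4 sub-partitions).
# All values are invented for illustration only.
inst_issued = [900, 1000, 1100, 1000]      # instructions issued per SMSP
active_cycles = [2000, 2000, 2000, 2000]   # active cycles per SMSP

# IPC of each SMSP (warp scheduler) taken on its own.
smsp_ipc = [i / c for i, c in zip(inst_issued, active_cycles)]

# Average across the SMSP instances, which is the kind of value a profiler reports.
avg_smsp_ipc = sum(smsp_ipc) / len(smsp_ipc)

# SM-level IPC: all instructions of the SM over the SM's active cycles.
# Assumption: the SM is active for as long as its busiest SMSP is active.
sm_ipc = sum(inst_issued) / max(active_cycles)

print(f"average SMSP IPC: {avg_smsp_ipc:.2f}")
print(f"SM IPC:           {sm_ipc:.2f}")
```

With these numbers the per-SMSP average is 0.50 and the SM-level value is 2.00, so the SM figure is the same information scaled by the four sub-partitions.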
“I wonder how much smsp analysis is comprehensive. Any thought on that?”
I’m not clear on what you are asking. In almost all cases in Nsight Compute, you are looking at the average value across N instances of a unit. In what case do you think calculating these metrics as the average across SMs would provide more information?
The one case where it could be useful to compare SMSP to SM would be if you thought you had a common tail effect that was leaving each SM with only 1 active SMSP.
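As a rough illustration of that tail-effect check, the sketch below compares average SMSP active cycles against the SM's active cycles. The function name and the counter values are hypothetical, and the SM's active cycles are again assumed to equal those of its busiest SMSP.

```python
def smsp_activity_ratio(smsp_active_cycles):
    """Average SMSP active cycles divided by the SM's active cycles.

    Assumption: the SM counts as active for as long as its busiest SMSP.
    A ratio near 1.0 means all SMSPs were busy while the SM was busy;
    a ratio near 1/len(smsp_active_cycles) means mostly one SMSP was.
    """
    sm_active = max(smsp_active_cycles)
    avg_smsp_active = sum(smsp_active_cycles) / len(smsp_active_cycles)
    return avg_smsp_active / sm_active

# Invented numbers: a balanced SM versus an SM with a long single-SMSP tail.
balanced = smsp_activity_ratio([2000, 2000, 2000, 2000])
tail = smsp_activity_ratio([2000, 500, 500, 500])

print(f"balanced: {balanced:.4f}")
print(f"tail:     {tail:.4f}")
```

A ratio well below 1.0, as in the second case, is the kind of gap between the SMSP and SM views that would suggest a tail effect.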