- Is there documentation on all the metrics on the “details” page?
- What is “waves per SM”?
- How can I get the number of SMs used to launch a kernel in Nsight Compute?
- Is there documentation on all the metrics on the “details” page?
For most metrics, you can see their description by querying them with `nv-nsight-cu-cli --query-metrics` (the CLI is named `ncu` in newer releases): https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options-profile
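- What is “waves per SM”?
See https://devblogs.nvidia.com/cuda-pro-tip-minimize-the-tail-effect/. In short, a wave is the set of thread blocks that can be resident on the whole GPU at the same time: the number of SMs times the maximum number of blocks one SM can hold (as limited by occupancy). Waves per SM is the number of blocks in the grid divided by that wave size. A fractional value means the last (tail) wave only partially fills the GPU.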
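A minimal arithmetic sketch of the waves calculation, assuming you already know the SM count and the maximum number of resident blocks per SM (for example from an occupancy calculation such as `cudaOccupancyMaxActiveBlocksPerMultiprocessor`). The function names here are hypothetical helpers, not part of Nsight Compute:

```python
import math


def waves_per_sm(grid_blocks: int, num_sms: int, max_blocks_per_sm: int) -> float:
    """Number of full-GPU waves of thread blocks needed to run the grid.

    max_blocks_per_sm is an assumed input (from an occupancy calculation).
    """
    if grid_blocks <= 0 or num_sms <= 0 or max_blocks_per_sm <= 0:
        raise ValueError("all arguments must be positive")
    blocks_per_wave = num_sms * max_blocks_per_sm  # blocks resident GPU-wide at once
    return grid_blocks / blocks_per_wave


def num_waves(grid_blocks: int, num_sms: int, max_blocks_per_sm: int) -> int:
    """Whole waves actually executed: a partial tail wave still counts as one."""
    return math.ceil(waves_per_sm(grid_blocks, num_sms, max_blocks_per_sm))


def tail_utilization(grid_blocks: int, num_sms: int, max_blocks_per_sm: int) -> float:
    """Fraction of the GPU's block slots filled during the last wave."""
    blocks_per_wave = num_sms * max_blocks_per_sm
    remainder = grid_blocks % blocks_per_wave
    return 1.0 if remainder == 0 else remainder / blocks_per_wave
```

For example, with a hypothetical 80 SMs and 4 blocks per SM (320 blocks per wave), a 400-block grid gives 1.25 waves: two waves are executed, and the second one uses only a quarter of the GPU.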
- How can I get the number of SMs used to launch a kernel in Nsight Compute?
Since blocks are distributed across different SMs where possible, all SMs will be active if the number of blocks is greater than or equal to the number of SMs. Otherwise, the number of active SMs equals the number of blocks. In your application, this may vary due to concurrent kernel execution, since other kernels can occupy some of the SMs.
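The rule of thumb above reduces to a minimum. This is a sketch that assumes the block scheduler spreads blocks across SMs and that no other kernel runs concurrently; `sms_used` is a hypothetical helper, not an Nsight Compute metric:

```python
def sms_used(grid_blocks: int, num_sms: int) -> int:
    """Expected number of SMs that receive at least one block of the kernel.

    Assumes blocks are spread across SMs whenever possible and that the GPU
    is otherwise idle (no concurrent kernels competing for SMs).
    """
    if grid_blocks < 0 or num_sms <= 0:
        raise ValueError("grid_blocks must be >= 0 and num_sms > 0")
    return min(grid_blocks, num_sms)
```

With a hypothetical 80-SM GPU, a 40-block grid would touch 40 SMs, while any grid of 80 or more blocks would touch all 80.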