getting SM information

  1. Is there documentation on all the metrics on the “details” page?
  2. What is “waves per SM”?
  3. How can I get the number of SMs used to launch a kernel in Nsight Compute?
  1. Is there documentation on all the metrics on the “details” page?
    For most metrics, you can see the description by querying the respective metric with nv-nsight-cu-cli, for example using its --query-metrics option (https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options-profile)
  2. What is “waves per SM”?
    See https://devblogs.nvidia.com/cuda-pro-tip-minimize-the-tail-effect/. That post describes a wave as the set of thread blocks that run concurrently on the GPU.
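To make the "waves" idea concrete, here is a rough sketch of the arithmetic in Python. It assumes one wave holds `num_sms * max_blocks_per_sm` thread blocks, where `max_blocks_per_sm` is the occupancy limit for your kernel (it depends on registers, shared memory, and block size). The function names and the example numbers are made up for illustration, and the exact formula Nsight Compute uses may differ.

```python
def waves_per_sm(grid_blocks: int, num_sms: int, max_blocks_per_sm: int) -> float:
    """Estimate how many waves of thread blocks a launch needs.

    Assumption: a full wave is num_sms * max_blocks_per_sm resident blocks.
    """
    if grid_blocks < 0 or num_sms <= 0 or max_blocks_per_sm <= 0:
        raise ValueError("grid_blocks must be >= 0; the other arguments must be > 0")
    blocks_per_wave = num_sms * max_blocks_per_sm
    return grid_blocks / blocks_per_wave


def tail_blocks(grid_blocks: int, num_sms: int, max_blocks_per_sm: int) -> int:
    """Number of blocks left over for the last, partial wave (0 if none)."""
    if grid_blocks < 0 or num_sms <= 0 or max_blocks_per_sm <= 0:
        raise ValueError("grid_blocks must be >= 0; the other arguments must be > 0")
    return grid_blocks % (num_sms * max_blocks_per_sm)


if __name__ == "__main__":
    # Hypothetical GPU: 80 SMs, 2 resident blocks per SM, so 160 blocks per wave.
    print(waves_per_sm(200, 80, 2))  # 1.25
    print(tail_blocks(200, 80, 2))   # 40
```

In this example the launch needs one full wave plus a partial wave of 40 blocks, so most of the GPU sits idle while the tail finishes. That idle time is the tail effect the blog post discusses.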
  3. How can I get the number of SMs used to launch a kernel in Nsight Compute?
    Since blocks are scheduled on different SMs where possible, all SMs will be active if the number of blocks is >= the number of SMs. Otherwise, the number of active SMs equals the number of blocks. In your application, this might vary due to concurrent kernel execution.
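The rule of thumb above can be written as a one-line estimate. This is only a sketch: `estimated_active_sms` is a hypothetical helper, not something Nsight Compute provides, and it assumes a single kernel running alone with blocks spread across SMs as evenly as possible.

```python
def estimated_active_sms(grid_blocks: int, num_sms: int) -> int:
    """Estimate how many SMs receive at least one block.

    Assumption: no concurrent kernels, and the scheduler places blocks on
    different SMs whenever it can.
    """
    if grid_blocks < 0 or num_sms <= 0:
        raise ValueError("grid_blocks must be >= 0 and num_sms must be > 0")
    return min(grid_blocks, num_sms)


if __name__ == "__main__":
    # Hypothetical GPU with 80 SMs.
    print(estimated_active_sms(40, 80))   # 40: fewer blocks than SMs
    print(estimated_active_sms(200, 80))  # 80: every SM gets work
```

You can read the SM count for your device from `cudaGetDeviceProperties` (the `multiProcessorCount` field) and plug it in as `num_sms`.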