Elapsed cycles reported by Nsight

According to the following stats (SOL run):

Elapsed Cycles    cycle            9,006
SM Frequency      cycle/nsecond    1.15
Memory [%]        %                17.52
Duration          usecond          7.84

The device is a Titan V with a 1.45 GHz core clock frequency according to deviceQuery.
With a duration of 7.84 us, the number of cycles should be 1.45 GHz * 7.84 us = 11,368 cycles.

But the reported number of cycles is 9,006. The calculation is pretty straightforward, so I wonder what causes the difference. The SM frequency is also reported as 1.15 GHz, which I wonder about as well.

For the elapsed cycles calculation, you need to use the SM frequency reported by the tool, not the one reported by deviceQuery.
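As a quick sanity check, here is a minimal sketch of the arithmetic using only the numbers posted above. The helper name `expected_cycles` is made up for illustration and is not part of any tool. The small remaining gap between about 9,016 and the reported 9,006 is presumably because the tool displays the SM frequency rounded to two decimals.

```python
def expected_cycles(freq_ghz, duration_us):
    """Cycles elapsed at a given clock (GHz) over a given duration (microseconds)."""
    # 1 GHz = 1e9 cycles/s and 1 us = 1e-6 s, so GHz * us gives thousands of cycles.
    return freq_ghz * duration_us * 1e3


reported_cycles = 9006   # "Elapsed Cycles" from the profiler output
duration_us = 7.84       # "Duration" from the profiler output

# Using the deviceQuery clock (1.45 GHz) overestimates the cycle count.
with_device_query_clock = expected_cycles(1.45, duration_us)

# Using the SM frequency reported by the profiler (1.15 GHz) is close to 9,006.
with_profiler_clock = expected_cycles(1.15, duration_us)

# Working backwards: the clock implied by the reported cycles and duration.
implied_freq_ghz = reported_cycles / (duration_us * 1e3)

print(f"deviceQuery clock: {with_device_query_clock:.0f} cycles")
print(f"profiler clock:    {with_profiler_clock:.0f} cycles")
print(f"implied SM clock:  {implied_freq_ghz:.4f} GHz")
```

The implied clock comes out at roughly 1.149 GHz, which is consistent with the 1.15 GHz shown in the report.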

For many metrics, their value is directly influenced by the current GPU SM and memory clock frequencies. For example, if a kernel instance is profiled that has prior kernel executions in the application, the GPU might already be in a higher clocked state and the measured kernel duration, along with other metrics, will be affected. Likewise, if a kernel instance is the first kernel to be launched in the application, GPU clocks will regularly be lower. In addition, due to kernel replay, the metric value might depend on which replay pass it is collected in, as later passes would result in higher clock states.

To mitigate this non-determinism, NVIDIA Nsight Compute attempts to limit GPU clock frequencies to their base value. As a result, metric values are less impacted by the location of the kernel in the application, or by the number of the specific replay pass.

However, this behavior might be undesirable for analysis of the kernel, e.g. in cases where an external tool is used to fix clock frequencies, or where the behavior of the kernel within the application is analyzed. To solve this, users can adjust the --clock-control option to specify if any clock frequencies should be fixed by the tool.
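To illustrate the option, here is a small sketch that only builds the command line as a list and does not run anything. It assumes the CLI executable is called `ncu` (older releases ship it under a different name) and that `base` and `none` are accepted values for `--clock-control`; check the documentation for your version. The helper `profiler_command` and the application path `./my_app` are hypothetical.

```python
import shlex


def profiler_command(app_args, clock_control="base", profiler="ncu"):
    """Build (but do not run) a profiler command with an explicit clock-control mode.

    Assumption: "base" locks clocks to their base value (the behavior described
    above) and "none" leaves the GPU clocks untouched by the tool.
    """
    allowed = ("base", "none")
    if clock_control not in allowed:
        raise ValueError(f"clock_control must be one of {allowed}")
    return [profiler, "--clock-control", clock_control, *app_args]


# Hypothetical application path, for illustration only.
cmd = profiler_command(["./my_app"], clock_control="none")
print(shlex.join(cmd))
```

With clock control disabled, the reported SM frequency and elapsed cycles would reflect whatever clock state the GPU happens to be in, so results may vary more from run to run.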

Note that thermal throttling directed by the driver cannot be controlled by the tool and always overrides any selected options.

You can check https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options-profile for more details on these options.

Thank you.