Hello!
I used the Nsight Compute CLI to collect performance metrics for two kernels. The results are below, first kernel followed by the second:
Section: GPU Speed Of Light Throughput
----------------------- ------------- ------------
Metric Name               Metric Unit Metric Value
----------------------- ------------- ------------
DRAM Frequency          cycle/nsecond         9.57
SM Frequency            cycle/nsecond         2.04
Elapsed Cycles                  cycle      120,274
Memory Throughput                   %         7.64
DRAM Throughput                     %         3.15
Duration                      usecond        58.56
L1/TEX Cache Throughput             %         8.32
L2 Cache Throughput                 %         3.07
SM Active Cycles                cycle   109,837.59
Compute (SM) Throughput             %        29.10
----------------------- ------------- ------------
Section: GPU Speed Of Light Throughput
----------------------- ------------- ------------
Metric Name               Metric Unit Metric Value
----------------------- ------------- ------------
DRAM Frequency          cycle/nsecond         9.91
SM Frequency            cycle/nsecond         2.14
Elapsed Cycles                  cycle      227,509
Memory Throughput                   %        95.23
DRAM Throughput                     %        95.23
Duration                      usecond       106.14
L1/TEX Cache Throughput             %        18.47
L2 Cache Throughput                 %        41.26
SM Active Cycles                cycle   222,054.82
Compute (SM) Throughput             %         9.38
----------------------- ------------- ------------
My question is: why does the kernel with the lower Compute (SM) Throughput (9.38% vs. 29.10%) have the higher SM Active Cycles (222,054.82 vs. 109,837.59)? What exactly does SM Active Cycles mean?
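To make the question concrete, here is a minimal Python sketch of the arithmetic on the two tables above. It compares SM Active Cycles against Elapsed Cycles and checks Elapsed Cycles against the reported Duration. The dictionary keys and function names are my own, and the reading that SM Active Cycles tracks how long an SM has work resident (rather than how much work it does per cycle) is an assumption that I would like confirmed.

```python
# Values copied from the two "GPU Speed Of Light Throughput" tables.
# Key names are hypothetical labels, not Nsight Compute metric identifiers.
kernels = {
    "kernel 1": {
        "sm_freq_cycles_per_ns": 2.04,
        "elapsed_cycles": 120_274,
        "sm_active_cycles": 109_837.59,
        "duration_us": 58.56,
        "compute_throughput_pct": 29.10,
    },
    "kernel 2": {
        "sm_freq_cycles_per_ns": 2.14,
        "elapsed_cycles": 227_509,
        "sm_active_cycles": 222_054.82,
        "duration_us": 106.14,
        "compute_throughput_pct": 9.38,
    },
}


def active_fraction(k):
    """Share of elapsed cycles that are reported as SM-active."""
    return k["sm_active_cycles"] / k["elapsed_cycles"]


def duration_from_cycles_us(k):
    """Elapsed cycles divided by SM frequency (cycles/ns), converted ns -> us."""
    return k["elapsed_cycles"] / k["sm_freq_cycles_per_ns"] / 1000.0


for name, k in kernels.items():
    print(
        f"{name}: active/elapsed = {active_fraction(k):.3f}, "
        f"duration from cycles = {duration_from_cycles_us(k):.2f} us "
        f"(reported {k['duration_us']} us), "
        f"Compute (SM) Throughput = {k['compute_throughput_pct']}%"
    )
```

In both kernels SM Active Cycles is close to Elapsed Cycles (roughly 91% and 98%), and Elapsed Cycles divided by SM Frequency roughly reproduces Duration. So the second kernel's larger SM Active Cycles seems to follow from it simply running about twice as long, independent of its lower Compute (SM) Throughput.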
In addition, I could not find a detailed description of these metrics in the documentation. If one exists, please point me to it. Thanks!