What is IPC, the explanation in the documentation is Instructions execution per cycle, but what is the instructions here, does cycle refer to the cycle of SM, can IPC represent the running time of a program, why my 1080ti MAX IPC is 6 ?

The CUPTI/nvprof metric ipc is defined as

ipc = inst_executed / active_cycles

inst_executed is the number of warp instructions (not threads) retired by an SM.

active_cycles is the number of SM cycles the SM had at least one active warp.

elapsed_cycles is the number of SM cycles during the PM collection period.

The ipc metric is best used as a throughput metric to determine if the application is compute bound.

On Maxwell and Pascal architecture the SM has 4 sub-partitions. Each sub-partition has a warp scheduler. Each warp scheduler can issue 2 instructions per cycle. The maximum theoretical ipc value per cycle is 8. However, the instruction fetch rate and the maximum rate of variable latency instructions (load, store, transcendentals, …) limits the sustained rate to 6 instructions per cycle.

ipc cannot be used to represent running time. ipc is the instruction throughput over a period of collection. A FP32 bound kernel can achieve close to 4 IPC per SM. A memory bound kernel is more likely to have < 1 IPC per SM.

CUPTI and nvprof are moving to using a new version of PerfWorks metrics. The PerfWorks metrics (available in the new Nsight Compute profiler and Nsight VSE CUDA profiler) are

PerfWorks Metrics (< v1, Kepler-Volta)

sm__inst_executed_{avg, max, min, sum}

sm__inst_executed_{avg, sum}*per*{active, elapsed}*cycle
sm__inst_executed_per*{active, elapsed}_cycle_sol_pct

PerfWorks Metrics (> v1, Nsight 6.0 Turing)

sm__inst_executed.{avg, max, min, sum}

sm__inst_executed.{avg, max, min, sum}.per_cycle_{active, elapsed}

sm__inst_executed.{avg, max, min, sum}.pct_peak_sustained_per_{active, elapsed}

I don`t know how to @ somebody but thanks a lot @(Greg@NV)