NSIGHT equivalent for branch_efficiency in nvprof

In nvprof, the metric used to determine branch efficiency (warp divergence) is branch_efficiency. nvprof has been deprecated for devices with compute capability >= 7.2 and NSIGHT Compute is used instead. What is the equivalent to branch_efficiency in NSIGHT?

Thanks.

As indicated in the nvprof transition guide https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-metric-comparison, branch_efficiency is not directly available in Nsight Compute at this point. The team is looking into providing a matching mapping in a future release.

In the meantime, please check if any of the following related metrics is useful for your case:

smsp__average_warp_latency_issue_stalled_branch_resolving
average number of warp cycles spent waiting for a branch target address to be computed, and the warp PC to be updated

smsp__average_warps_issue_stalled_branch_resolving_per_issue_active
average number of warps resident per issue cycle, waiting for a branch target address to be computed, and the warp PC to be updated

smsp__inst_executed_op_branch
number of warp instructions executed: BRA, BRX, JMP, JMX, CALL, RET (the description should also list YIELD, EXIT, WARPSYNC, etc.)

smsp__warp_issue_stalled_branch_resolving_per_warp_active
proportion of warps per cycle, waiting for a branch target address to be computed, and the warp PC to be updated

smsp__warps_issue_stalled_branch_resolving
cumulative number of warps waiting for a branch target address to be computed, and the warp PC to be updated

Note that you might need to add a valid suffix (for example .sum, .avg or .pct, depending on the metric type) to the base name when collecting the metric in Nsight Compute.
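
As a rough illustration of the suffix note, here is a small Python sketch that assembles an ncu command line for one of the metrics listed above. The helper names (with_suffix, build_ncu_command), the application path ./my_app and the choice of the .sum suffix are assumptions made for the example; running ncu --query-metrics on your system should show which metrics are actually available for your GPU.

```python
import shlex

def with_suffix(base_name, suffix):
    """Append a suffix such as 'sum' to a metric base name (hypothetical helper)."""
    return f"{base_name}.{suffix}"

def build_ncu_command(app, metrics):
    """Return the argument list for an ncu run that collects the given metrics.

    Only the --metrics option of the ncu CLI is used; 'app' is a placeholder
    for the CUDA executable to profile.
    """
    return ["ncu", "--metrics", ",".join(metrics), app]

if __name__ == "__main__":
    # Assumption: '.sum' is a valid suffix for this counter on the target GPU.
    metrics = [with_suffix("smsp__inst_executed_op_branch", "sum")]
    # Print the command instead of running it, so the sketch works without a GPU.
    print(shlex.join(build_ncu_command("./my_app", metrics)))
```

The script only prints the command; paste the output into a shell on a machine with Nsight Compute installed to collect the metric.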
