How do I get some of the nvprof metrics in Nsight?

I have a fairly old CUDA book, “Professional CUDA C Programming” (released around 2014), which focuses on Fermi and Kepler. Many of its examples use nvprof, but on my system (RTX 2070, compute capability 7.5) nvprof no longer appears to be supported.
For example, I tried the branch_efficiency metric:
nvprof --metrics branch_efficiency ./a.out 256 33554432
======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability 7.5 and higher.
Use NVIDIA Nsight Compute for GPU profiling and NVIDIA Nsight Systems for GPU tracing and CPU sampling.
Refer NVIDIA Developer Tools Overview | NVIDIA Developer for more details.
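For reference, the book defines this nvprof metric as the percentage of branches that are not divergent:

```latex
\text{Branch Efficiency} = 100 \times \frac{\#\text{Branches} - \#\text{Divergent Branches}}{\#\text{Branches}}
```

For example, a kernel that executes 1000 branches, 250 of them divergent, has a branch efficiency of 100 × 750 / 1000 = 75%.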

I then installed Nsight Compute and tried its command-line version, nv-nsight-cu-cli, to look for a similar metric, but it does not appear to find anything. Any ideas?

root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming# nv-nsight-cu-cli --list-metrics | grep -i branch
root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming# nv-nsight-cu-cli --list-metrics
sm__warps_active.avg.per_cycle_active
sm__warps_active.avg.pct_of_peak_sustained_active
sm__throughput.avg.pct_of_peak_sustained_elapsed
sm__maximum_warps_per_active_cycle_pct
sm__maximum_warps_avg_per_active_cycle
sm__cycles_active.avg
lts__throughput.avg.pct_of_peak_sustained_elapsed
launch__waves_per_multiprocessor
launch__thread_count
launch__shared_mem_per_block_static
launch__shared_mem_per_block_dynamic
launch__shared_mem_per_block_driver
launch__shared_mem_per_block
launch__shared_mem_config_size
launch__registers_per_thread
launch__occupancy_per_shared_mem_size
launch__occupancy_per_register_count
launch__occupancy_per_block_size
launch__occupancy_limit_warps
launch__occupancy_limit_shared_mem
launch__occupancy_limit_registers
launch__occupancy_limit_blocks
launch__grid_size
launch__func_cache_config
launch__block_size
l1tex__throughput.avg.pct_of_peak_sustained_active
gpu__time_duration.sum
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
-arch:75:86:gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
-arch:40:70:gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed
gpc__cycles_elapsed.max
gpc__cycles_elapsed.avg.per_second
dram__cycles_elapsed.avg.per_second
-arch:75:86:dram__cycles_elapsed.avg.per_second
-arch:40:70:dram__cycles_elapsed.avg.per_second
breakdown:sm__throughput.avg.pct_of_peak_sustained_elapsed
breakdown:gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed
root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming#
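A likely reason the grep finds nothing: `--list-metrics` only lists the metrics collected by the currently selected sections (exactly the set that appears in the summary below), while `--query-metrics` lists every metric the device supports, so `nv-nsight-cu-cli --query-metrics | grep -i branch` is probably the more useful search. NVIDIA's nvprof-to-Nsight-Compute transition table maps `branch_efficiency` to `smsp__sass_average_branch_targets_threads_uniform.pct`. The sketch below assumes that mapping (plus two others from the same table) and only prints the resulting command rather than running it; the function names `nvprof_to_ncu` and `profile_cmd` and the `NCU` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate a few nvprof metric names used in the book
# into the Nsight Compute names from NVIDIA's nvprof transition table.
# Only the names listed here are covered (assumed mapping, verify with
# --query-metrics on your own GPU).
nvprof_to_ncu() {
  case "$1" in
    branch_efficiency)  echo "smsp__sass_average_branch_targets_threads_uniform.pct" ;;
    achieved_occupancy) echo "sm__warps_active.avg.pct_of_peak_sustained_active" ;;
    gld_efficiency)     echo "smsp__sass_average_data_bytes_per_sector_mem_global_op_ld.pct" ;;
    *) echo "unknown nvprof metric: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the profiling command for one nvprof metric name.
# NCU defaults to the CLI name used in this post; newer toolkits call it ncu.
profile_cmd() {
  metric=$(nvprof_to_ncu "$1") || return 1
  shift  # remaining arguments are the application and its arguments
  echo "${NCU:-nv-nsight-cu-cli} --metrics $metric $*"
}

profile_cmd branch_efficiency ./a.out 256 33554432
```

Keeping the mapping in a case statement makes it easy to add further rows from the transition table as the book's later chapters need them.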

I can get --print-summary output, but it prints far more than I need and still does not include the one metric I was looking for (mentioned above):

==PROF== Connected to process 24102 (/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming/a.out)
/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming/./a.out using Device 0: NVIDIA GeForce RTX 2070 SUPER.
Data size 64.
Execution configure (block 64 grid 1).
==PROF== Profiling "warmingUp(float*)" - 1: 0%....50%....100% - 8 passes
warmup <<< 1 64 >>> elapsed 000000 sec.
==PROF== Profiling "mathKernel1(float*)" - 2: 0%....50%....100% - 8 passes
mathKernel1 <<<    1   64 >>> elapsed 000001 sec.
==PROF== Profiling "mathKernel2(float*)" - 3: 0%....50%....100% - 8 passes
mathKernel2 <<<    1   64 >>> elapsed 000000 sec.
==PROF== Disconnected from process 24102
[24102] a.out@127.0.0.1
  Device 0
    mathKernel1(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: GPU Speed Of Light
      Metric Name                                                      Metric Unit   Minimum     Maximum     Average
      ---------------------------------------------------------------- ------------- ----------- ----------- -----------
      dram__cycles_elapsed.avg.per_second                              cycle/nsecond 6.468085    6.468085    6.468085
      gpc__cycles_elapsed.avg.per_second                               cycle/nsecond 1.502992    1.502992    1.502992
      gpc__cycles_elapsed.max                                          cycle         2265.000000 2265.000000 2265.000000
      gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed %             0.836062    0.836062    0.836062
      gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed           %             0.025699    0.025699    0.025699
      gpu__time_duration.sum                                           usecond       1.504000    1.504000    1.504000
      l1tex__throughput.avg.pct_of_peak_sustained_active               %             22.429907   22.429907   22.429907
      lts__throughput.avg.pct_of_peak_sustained_elapsed                %             0.836062    0.836062    0.836062
      sm__cycles_active.avg                                            cycle         18.725000   18.725000   18.725000
      sm__throughput.avg.pct_of_peak_sustained_elapsed                 %             0.009399    0.009399    0.009399

    mathKernel1(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: Launch Statistics
      Metric Name                          Metric Unit     Minimum   Maximum   Average
      ------------------------------------ --------------- --------- --------- ---------
      launch__block_size                                   64.000000 64.000000 64.000000
      launch__grid_size                                    1.000000  1.000000  1.000000
      launch__registers_per_thread         register/thread 16.000000 16.000000 16.000000
      launch__shared_mem_config_size       Kbyte           32.768000 32.768000 32.768000
      launch__shared_mem_per_block_driver  byte/block      0.000000  0.000000  0.000000
      launch__shared_mem_per_block_dynamic byte/block      0.000000  0.000000  0.000000
      launch__shared_mem_per_block_static  byte/block      0.000000  0.000000  0.000000
      launch__thread_count                 thread          64.000000 64.000000 64.000000
      launch__waves_per_multiprocessor                     0.001563  0.001563  0.001563

    mathKernel1(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: Occupancy
      Metric Name                                       Metric Unit Minimum    Maximum    Average
      ------------------------------------------------- ----------- ---------- ---------- ----------
      launch__occupancy_limit_blocks                    block       16.000000  16.000000  16.000000
      launch__occupancy_limit_registers                 block       64.000000  64.000000  64.000000
      launch__occupancy_limit_shared_mem                block       16.000000  16.000000  16.000000
      launch__occupancy_limit_warps                     block       16.000000  16.000000  16.000000
      sm__maximum_warps_avg_per_active_cycle            warp        32.000000  32.000000  32.000000
      sm__maximum_warps_per_active_cycle_pct            %           100.000000 100.000000 100.000000
      sm__warps_active.avg.pct_of_peak_sustained_active %           6.229139   6.229139   6.229139
      sm__warps_active.avg.per_cycle_active             warp        1.993324   1.993324   1.993324

    mathKernel2(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: GPU Speed Of Light
      Metric Name                                                      Metric Unit   Minimum     Maximum     Average
      ---------------------------------------------------------------- ------------- ----------- ----------- -----------
      dram__cycles_elapsed.avg.per_second                              cycle/nsecond 6.204082    6.204082    6.204082
      gpc__cycles_elapsed.avg.per_second                               cycle/nsecond 1.495536    1.495536    1.495536
      gpc__cycles_elapsed.max                                          cycle         2350.000000 2350.000000 2350.000000
      gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed %             0.861875    0.861875    0.861875
      gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed           %             0.077097    0.077097    0.077097
      gpu__time_duration.sum                                           usecond       1.568000    1.568000    1.568000
      l1tex__throughput.avg.pct_of_peak_sustained_active               %             20.095694   20.095694   20.095694
      lts__throughput.avg.pct_of_peak_sustained_elapsed                %             0.861875    0.861875    0.861875
      sm__cycles_active.avg                                            cycle         20.900000   20.900000   20.900000
      sm__throughput.avg.pct_of_peak_sustained_elapsed                 %             0.018654    0.018654    0.018654

    mathKernel2(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: Launch Statistics
      Metric Name                          Metric Unit     Minimum   Maximum   Average
      ------------------------------------ --------------- --------- --------- ---------
      launch__block_size                                   64.000000 64.000000 64.000000
      launch__grid_size                                    1.000000  1.000000  1.000000
      launch__registers_per_thread         register/thread 16.000000 16.000000 16.000000
      launch__shared_mem_config_size       Kbyte           32.768000 32.768000 32.768000
      launch__shared_mem_per_block_driver  byte/block      0.000000  0.000000  0.000000
      launch__shared_mem_per_block_dynamic byte/block      0.000000  0.000000  0.000000
      launch__shared_mem_per_block_static  byte/block      0.000000  0.000000  0.000000
      launch__thread_count                 thread          64.000000 64.000000 64.000000
      launch__waves_per_multiprocessor                     0.001563  0.001563  0.001563

    mathKernel2(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: Occupancy
      Metric Name                                       Metric Unit Minimum    Maximum    Average
      ------------------------------------------------- ----------- ---------- ---------- ----------
      launch__occupancy_limit_blocks                    block       16.000000  16.000000  16.000000
      launch__occupancy_limit_registers                 block       64.000000  64.000000  64.000000
      launch__occupancy_limit_shared_mem                block       16.000000  16.000000  16.000000
      launch__occupancy_limit_warps                     block       16.000000  16.000000  16.000000
      sm__maximum_warps_avg_per_active_cycle            warp        32.000000  32.000000  32.000000
      sm__maximum_warps_per_active_cycle_pct            %           100.000000 100.000000 100.000000
      sm__warps_active.avg.pct_of_peak_sustained_active %           6.231310   6.231310   6.231310
      sm__warps_active.avg.per_cycle_active             warp        1.994019   1.994019   1.994019

    warmingUp(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: GPU Speed Of Light
      Metric Name                                                      Metric Unit   Minimum     Maximum     Average
      ---------------------------------------------------------------- ------------- ----------- ----------- -----------
      dram__cycles_elapsed.avg.per_second                              cycle/nsecond 6.080000    6.080000    6.080000
      gpc__cycles_elapsed.avg.per_second                               cycle/nsecond 1.465521    1.465521    1.465521
      gpc__cycles_elapsed.max                                          cycle         2349.000000 2349.000000 2349.000000
      gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed %             0.862263    0.862263    0.862263
      gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed           %             0.313528    0.313528    0.313528
      gpu__time_duration.sum                                           usecond       1.600000    1.600000    1.600000
      l1tex__throughput.avg.pct_of_peak_sustained_active               %             20.216606   20.216606   20.216606
      lts__throughput.avg.pct_of_peak_sustained_elapsed                %             0.862263    0.862263    0.862263
      sm__cycles_active.avg                                            cycle         20.775000   20.775000   20.775000
      sm__throughput.avg.pct_of_peak_sustained_elapsed                 %             0.018657    0.018657    0.018657

    warmingUp(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: Launch Statistics
      Metric Name                          Metric Unit     Minimum   Maximum   Average
      ------------------------------------ --------------- --------- --------- ---------
      launch__block_size                                   64.000000 64.000000 64.000000
      launch__grid_size                                    1.000000  1.000000  1.000000
      launch__registers_per_thread         register/thread 16.000000 16.000000 16.000000
      launch__shared_mem_config_size       Kbyte           32.768000 32.768000 32.768000
      launch__shared_mem_per_block_driver  byte/block      0.000000  0.000000  0.000000
      launch__shared_mem_per_block_dynamic byte/block      0.000000  0.000000  0.000000
      launch__shared_mem_per_block_static  byte/block      0.000000  0.000000  0.000000
      launch__thread_count                 thread          64.000000 64.000000 64.000000
      launch__waves_per_multiprocessor                     0.001563  0.001563  0.001563

    warmingUp(float*), Block Size 64, Grid Size 1, 1 invocations
      Section: Occupancy
      Metric Name                                       Metric Unit Minimum    Maximum    Average
      ------------------------------------------------- ----------- ---------- ---------- ----------
      launch__occupancy_limit_blocks                    block       16.000000  16.000000  16.000000
      launch__occupancy_limit_registers                 block       64.000000  64.000000  64.000000
      launch__occupancy_limit_shared_mem                block       16.000000  16.000000  16.000000
      launch__occupancy_limit_warps                     block       16.000000  16.000000  16.000000
      sm__maximum_warps_avg_per_active_cycle            warp        32.000000  32.000000  32.000000
      sm__maximum_warps_per_active_cycle_pct            %           100.000000 100.000000 100.000000
      sm__warps_active.avg.pct_of_peak_sustained_active %           6.231197   6.231197   6.231197
      sm__warps_active.avg.per_cycle_active             warp        1.993983   1.993983   1.993983

Note: The shown averages are calculated as the arithmetic mean of the metric values after the evaluation of the metrics for each individual kernel launch.
If aggregating across varying launch configurations (like shared memory, cache config settings), the arithmetic mean can be misleading and looking at the individual results is recommended instead.
This output mode is backwards compatible to the per-kernel summary output of nvprof
root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming#
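To cut the output down to one kernel and the metric of interest, the Nsight Compute CLI's `--kernel-name` filter can be combined with `--metrics`, which accepts a comma-separated list. A minimal sketch, assuming those two options; the function `focused_cmd` is hypothetical and, like the helper above, only prints the command it builds:

```shell
#!/bin/sh
# Hypothetical helper: build (not run) a command that profiles a single
# kernel and collects only the requested metrics, instead of the default
# sections aggregated by --print-summary.
# Assumes the --kernel-name and --metrics options; set NCU=ncu on newer toolkits.
focused_cmd() {
  kernel=$1    # kernel function name, e.g. mathKernel1 from the report above
  metrics=$2   # comma-separated metric list, no spaces
  shift 2      # the rest is the application command line
  echo "${NCU:-nv-nsight-cu-cli} --kernel-name $kernel --metrics $metrics $*"
}

focused_cmd mathKernel1 \
  smsp__sass_average_branch_targets_threads_uniform.pct,sm__warps_active.avg.pct_of_peak_sustained_active \
  ./a.out
```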