Tensor core metrics not showing up in Nsight?

I am trying to find out which layers in my model are using tensor cores and which are not. I followed the instructions in this post to run the NVIDIA Nsight Systems profiler (nsys) on a simple PyTorch model.

main.py:

import torch
import torch.nn as nn
import torchvision.models as models

# setup
device = 'cuda:0'
model = models.resnet18().half().to(device)
data = torch.randn(64, 3, 224, 224, device=device).half()
# CrossEntropyLoss expects integer class indices, so the target stays int64
target = torch.randint(0, 1000, (64,), device=device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

nb_iters = 20
warmup_iters = 10
for i in range(nb_iters):
    optimizer.zero_grad()

    # start profiling after 10 warmup iterations
    if i == warmup_iters: torch.cuda.cudart().cudaProfilerStart()

    # push range for current iteration
    if i >= warmup_iters: torch.cuda.nvtx.range_push("iteration{}".format(i))

    # push range for forward
    if i >= warmup_iters: torch.cuda.nvtx.range_push("forward")
    output = model(data)
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()

    # pop iteration range
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()

torch.cuda.cudart().cudaProfilerStop()
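
(Side note: criterion and optimizer are defined above but not used. If the backward pass should be profiled too, the loop body could be extended along these lines, placed before the final range_pop of the iteration. This is only a sketch; it is not part of the run described below.)

    # sketch: an additional range for the loss and backward pass
    if i >= warmup_iters: torch.cuda.nvtx.range_push("backward")
    loss = criterion(output, target)
    loss.backward()
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()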

Here is the command I used to run Nsight Systems:
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu --capture-range=cudaProfilerApi --stop-on-range-end=true --cudabacktrace=true -x true -o my_profile python main.py

It produces a report file, which I opened in the Nsight Systems viewer. Below is what I see in the viewer.

[screenshot of the Nsight Systems timeline]

I am trying to figure out how to see which layers are using tensor cores. I clicked through every menu I could find in the viewer, but I still haven't found a way to do this. Any advice?

One other thing: in this YouTube video on Nsight, there is a “GPU Metrics” section. It is missing from my viewer.

System details:

  • Driver version: 515
  • Nsight Systems version (nsys --version): NVIDIA Nsight Systems version 2021.3.2.12-9700a21
  • Nsight Systems viewer version: 2022.1.3.3-1c7b5f7 Linux
  • GPU: NVIDIA Titan RTX (similar to V100)

It occurred to me that maybe the problem is that I didn’t use the --gpu-metrics-set flag.

To figure out the right value for the flag, I looked at the help output:

$ nsys profile --gpu-metrics-set=help

Possible --gpu-metrics-set values are:
        [0] [tu10x]        General Metrics for NVIDIA TU10x (any frequency)
        [1] [tu11x]        General Metrics for NVIDIA TU11x (any frequency)
        [2] [ga100]        General Metrics for NVIDIA GA100 (any frequency)
        [3] [ga10x]        General Metrics for NVIDIA GA10x (any frequency)
        [4] [tu10x-gfxt]   Graphics Throughput Metrics for NVIDIA TU10x (frequency >= 10kHz)
        [5] [ga10x-gfxt]   Graphics Throughput Metrics for NVIDIA GA10x (frequency >= 10kHz)
        [6] [ga10x-gfxact] Graphics Async Compute Triage Metrics for NVIDIA GA10x (frequency >= 10kHz)

My Titan RTX is a TU102 (aka tu10x) GPU, so I think 0 is the right value.

So, I tried adding --gpu-metrics-set 0 to my command. Unfortunately, this didn’t add any new information to the Nsight Systems viewer window.

I’m still stuck on the problem that I described in the original post.

I think I should be using Nsight Compute (ncu) instead of Nsight Systems (nsys) to collect these metrics. I’m trying that.

Nsight Compute will give you tensor core (or rather tensor pipeline) utilization metrics on a per-kernel or per-range level, but not with time-correlated granularity, i.e. how values change over the runtime of your CUDA kernel. Which tool you want to use depends on your use case and needs.
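
For a per-kernel check, something along these lines can work (a sketch only: --target-processes all makes ncu profile the launched process tree, and the tensor-pipe metric name used here, sm__inst_executed_pipe_tensor.sum, is just one example and varies by GPU architecture and ncu version):

ncu --target-processes all --metrics sm__inst_executed_pipe_tensor.sum -o tensor_report python main.py

Kernels that report a non-zero value for that counter executed tensor-pipeline instructions.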

Your version of Nsight Systems is quite old; I would start by updating that. I think you'll need a newer version to collect GPU metrics correctly.

Here’s a hint from the documentation that will ship with our next version; it should help:

Note: Tensor Core: If you run nsys profile --gpu-metrics-device all, the Tensor Core utilization can be found in the GUI under the SM instructions/Tensor Active row.
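
Combined with the command from the original post, that might look like this (a sketch; exact flag spellings can differ between nsys versions):

nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu --capture-range=cudaProfilerApi --stop-on-range-end=true --gpu-metrics-device=all --gpu-metrics-set=0 -x true -o my_profile python main.py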

Please note that it is not practical to expect a CUDA kernel to reach 100% Tensor Core utilization, since there are other overheads. In general, the more computation-intensive an operation is, the higher the Tensor Core utilization a CUDA kernel can achieve.
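
To see that effect in the timeline, one could profile two fp16 matmuls of different sizes wrapped in NVTX ranges and compare their portions of the Tensor Active row (a sketch, assuming the same nsys command as above; the sizes are arbitrary):

import torch

device = 'cuda:0'
small_a = torch.randn(256, 256, device=device).half()
small_b = torch.randn(256, 256, device=device).half()
large_a = torch.randn(8192, 8192, device=device).half()
large_b = torch.randn(8192, 8192, device=device).half()

torch.cuda.cudart().cudaProfilerStart()

torch.cuda.nvtx.range_push("small_matmul")
torch.matmul(small_a, small_b)  # little work per kernel, lower Tensor Active expected
torch.cuda.nvtx.range_pop()

torch.cuda.nvtx.range_push("large_matmul")
torch.matmul(large_a, large_b)  # compute-bound, higher Tensor Active expected
torch.cuda.nvtx.range_pop()

torch.cuda.synchronize()
torch.cuda.cudart().cudaProfilerStop()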

Excellent - thank you!!!

I updated to the latest nsys and nsys-ui (version 2022.5), and now the GPU Metrics rows (including Tensor Active) show up in the timeline!
