Can't get GPU Metrics with Nsight Systems

When I run the program:

python3 main_tcgnn.py --dataset citeseer --dim 3703 --hidden 16 --classes 6 --num_layers 2 --model gcn

it works flawlessly (screenshot of the output omitted).

However, when I use Nsight Systems to profile the program with a command like:

sudo /usr/local/cuda/nsight-system/bin/nsys profile --stats=true --gpu-metrics-device=0 --gpu-metrics-frequency=10000 python3 main_tcgnn.py --dataset citeseer --dim 3703 --hidden 16 --classes 6 --num_layers 2 --model gcn

the program still runs fine, but the GPU Metrics are missing from the generated report (screenshot omitted),

and the report is generated with some errors (screenshot of the diagnostics omitted).

Environment:
NVIDIA Nsight Systems version 2022.5.1.82-32078057v0
RTX 2080 Ti(11GB) * 1

So how can I solve this problem and get the correct GPU metrics, such as Tensor Core activity?

What driver do you have?

Can you send me the results from running “nsys status -e” at the command line?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:40:00.0 Off |                  N/A |
| 31%   29C    P8    34W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@autodl-container-7b5011b452-56963392:~/ly-zjlab/TCGNN# /usr/local/cuda/nsight-system/bin/nsys status -e
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 3
Linux Distribution = Ubuntu
Linux Kernel Version = 5.4.0-126-generic: OK
Linux perf_event_open syscall available: Fail
Sampling trigger event available: Fail
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): Fail
CPU Profiling Environment (system-wide): Fail

@pkovalenko, can you please chime in on this?

It looks like this is being run from a Docker container, right? The page referenced in the diagnostic message (https://developer.nvidia.com/ERR_NVGPUCTRPERM) describes the right steps to fix the issue. Specifically, --cap-add=SYS_ADMIN has to be added to the docker run arguments.
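For reference, a minimal sketch of how the capability could be added when starting the container (the image name, GPU flag, and shell below are placeholders, not taken from the original setup):

# hypothetical launch: expose the GPUs and grant the SYS_ADMIN capability
# so Nsight Systems can read GPU performance counters inside the container
docker run --gpus all --cap-add=SYS_ADMIN -it <your-image> /bin/bash

After restarting the container this way, the same nsys profile command should be able to collect the GPU Metrics rows.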


Hi, did you solve this problem? I met the same problem when using an AutoDL GPU machine.

Maybe there is some problem with AutoDL; after we switched to an A100, the problem did not appear again.

It works when I run “docker run xxx --cap-add=SYS_ADMIN”.