Tesla V100 doesn't support GPU-Metrics Collection

Hello,
I am trying to profile some jobs, but nsys cannot collect GPU metrics on my V100.
I couldn’t find any documentation listing which GPUs are supported.
Can you help with this?

V100

$ nsys profile --gpu-metrics-device=help
Possible --gpu-metrics-device values are:

Some GPUs are not supported:
	Tesla V100-SXM2-32GB PCI[0000:62:00.0]
$ nvidia-smi 
Wed Aug  7 20:02:19 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-SXM2-32GB           Off |   00000000:62:00.0 Off |                    0 |
| N/A   35C    P0             53W /  300W |       0MiB /  32768MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
$ nsys status -e
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 1
Linux Distribution = CentOS
Linux Kernel Version = 3.10.0-1127.19.1.el7.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail

I am using a controlled environment, so I don’t have sudo access.
For comparison, my Tesla T4 works fine with GPU metrics:

$ nvidia-smi
Wed Aug  7 20:58:45 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:81:00.0 Off |                    0 |
| N/A   50C    P0             28W /   70W |       0MiB /  15360MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
$ nsys profile --gpu-metrics-device=help
Possible --gpu-metrics-device values are:
	0: Tesla T4 PCI[0000:81:00.0]
	all: Select all supported GPUs
	none: Disable GPU Metrics [Default]

Nsight Systems (nsys) does not support GPU metrics sampling on GV100, the Volta chip in the Tesla V100. Turing is the first supported architecture.
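If it helps, here is a rough way to pre-check a node before launching a profiling job. This is only a sketch: the helper name `supports_gpu_metrics` is made up for illustration, and it assumes that compute capability is a fair proxy for architecture (Volta is 7.0, Turing is 7.5). It also assumes your driver's nvidia-smi accepts the `compute_cap` query field, which older drivers may not. Passing this check does not guarantee that GPU metrics will work, so `nsys profile --gpu-metrics-device=help` remains the authoritative answer.

```shell
# Hypothetical helper: succeeds if the compute capability is 7.5 (Turing) or newer.
# Assumption: compute capability stands in for the GPU architecture.
supports_gpu_metrics() {
    major=${1%%.*}   # text before the first dot, e.g. "7"
    minor=${1#*.}    # text after the first dot, e.g. "5"
    [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 5 ]; }
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
        cap=$(echo "$cap" | tr -d ' ')   # strip the space after the comma
        if supports_gpu_metrics "$cap"; then
            echo "$name (sm $cap): Turing or newer, GPU metrics may be available"
        else
            echo "$name (sm $cap): older than Turing, GPU metrics not supported"
        fi
    done
fi
```

On the two machines in this thread, the V100 (7.0) would land in the second branch and the T4 (7.5) in the first.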
