Nsight Systems gpu-metrics support for A10 GPU?

I encountered an error message that says “not supported” when checking the GPU metrics device on an NVIDIA A10 GPU:

root [ /my-files ]# /opt/nvidia/nsight-systems/2023.1.1/bin/nsys profile --gpu-metrics-device=help
Possible --gpu-metrics-device values are:
        0: NVIDIA A10-24Q PCI[7dca:00:00.0] (not supported)

Is there a plan to support GPU metrics on the A10 GPU?
Or is A10 supported, but my setup is incorrect for some reason?
My setup works with A100 and T4.
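For anyone checking several VMs, the help output above can be parsed in a script. This is only a minimal sketch: the line format is taken from the single line shown above, and I am assuming that a supported device is printed the same way without the "(not supported)" suffix. The function name `parse_gpu_metrics_devices` is my own.

```python
import re

# Matches device lines such as:
#   "        0: NVIDIA A10-24Q PCI[7dca:00:00.0] (not supported)"
# Assumption: supported devices use the same layout without the suffix.
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+(.*?)\s+PCI\[([^\]]+)\](.*)$")


def parse_gpu_metrics_devices(help_text):
    """Parse the text printed by `nsys profile --gpu-metrics-device=help`."""
    devices = []
    for line in help_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match is None:
            continue  # skip the heading and any unrelated lines
        index, name, pci, trailer = match.groups()
        devices.append({
            "index": int(index),
            "name": name,
            "pci": pci,
            "supported": "not supported" not in trailer.lower(),
        })
    return devices


# Output copied from the post above.
sample = """Possible --gpu-metrics-device values are:
        0: NVIDIA A10-24Q PCI[7dca:00:00.0] (not supported)
"""

for device in parse_gpu_metrics_devices(sample):
    status = "supported" if device["supported"] else "NOT supported"
    print(f'{device["index"]}: {device["name"]} [{device["pci"]}] {status}')
```

In practice the text would come from running the nsys command itself (for example via `subprocess.run(..., capture_output=True, text=True)`) instead of a pasted string.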

I am surprised that this isn’t working for you.

What driver are you using?

@Andrey_Trachenko, who set up the supported metrics?

I think it’s just an Azure internal setup issue. Even nvidia-smi and nvidia-smi pmon do not show any info from the GPU such as the load:

Mon Mar 20 18:49:02 2023
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10-24Q      On   | 00008F62:00:00.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    127MiB / 24512MiB |     N/A      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A      1367      G   /usr/lib/xorg/Xorg                103MiB |
|    0   N/A  N/A      1919      G   /usr/bin/gnome-shell               22MiB |

I can successfully run Nsight Systems and obtain profiles, so this issue appears to be specific to GPU metrics collection.
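To compare the A10 VM against the T4 and A100 VMs, the N/A fields in the nvidia-smi table above can also be checked from the CSV query mode (`nvidia-smi --query-gpu=index,name,utilization.gpu,temperature.gpu,power.draw --format=csv`). The sketch below only parses such text. The sample string is a hypothetical reconstruction, not captured output, and the exact placeholder strings (`[N/A]`, `[Not Supported]`) are an assumption that may vary by driver version.

```python
import csv
import io

# Placeholder strings treated as "no data". Assumption: the exact wording
# depends on the driver version, so several variants are accepted.
UNAVAILABLE = {"", "n/a", "[n/a]", "not supported", "[not supported]"}


def unavailable_fields(csv_text):
    """Map each GPU index to the list of queried fields that report no data."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    report = {}
    for row in reader:
        missing = [
            field
            for field, value in row.items()
            if (value or "").strip().lower() in UNAVAILABLE
        ]
        report[row["index"]] = missing
    return report


# Hypothetical CSV text mirroring the N/A fields in the table above.
sample_csv = (
    "index, name, utilization.gpu [%], temperature.gpu, power.draw [W]\n"
    "0, NVIDIA A10-24Q, [N/A], [N/A], [N/A]\n"
)

for gpu_index, fields in unavailable_fields(sample_csv).items():
    print(f"GPU {gpu_index}: no data for {fields}")
```

Running the same check on a VM where GPU metrics work would show whether the missing fields are specific to the A10 instances.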

I had other priorities and have not followed up with Azure support yet.

Thank you

As I understand it, Azure VMs may be running on top of Hyper-V, which is not currently a supported configuration for collecting GPU metrics. I believe the issue relates to the permissions model: device-wide collection is not possible on a multi-tenant host.

Maybe. However, I can successfully get GPU metrics and nvidia-smi pmon output from Azure VMs running T4s and A100s. Only the A10s, for some reason, do not support it. :(


I’m not sure what the software stack is on the Azure VMs. You could ask Azure support whether GPU HWPM profiling capabilities are enabled on the A10 systems that you use. Sorry for not being more helpful at the moment.