In the metric list, I see a section for NVLink. According to the descriptions, these are mostly device properties. Is there any metric that shows runtime properties? For example, the number of packets in the network, average/max/min delays, and so on?
You’re correct that the NVLink topology metrics in that document are mostly about device properties. There are also runtime metrics for NVLink, with the prefixes nvlrx__ and nvltx__. You can find the list of those in the Metrics Details window, as long as you have a report open from a GPU that supports those metrics. Alternatively, you can use the "--query-metrics" flag from the CLI.
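If the full "--query-metrics" listing is too long to scan by eye, a small script can pick out just the NVLink runtime metrics. The following is a rough sketch, not an official tool: it assumes the CLI is on your PATH as `ncu` and that each metric name appears as the first whitespace-separated token on its line. The helper names `filter_nvlink_metrics` and `query_nvlink_metrics` are made up for illustration, and the exact output layout may differ between Nsight Compute versions.

```python
import subprocess
from typing import List

# Prefixes of the NVLink runtime metrics mentioned above.
NVLINK_PREFIXES = ("nvlrx__", "nvltx__")


def filter_nvlink_metrics(query_output: str) -> List[str]:
    """Return metric names from the listing that start with an NVLink prefix.

    Assumption: the metric name is the first whitespace-separated token
    on each line of the "--query-metrics" output.
    """
    names = []
    for line in query_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(NVLINK_PREFIXES):
            names.append(fields[0])
    return names


def query_nvlink_metrics(ncu_path: str = "ncu") -> List[str]:
    """Run the CLI with --query-metrics and keep only NVLink runtime metrics.

    Assumption: the executable is reachable as `ncu`; pass a full path
    otherwise. Raises CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        [ncu_path, "--query-metrics"],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_nvlink_metrics(result.stdout)
```

Calling `query_nvlink_metrics()` on a machine with a supported GPU should return the nvlrx__ and nvltx__ names available there; on a GPU without those metrics the list would simply be empty.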