Got the message "Failed to get throughput counters" when trying to collect NVLink throughput

I want to get the throughput of NVLink.

So I tried this command:

nvlink -gt 0

But I got this error message:

“Failed to get throughput counters”

I also tried to use the NVML API for this, but the function nvmlDeviceGetNvLinkUtilizationCounter takes a parameter unsigned int counter. How do I determine the value to pass for this parameter?

Or is there any other way to get the throughput of NVLink?

Any help would be appreciated.

Here is my GPU info:

nvlink --capabilities -i 0:

Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: false
Link 0, Link is supported: false

nvlink --status -i 0:

Link 0: 25 GB/s

That isn’t a valid command. There are two issues:

  • nvlink is a subcommand of nvidia-smi, so the command should be:

    nvidia-smi nvlink ...
  • 0 is not a valid argument for -gt. According to the nvidia-smi command-line help, the valid arguments are d and r:

$ nvidia-smi nvlink --help

    nvlink -- Display NvLink information.

    Usage: nvidia-smi nvlink [options]

    Options include:
    [-h | --help]: Display help information
    [-i | --id]: Enumeration index, PCI bus ID or UUID.

    [-l | --link]: Limit a command to a specific link.  Without this flag, all link information is displayed.
    [-s | --status]: Display link state (active/inactive).
    [-c | --capabilities]: Display link capabilities.
    [-p | --pcibusid]: Display remote node PCI bus ID for a link.
    [-R | --remotelinkinfo]: Display remote device PCI bus ID and NvLink ID for a link.
    [-sc | --setcontrol]: Setting counter control is deprecated!
    [-gc | --getcontrol]: Getting counter control is deprecated!
    [-g | --getcounters]: Getting counters using option -g is deprecated.
Please use option -gt/--getthroughput instead.
    [-r | --resetcounters]: Resetting counters is deprecated!
    [-e | --errorcounters]: Display error counters for a link.
    [-ec | --crcerrorcounters]: Display per-lane CRC error counters for a link.
    [-re | --reseterrorcounters]: Reset all error counters to zero.
    [-gt | --getthroughput]: Display link throughput counters for specified counter type
       The arguments consist of character string representing the type of traffic counted:
          d: Display tx and rx data payload in KiB
          r: Display tx and rx data payload and protocol overhead in KiB if supported

When I run a command like that, I get this output:

$ nvidia-smi nvlink -gt d
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ada9b876-62e3-2837-f48f-e451080db283)
         Link 0: Raw Tx: 49596123 KiB
         Link 0: Raw Rx: 49596040 KiB
         Link 1: Raw Tx: 49596123 KiB
         Link 1: Raw Rx: 49596040 KiB
         Link 2: Raw Tx: 49596081 KiB
         Link 2: Raw Rx: 49595840 KiB
         Link 3: Raw Tx: 49596081 KiB
         Link 3: Raw Rx: 49595840 KiB
         Link 4: Raw Tx: 49596104 KiB
         Link 4: Raw Rx: 49596252 KiB
         Link 5: Raw Tx: 49596104 KiB
         Link 5: Raw Rx: 49596252 KiB
         Link 6: Raw Tx: 49596104 KiB
         Link 6: Raw Rx: 49596252 KiB
         Link 7: Raw Tx: 49596104 KiB
         Link 7: Raw Rx: 49596252 KiB
         Link 8: Raw Tx: 49596123 KiB
         Link 8: Raw Rx: 49596040 KiB
         Link 9: Raw Tx: 49596123 KiB
         Link 9: Raw Rx: 49596040 KiB
         Link 10: Raw Tx: 49596081 KiB
         Link 10: Raw Rx: 49595840 KiB
         Link 11: Raw Tx: 49596081 KiB
         Link 11: Raw Rx: 49595840 KiB
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-0dc422dd-e02d-00ac-7aa3-666d573295e9)
         Link 0: Raw Tx: 49596000 KiB
         Link 0: Raw Rx: 49595840 KiB
         Link 1: Raw Tx: 49596000 KiB
         Link 1: Raw Rx: 49595840 KiB
         Link 2: Raw Tx: 49596000 KiB
         Link 2: Raw Rx: 49595840 KiB
         Link 3: Raw Tx: 49596000 KiB
         Link 3: Raw Rx: 49595840 KiB
         Link 4: Raw Tx: 49596045 KiB
         Link 4: Raw Rx: 49596128 KiB
         Link 5: Raw Tx: 49596045 KiB
         Link 5: Raw Rx: 49596128 KiB
         Link 6: Raw Tx: 49596023 KiB
         Link 6: Raw Rx: 49596253 KiB
         Link 7: Raw Tx: 49596023 KiB
         Link 7: Raw Rx: 49596253 KiB
         Link 8: Raw Tx: 49596045 KiB
         Link 8: Raw Rx: 49596128 KiB
         Link 9: Raw Tx: 49596045 KiB
         Link 9: Raw Rx: 49596128 KiB
         Link 10: Raw Tx: 49596023 KiB
         Link 10: Raw Rx: 49596253 KiB
         Link 11: Raw Tx: 49596023 KiB
         Link 11: Raw Rx: 49596253 KiB
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-e2844c6d-e4fb-d4cb-550a-d043c4c8864d)
         Link 0: Raw Tx: 49596257 KiB
         Link 0: Raw Rx: 49596028 KiB
         Link 1: Raw Tx: 49596257 KiB
         Link 1: Raw Rx: 49596028 KiB
         Link 2: Raw Tx: 49596232 KiB
         Link 2: Raw Rx: 49595841 KiB
         Link 3: Raw Tx: 49596232 KiB
         Link 3: Raw Rx: 49595841 KiB
         Link 4: Raw Tx: 49596261 KiB
         Link 4: Raw Rx: 49596113 KiB
         Link 5: Raw Tx: 49596261 KiB
         Link 5: Raw Rx: 49596113 KiB
         Link 6: Raw Tx: 49596261 KiB
         Link 6: Raw Rx: 49596113 KiB
         Link 7: Raw Tx: 49596261 KiB
         Link 7: Raw Rx: 49596113 KiB
         Link 8: Raw Tx: 49596257 KiB
         Link 8: Raw Rx: 49596028 KiB
         Link 9: Raw Tx: 49596257 KiB
         Link 9: Raw Rx: 49596028 KiB
         Link 10: Raw Tx: 49596232 KiB
         Link 10: Raw Rx: 49595841 KiB
         Link 11: Raw Tx: 49596232 KiB
         Link 11: Raw Rx: 49595841 KiB
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-cad33e39-970d-d3fe-9ab8-ad2b4bcbeb57)
         Link 0: Raw Tx: 49595848 KiB
         Link 0: Raw Rx: 49596008 KiB
         Link 1: Raw Tx: 49595848 KiB
         Link 1: Raw Rx: 49596008 KiB
         Link 2: Raw Tx: 49595848 KiB
         Link 2: Raw Rx: 49596008 KiB
         Link 3: Raw Tx: 49595848 KiB
         Link 3: Raw Rx: 49596008 KiB
         Link 4: Raw Tx: 49595851 KiB
         Link 4: Raw Rx: 49596093 KiB
         Link 5: Raw Tx: 49595851 KiB
         Link 5: Raw Rx: 49596093 KiB
         Link 6: Raw Tx: 49595844 KiB
         Link 6: Raw Rx: 49596235 KiB
         Link 7: Raw Tx: 49595844 KiB
         Link 7: Raw Rx: 49596235 KiB
         Link 8: Raw Tx: 49595851 KiB
         Link 8: Raw Rx: 49596093 KiB
         Link 9: Raw Tx: 49595851 KiB
         Link 9: Raw Rx: 49596093 KiB
         Link 10: Raw Tx: 49595844 KiB
         Link 10: Raw Rx: 49596235 KiB
         Link 11: Raw Tx: 49595844 KiB
         Link 11: Raw Rx: 49596235 KiB
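As far as I understand, these values are cumulative counters in KiB rather than rates, so to get an actual throughput figure you would sample twice and divide the difference by the elapsed time. Here is a minimal Python sketch of that idea. The function names (parse_counters, rates_kib_per_s, measure) are my own, not part of any NVIDIA tool, and the regular expression assumes the output looks like the listing above ("Link N: <label> Tx|Rx: <value> KiB"), which may differ between driver versions. It does not handle counter wraparound or resets.

```python
import re
import subprocess
import time

# Assumed line shapes, based on the output shown above:
#   "GPU 0: NVIDIA A100-SXM4-40GB (UUID: ...)"
#   "         Link 0: Raw Tx: 49596123 KiB"
GPU_RE = re.compile(r"^GPU\s+(\d+):")
LINK_RE = re.compile(r"^\s*Link\s+(\d+):\s+\w+\s+(Tx|Rx):\s+(\d+)\s+KiB")


def parse_counters(text):
    """Return {(gpu, link, direction): KiB} parsed from `nvidia-smi nvlink -gt d` output."""
    counters = {}
    gpu = None
    for line in text.splitlines():
        m = GPU_RE.match(line)
        if m:
            gpu = int(m.group(1))
            continue
        m = LINK_RE.match(line)
        if m and gpu is not None:
            counters[(gpu, int(m.group(1)), m.group(2))] = int(m.group(3))
    return counters


def rates_kib_per_s(before, after, seconds):
    """Difference two counter snapshots and convert to KiB/s."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def measure(interval=1.0):
    """Take two snapshots `interval` seconds apart (requires nvidia-smi on PATH)."""
    cmd = ["nvidia-smi", "nvlink", "-gt", "d"]
    first = parse_counters(subprocess.run(cmd, capture_output=True, text=True).stdout)
    start = time.monotonic()
    time.sleep(interval)
    second = parse_counters(subprocess.run(cmd, capture_output=True, text=True).stdout)
    return rates_kib_per_s(first, second, time.monotonic() - start)
```

Calling measure(1.0) while a workload is running should return a dictionary keyed by (GPU index, link index, "Tx" or "Rx") with the approximate rate in KiB/s over that interval.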

Also note that the -gt option is only available with a “recent” driver. Older drivers use the older -g syntax. Check the command-line help of the nvidia-smi installed on your system to see which one applies, or update your GPU driver.
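If you want a script to check this rather than reading the help by eye, something like the following could work. It is only a sketch: the helper names are hypothetical, and it simply assumes that a driver supporting the option lists "--getthroughput" in the help text, as in the help output quoted above.

```python
import subprocess


def supports_getthroughput(help_text):
    """True if the given `nvidia-smi nvlink --help` text mentions --getthroughput."""
    return "--getthroughput" in help_text


def installed_help():
    """Return the help text of the installed nvidia-smi (requires nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "nvlink", "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

With these, supports_getthroughput(installed_help()) tells you whether to use -gt or fall back to the older -g syntax.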


It works. Thanks for your help.
