Got the message "Failed to get throughput counters" when trying to collect NVLink throughput

I want to get the throughput of NVLink.

So I tried this command:

nvlink -gt 0

But I get this error message:

“Failed to get throughput counters”

I also tried to use the NVML API to do this, but the function nvmlDeviceGetNvLinkUtilizationCounter takes a parameter unsigned int counter. How do I determine the value of this parameter?

Or is there any other way to get the NVLink throughput?

Any help would be appreciated.

Here is my GPU info:

nvlink --capabilities -i 0:

GPU 0: NVIDIA A100-SXM4-80GB
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: false
Link 0, Link is supported: false

nvlink --status -i 0:

GPU 0: NVIDIA A100-SXM4-80GB
Link 0: 25 GB/s

That isn’t a valid command. There are two issues:

  • nvlink is a subcommand of nvidia-smi, so the command should be:

    nvidia-smi nvlink ...
    
  • the argument 0 is not valid for -gt. The nvidia-smi command line help shows that the valid arguments are d or r:

$ nvidia-smi nvlink --help

    nvlink -- Display NvLink information.

    Usage: nvidia-smi nvlink [options]

    Options include:
    [-h | --help]: Display help information
    [-i | --id]: Enumeration index, PCI bus ID or UUID.

    [-l | --link]: Limit a command to a specific link.  Without this flag, all link information is displayed.
    [-s | --status]: Display link state (active/inactive).
    [-c | --capabilities]: Display link capabilities.
    [-p | --pcibusid]: Display remote node PCI bus ID for a link.
    [-R | --remotelinkinfo]: Display remote device PCI bus ID and NvLink ID for a link.
    [-sc | --setcontrol]: Setting counter control is deprecated!
    [-gc | --getcontrol]: Getting counter control is deprecated!
    [-g | --getcounters]: Getting counters using option -g is deprecated.
Please use option -gt/--getthroughput instead.
    [-r | --resetcounters]: Resetting counters is deprecated!
    [-e | --errorcounters]: Display error counters for a link.
    [-ec | --crcerrorcounters]: Display per-lane CRC error counters for a link.
    [-re | --reseterrorcounters]: Reset all error counters to zero.
    [-gt | --getthroughput]: Display link throughput counters for specified counter type
       The arguments consist of character string representing the type of traffic counted:
          d: Display tx and rx data payload in KiB
          r: Display tx and rx data payload and protocol overhead in KiB if supported

When I run a command like that, I get this output:

$ nvidia-smi nvlink -gt d
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ada9b876-62e3-2837-f48f-e451080db283)
         Link 0: Raw Tx: 49596123 KiB
         Link 0: Raw Rx: 49596040 KiB
         Link 1: Raw Tx: 49596123 KiB
         Link 1: Raw Rx: 49596040 KiB
         Link 2: Raw Tx: 49596081 KiB
         Link 2: Raw Rx: 49595840 KiB
         Link 3: Raw Tx: 49596081 KiB
         Link 3: Raw Rx: 49595840 KiB
         Link 4: Raw Tx: 49596104 KiB
         Link 4: Raw Rx: 49596252 KiB
         Link 5: Raw Tx: 49596104 KiB
         Link 5: Raw Rx: 49596252 KiB
         Link 6: Raw Tx: 49596104 KiB
         Link 6: Raw Rx: 49596252 KiB
         Link 7: Raw Tx: 49596104 KiB
         Link 7: Raw Rx: 49596252 KiB
         Link 8: Raw Tx: 49596123 KiB
         Link 8: Raw Rx: 49596040 KiB
         Link 9: Raw Tx: 49596123 KiB
         Link 9: Raw Rx: 49596040 KiB
         Link 10: Raw Tx: 49596081 KiB
         Link 10: Raw Rx: 49595840 KiB
         Link 11: Raw Tx: 49596081 KiB
         Link 11: Raw Rx: 49595840 KiB
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-0dc422dd-e02d-00ac-7aa3-666d573295e9)
         Link 0: Raw Tx: 49596000 KiB
         Link 0: Raw Rx: 49595840 KiB
         Link 1: Raw Tx: 49596000 KiB
         Link 1: Raw Rx: 49595840 KiB
         Link 2: Raw Tx: 49596000 KiB
         Link 2: Raw Rx: 49595840 KiB
         Link 3: Raw Tx: 49596000 KiB
         Link 3: Raw Rx: 49595840 KiB
         Link 4: Raw Tx: 49596045 KiB
         Link 4: Raw Rx: 49596128 KiB
         Link 5: Raw Tx: 49596045 KiB
         Link 5: Raw Rx: 49596128 KiB
         Link 6: Raw Tx: 49596023 KiB
         Link 6: Raw Rx: 49596253 KiB
         Link 7: Raw Tx: 49596023 KiB
         Link 7: Raw Rx: 49596253 KiB
         Link 8: Raw Tx: 49596045 KiB
         Link 8: Raw Rx: 49596128 KiB
         Link 9: Raw Tx: 49596045 KiB
         Link 9: Raw Rx: 49596128 KiB
         Link 10: Raw Tx: 49596023 KiB
         Link 10: Raw Rx: 49596253 KiB
         Link 11: Raw Tx: 49596023 KiB
         Link 11: Raw Rx: 49596253 KiB
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-e2844c6d-e4fb-d4cb-550a-d043c4c8864d)
         Link 0: Raw Tx: 49596257 KiB
         Link 0: Raw Rx: 49596028 KiB
         Link 1: Raw Tx: 49596257 KiB
         Link 1: Raw Rx: 49596028 KiB
         Link 2: Raw Tx: 49596232 KiB
         Link 2: Raw Rx: 49595841 KiB
         Link 3: Raw Tx: 49596232 KiB
         Link 3: Raw Rx: 49595841 KiB
         Link 4: Raw Tx: 49596261 KiB
         Link 4: Raw Rx: 49596113 KiB
         Link 5: Raw Tx: 49596261 KiB
         Link 5: Raw Rx: 49596113 KiB
         Link 6: Raw Tx: 49596261 KiB
         Link 6: Raw Rx: 49596113 KiB
         Link 7: Raw Tx: 49596261 KiB
         Link 7: Raw Rx: 49596113 KiB
         Link 8: Raw Tx: 49596257 KiB
         Link 8: Raw Rx: 49596028 KiB
         Link 9: Raw Tx: 49596257 KiB
         Link 9: Raw Rx: 49596028 KiB
         Link 10: Raw Tx: 49596232 KiB
         Link 10: Raw Rx: 49595841 KiB
         Link 11: Raw Tx: 49596232 KiB
         Link 11: Raw Rx: 49595841 KiB
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-cad33e39-970d-d3fe-9ab8-ad2b4bcbeb57)
         Link 0: Raw Tx: 49595848 KiB
         Link 0: Raw Rx: 49596008 KiB
         Link 1: Raw Tx: 49595848 KiB
         Link 1: Raw Rx: 49596008 KiB
         Link 2: Raw Tx: 49595848 KiB
         Link 2: Raw Rx: 49596008 KiB
         Link 3: Raw Tx: 49595848 KiB
         Link 3: Raw Rx: 49596008 KiB
         Link 4: Raw Tx: 49595851 KiB
         Link 4: Raw Rx: 49596093 KiB
         Link 5: Raw Tx: 49595851 KiB
         Link 5: Raw Rx: 49596093 KiB
         Link 6: Raw Tx: 49595844 KiB
         Link 6: Raw Rx: 49596235 KiB
         Link 7: Raw Tx: 49595844 KiB
         Link 7: Raw Rx: 49596235 KiB
         Link 8: Raw Tx: 49595851 KiB
         Link 8: Raw Rx: 49596093 KiB
         Link 9: Raw Tx: 49595851 KiB
         Link 9: Raw Rx: 49596093 KiB
         Link 10: Raw Tx: 49595844 KiB
         Link 10: Raw Rx: 49596235 KiB
         Link 11: Raw Tx: 49595844 KiB
         Link 11: Raw Rx: 49596235 KiB
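
The values above are reported in KiB rather than as a rate. Assuming they are cumulative totals, one way to estimate throughput is to sample the command twice and divide the difference by the elapsed time. The following is a minimal Python sketch of that idea, not an official tool: the function names (parse_throughput, rates_kib_per_s, sample) are hypothetical, and the parser assumes the output keeps the "GPU N:" / "Link N: <label> Tx|Rx: <value> KiB" layout shown above.

```python
import re
import subprocess

# Assumed line layout, taken from the output shown in this thread.
GPU_RE = re.compile(r"^GPU (\d+):")
LINK_RE = re.compile(r"^\s*Link (\d+): \w+ (Tx|Rx): (\d+) KiB")


def parse_throughput(text):
    """Map (gpu, link, direction) -> counter value in KiB."""
    counters = {}
    gpu = None
    for line in text.splitlines():
        m = GPU_RE.match(line)
        if m:
            gpu = int(m.group(1))
            continue
        m = LINK_RE.match(line)
        if m and gpu is not None:
            counters[(gpu, int(m.group(1)), m.group(2))] = int(m.group(3))
    return counters


def rates_kib_per_s(before, after, seconds):
    """Per-link rate in KiB/s between two parsed samples."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def sample():
    """Run the command from this thread and parse its output (needs a GPU)."""
    out = subprocess.run(
        ["nvidia-smi", "nvlink", "-gt", "d"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_throughput(out)
```

A typical use would be to call sample(), wait a known interval with time.sleep(), call sample() again, and pass both results and the interval to rates_kib_per_s. The rate is only an average over the interval, so a shorter interval gives a finer-grained picture.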

Also note that the -gt option is only available with a “recent” driver. Older drivers use the older -g syntax. Check the command line help (nvidia-smi nvlink --help) of the nvidia-smi installed on your system to learn which one applies, or update your GPU driver.
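
If you script this, the check can be done against the help text itself. Here is a small Python sketch under that assumption: it only looks for the option names that appear in the help output quoted above, and throughput_flag is a hypothetical helper name, not part of any NVIDIA tool.

```python
def throughput_flag(help_text):
    """Pick the counter option based on `nvidia-smi nvlink --help` output.

    Returns "-gt" if --getthroughput is listed, "-g" if only the older
    --getcounters is listed, and None if neither is found.
    """
    if "--getthroughput" in help_text:
        return "-gt"
    if "--getcounters" in help_text:
        return "-g"
    return None
```

The --getthroughput check comes first because recent drivers still list --getcounters, marked as deprecated.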


It works. Thanks for your help.
