How to measure NVLink performance while running HPL

Hello,

I’m currently evaluating a DGX A100 using the NVIDIA HPC-Benchmarks container.
While running HPL across multiple GPUs, I would like to measure the performance (bandwidth) of NVLink.

I tried “nvidia-smi nvlink -gt d -i 0” (with -i selecting the device index), but the throughput counters show no change between samples taken before and after the HPL run.
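
Concretely, I took a snapshot of the counters before and after the run, roughly like this (the file names are just for illustration):

nvidia-smi nvlink -gt d -i 0 > nvlink_before.txt
# ... launch HPL in the container and wait for it to finish ...
nvidia-smi nvlink -gt d -i 0 > nvlink_after.txt
diff nvlink_before.txt nvlink_after.txt   # reports no differences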

before HPL run ------------------------
2021. 07. 19. (mon) 17:45:00 KST
GPU 0: A100-SXM-80GB (UUID: )
Link 0: Data Tx: 991870746 KiB
Link 0: Data Rx: 994428615 KiB
Link 1: Data Tx: 991828209 KiB
Link 1: Data Rx: 994386931 KiB
… (Links 2–10 omitted)
Link 11: Data Tx: 990447250 KiB
Link 11: Data Rx: 993019367 KiB

after HPL run ------------------------
2021. 07. 19. (mon) 17:46:15 KST
GPU 0: A100-SXM-80GB (UUID: )
Link 0: Data Tx: 991870746 KiB
Link 0: Data Rx: 994428615 KiB
Link 1: Data Tx: 991828209 KiB
Link 1: Data Rx: 994386931 KiB
… (Links 2–10 omitted)
Link 11: Data Tx: 990447250 KiB
Link 11: Data Rx: 993019367 KiB

---- HPL run result in Docker ---------------------------------
2021-07-19 08:46:10.996
T/V N NB P Q Time Gflops
WRxxxxxx 5xxxxx 2xx x x2 9.96 1.306e+04
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.xxxxxxxx … PASSED

===============
To keep the evaluation short, the HPL run takes approx. 10 s using all 8 A100 GPUs in the DGX A100.

The “nvidia-smi nvlink -gt d -i 0” command was executed outside of the Docker container.
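
In case the Docker namespace matters, I believe the equivalent check from inside the container would be something like this (the container name is a placeholder):

docker exec <hpc-benchmarks-container> nvidia-smi nvlink -gt d -i 0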

For comparison, I also tried the “p2pBandwidthLatencyTest” from the CUDA samples.
After running “p2pBandwidthLatencyTest”, the NVLink throughput counters did change.
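
I built and ran the test roughly as follows (the samples path differs between CUDA versions, so this is just how it looked on my system):

cd /usr/local/cuda/samples/1_Utilities/p2pBandwidthLatencyTest
make
./p2pBandwidthLatencyTest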

Before “p2pBandwidthLatencyTest” --------------------------------
GPU 1: A100-SXM-80GB (UUID: )
Link 0: Data Tx: 992685174 KiB
Link 0: Data Rx: 992327236 KiB
Link 1: Data Tx: 992642624 KiB
Link 1: Data Rx: 992284697 KiB
… (Links 2–10 omitted)
Link 11: Data Tx: 991270582 KiB
Link 11: Data Rx: 990903740 KiB

After “p2pBandwidthLatencyTest” --------------------------------
GPU 1: A100-SXM-80GB (UUID: )
Link 0: Data Tx: 993596626 KiB
Link 0: Data Rx: 993238688 KiB
Link 1: Data Tx: 993554076 KiB
Link 1: Data Rx: 993196150 KiB
… (Links 2–10 omitted)
Link 11: Data Tx: 992182035 KiB
Link 11: Data Rx: 991815192 KiB

The results of “p2pBandwidthLatencyTest” show that NVLink was used.
But I don’t understand why the NVLink throughput counters did not change during the HPL run.
(I would expect HPL to use NVLink for better performance.)

Here are my questions:
How can I check the NVLink throughput when using the NVIDIA HPL Docker container?
Is my approach correct for measuring NVLink performance?
Does the HPL (in the container) utilize NVLink or not?
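
If the nvidia-smi counters are not the right tool for this, I am also open to alternatives. For example, I understand DCGM can sample NVLink traffic per interval with something like the following (the field IDs 1011/1012 for NVLink TX/RX bytes are my reading of the DCGM documentation, so please correct me if they are wrong):

dcgmi dmon -e 1011,1012 -d 1000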