GPU peer-to-peer communication bandwidth test result is confusing (the furthest card is the best)

shunkang1997 · March 25, 2020, 6:04am

I used the official NVIDIA sample code to test the peer-to-peer bandwidth on my server, and the results are as follows. My server has no NVLink.

Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7      8 
     0 258.57   9.79   9.78   9.80  10.88  10.88  10.83  10.92  10.93 
     1   9.75 258.75   9.78   9.79  10.96  10.88  10.86  10.91  10.89 
     2   9.75   9.80 259.23   9.81  10.88  10.94  10.90  10.90  10.89 
     3   9.77   9.79   9.80 258.09  10.88  10.90  10.86  10.91  10.98 
     4  10.83  10.87  10.90  10.90 260.24   9.72   9.77   9.75   9.74 
     5  10.92  10.90  10.97  10.90   9.77 248.33   9.76   9.75   9.76 
     6  10.87  10.90  10.92  10.92   9.73   9.76 260.74   9.77   9.76 
     7  10.91  10.89  10.92  10.97   9.75   9.74   9.74 258.64   9.75 
     8  10.86  10.99  10.93  10.88   9.75   9.78   9.74   9.73 251.02
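To make the pattern in this matrix explicit, here is a small Python sketch. It is not part of the NVIDIA sample. It parses the matrix text above and reports, for each source GPU, the peer with the highest measured unidirectional bandwidth. The numbers are copied verbatim from the output above. `parse_matrix` and `best_peer` are illustrative helper names.

```python
# Unidirectional P2P=Enabled bandwidth matrix (GB/s), copied from the output above.
MATRIX = r"""
   D\D     0      1      2      3      4      5      6      7      8
     0 258.57   9.79   9.78   9.80  10.88  10.88  10.83  10.92  10.93
     1   9.75 258.75   9.78   9.79  10.96  10.88  10.86  10.91  10.89
     2   9.75   9.80 259.23   9.81  10.88  10.94  10.90  10.90  10.89
     3   9.77   9.79   9.80 258.09  10.88  10.90  10.86  10.91  10.98
     4  10.83  10.87  10.90  10.90 260.24   9.72   9.77   9.75   9.74
     5  10.92  10.90  10.97  10.90   9.77 248.33   9.76   9.75   9.76
     6  10.87  10.90  10.92  10.92   9.73   9.76 260.74   9.77   9.76
     7  10.91  10.89  10.92  10.97   9.75   9.74   9.74 258.64   9.75
     8  10.86  10.99  10.93  10.88   9.75   9.78   9.74   9.73 251.02
"""


def parse_matrix(text):
    """Return {(src, dst): GB/s} from the sample's matrix printout."""
    lines = text.strip().splitlines()
    cols = [int(c) for c in lines[0].split()[1:]]  # skip the "D\D" label
    bw = {}
    for line in lines[1:]:
        src, *values = line.split()
        for dst, value in zip(cols, values):
            bw[(int(src), dst)] = float(value)
    return bw


def best_peer(bw, src):
    """Peer (excluding src itself) with the highest bandwidth from src."""
    peers = {dst: v for (s, dst), v in bw.items() if s == src and dst != src}
    return max(peers, key=peers.get)  # first peer wins on ties


if __name__ == "__main__":
    bw = parse_matrix(MATRIX)
    for src in range(9):
        dst = best_peer(bw, src)
        print(f"GPU{src}: fastest peer GPU{dst} ({bw[(src, dst)]:.2f} GB/s)")
```

On the posted numbers, every GPU's fastest peer is in the other group (GPU0-GPU3 vs. GPU4-GPU8). For example, GPU0's fastest peer is GPU8 at 10.93 GB/s. This is the pattern the title describes.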

I also printed the GPU topology:

	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	GPU8	mlx4_0	CPU Affinity
GPU0	 X 	PIX	PIX	PIX	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU1	PIX	 X 	PIX	PIX	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU2	PIX	PIX	 X 	PIX	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU3	PIX	PIX	PIX	 X 	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU4	NODE	NODE	NODE	NODE	 X 	PIX	PIX	PIX	PIX	NODE	0-21
GPU5	NODE	NODE	NODE	NODE	PIX	 X 	PIX	PIX	PIX	NODE	0-21
GPU6	NODE	NODE	NODE	NODE	PIX	PIX	 X 	PIX	PIX	NODE	0-21
GPU7	NODE	NODE	NODE	NODE	PIX	PIX	PIX	 X 	PIX	NODE	0-21
GPU8	NODE	NODE	NODE	NODE	PIX	PIX	PIX	PIX	 X 	NODE	0-21
mlx4_0	NODE	NODE	NODE	NODE	NODE	NODE	NODE	NODE	NODE	 X

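According to the topology, GPU0-GPU3 are connected to each other by PIX links and are much closer to each other than to the remaining cards (GPU4-GPU8 are likewise PIX-connected among themselves). However, the bandwidth matrix does not show what I expected. For example, for GPU0 I expected the bandwidth between GPU0 and GPU1 (PIX, 9.79 GB/s) to be higher than between GPU0 and GPU4 (NODE, 10.88 GB/s), but it is the other way round. Why is that?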
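One quick way to check this expectation is to join the two tables and average the measured bandwidth for each topology link type. The Python sketch below does that with the posted numbers, copied verbatim. For brevity it keeps only the GPU-to-GPU part of the topology matrix, leaving out the mlx4_0 and CPU Affinity columns. The helper names are illustrative. The sketch only summarizes the data and does not explain the cause.

```python
from collections import defaultdict
from statistics import mean

# Unidirectional P2P=Enabled bandwidth matrix (GB/s), copied from the post.
BANDWIDTH = r"""
   D\D     0      1      2      3      4      5      6      7      8
     0 258.57   9.79   9.78   9.80  10.88  10.88  10.83  10.92  10.93
     1   9.75 258.75   9.78   9.79  10.96  10.88  10.86  10.91  10.89
     2   9.75   9.80 259.23   9.81  10.88  10.94  10.90  10.90  10.89
     3   9.77   9.79   9.80 258.09  10.88  10.90  10.86  10.91  10.98
     4  10.83  10.87  10.90  10.90 260.24   9.72   9.77   9.75   9.74
     5  10.92  10.90  10.97  10.90   9.77 248.33   9.76   9.75   9.76
     6  10.87  10.90  10.92  10.92   9.73   9.76 260.74   9.77   9.76
     7  10.91  10.89  10.92  10.97   9.75   9.74   9.74 258.64   9.75
     8  10.86  10.99  10.93  10.88   9.75   9.78   9.74   9.73 251.02
"""

# GPU-to-GPU part of the topology matrix from the post.
TOPOLOGY = """
      GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 GPU8
GPU0  X    PIX  PIX  PIX  NODE NODE NODE NODE NODE
GPU1  PIX  X    PIX  PIX  NODE NODE NODE NODE NODE
GPU2  PIX  PIX  X    PIX  NODE NODE NODE NODE NODE
GPU3  PIX  PIX  PIX  X    NODE NODE NODE NODE NODE
GPU4  NODE NODE NODE NODE X    PIX  PIX  PIX  PIX
GPU5  NODE NODE NODE NODE PIX  X    PIX  PIX  PIX
GPU6  NODE NODE NODE NODE PIX  PIX  X    PIX  PIX
GPU7  NODE NODE NODE NODE PIX  PIX  PIX  X    PIX
GPU8  NODE NODE NODE NODE PIX  PIX  PIX  PIX  X
"""


def parse_bandwidth(text):
    """Return {(src, dst): GB/s} from the bandwidth matrix printout."""
    lines = text.strip().splitlines()
    cols = [int(c) for c in lines[0].split()[1:]]  # skip the "D\D" label
    bw = {}
    for line in lines[1:]:
        src, *values = line.split()
        for dst, value in zip(cols, values):
            bw[(int(src), dst)] = float(value)
    return bw


def parse_topology(text):
    """Return {(src, dst): link type}, skipping the 'X' diagonal."""
    lines = text.strip().splitlines()
    link = {}
    for line in lines[1:]:
        name, *kinds = line.split()
        src = int(name[len("GPU"):])  # "GPU3" -> 3
        for dst, kind in enumerate(kinds):
            if kind != "X":
                link[(src, dst)] = kind
    return link


def mean_bandwidth_by_link(bw, link):
    """Average the off-diagonal bandwidth for each link type (PIX, NODE, ...)."""
    groups = defaultdict(list)
    for pair, kind in link.items():
        groups[kind].append(bw[pair])
    return {kind: mean(values) for kind, values in groups.items()}


if __name__ == "__main__":
    summary = mean_bandwidth_by_link(parse_bandwidth(BANDWIDTH), parse_topology(TOPOLOGY))
    for kind, gbps in sorted(summary.items()):
        print(f"{kind}: mean {gbps:.2f} GB/s")
```

With the posted data, the PIX pairs average about 9.8 GB/s and the NODE pairs about 10.9 GB/s. The nominally "closer" PIX pairs are therefore consistently about 1 GB/s slower, not just in the GPU0 row.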
