I used NVIDIA's official sample code to test the peer-to-peer (P2P) bandwidth on my server; the results are below. The server has no NVLink.
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D       0       1       2       3       4       5       6       7       8
  0  258.57    9.79    9.78    9.80   10.88   10.88   10.83   10.92   10.93
  1    9.75  258.75    9.78    9.79   10.96   10.88   10.86   10.91   10.89
  2    9.75    9.80  259.23    9.81   10.88   10.94   10.90   10.90   10.89
  3    9.77    9.79    9.80  258.09   10.88   10.90   10.86   10.91   10.98
  4   10.83   10.87   10.90   10.90  260.24    9.72    9.77    9.75    9.74
  5   10.92   10.90   10.97   10.90    9.77  248.33    9.76    9.75    9.76
  6   10.87   10.90   10.92   10.92    9.73    9.76  260.74    9.77    9.76
  7   10.91   10.89   10.92   10.97    9.75    9.74    9.74  258.64    9.75
  8   10.86   10.99   10.93   10.88    9.75    9.78    9.74    9.73  251.02
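To read the matrix more easily, it helps to separate the diagonal (a GPU copying to itself) from the off-diagonal GPU-to-GPU entries. This is a minimal Python sketch that does only that; the matrix text is copied verbatim from the output above, and `parse_matrix` and `summarize` are hypothetical helper names.

```python
# Minimal sketch: split the unidirectional P2P bandwidth matrix (GB/s) into
# diagonal (same-GPU) and off-diagonal (GPU-to-GPU) entries.
# The text below is copied verbatim from the test output.
BANDWIDTH_TEXT = r"""
D\D 0 1 2 3 4 5 6 7 8
0 258.57 9.79 9.78 9.80 10.88 10.88 10.83 10.92 10.93
1 9.75 258.75 9.78 9.79 10.96 10.88 10.86 10.91 10.89
2 9.75 9.80 259.23 9.81 10.88 10.94 10.90 10.90 10.89
3 9.77 9.79 9.80 258.09 10.88 10.90 10.86 10.91 10.98
4 10.83 10.87 10.90 10.90 260.24 9.72 9.77 9.75 9.74
5 10.92 10.90 10.97 10.90 9.77 248.33 9.76 9.75 9.76
6 10.87 10.90 10.92 10.92 9.73 9.76 260.74 9.77 9.76
7 10.91 10.89 10.92 10.97 9.75 9.74 9.74 258.64 9.75
8 10.86 10.99 10.93 10.88 9.75 9.78 9.74 9.73 251.02
"""


def parse_matrix(text):
    """Return the matrix as float rows, dropping the header row and row labels."""
    lines = text.strip().splitlines()
    return [[float(v) for v in line.split()[1:]] for line in lines[1:]]


def summarize(matrix):
    """Return ((min, max) of the diagonal, (min, max) of the off-diagonal)."""
    n = len(matrix)
    diag = [matrix[i][i] for i in range(n)]
    off = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return (min(diag), max(diag)), (min(off), max(off))


if __name__ == "__main__":
    (dmin, dmax), (omin, omax) = summarize(parse_matrix(BANDWIDTH_TEXT))
    print(f"same-GPU (diagonal):       {dmin:.2f} to {dmax:.2f} GB/s")
    print(f"GPU-to-GPU (off-diagonal): {omin:.2f} to {omax:.2f} GB/s")
```

On this data every off-diagonal entry falls between 9.72 and 10.99 GB/s, more than 20 times below the diagonal, so the differences between GPU pairs are only about 1 GB/s.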
I also printed the GPU topology:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    GPU8    mlx4_0  CPU Affinity
GPU0    X       PIX     PIX     PIX     NODE    NODE    NODE    NODE    NODE    NODE    0-21
GPU1    PIX     X       PIX     PIX     NODE    NODE    NODE    NODE    NODE    NODE    0-21
GPU2    PIX     PIX     X       PIX     NODE    NODE    NODE    NODE    NODE    NODE    0-21
GPU3    PIX     PIX     PIX     X       NODE    NODE    NODE    NODE    NODE    NODE    0-21
GPU4    NODE    NODE    NODE    NODE    X       PIX     PIX     PIX     PIX     NODE    0-21
GPU5    NODE    NODE    NODE    NODE    PIX     X       PIX     PIX     PIX     NODE    0-21
GPU6    NODE    NODE    NODE    NODE    PIX     PIX     X       PIX     PIX     NODE    0-21
GPU7    NODE    NODE    NODE    NODE    PIX     PIX     PIX     X       PIX     NODE    0-21
GPU8    NODE    NODE    NODE    NODE    PIX     PIX     PIX     PIX     X       NODE    0-21
mlx4_0  NODE    NODE    NODE    NODE    NODE    NODE    NODE    NODE    NODE    X
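Assuming this is `nvidia-smi topo -m` output, its legend defines PIX as a path that crosses at most a single PCIe bridge, and NODE as a path that also crosses the interconnect between PCIe host bridges within one NUMA node. The GPU groups can be recovered mechanically by treating each PIX entry as an edge and merging GPUs with a small union-find. This is a minimal sketch; `pix_groups` is a hypothetical helper, and the text is copied from the output above.

```python
# Minimal sketch, assuming the text is `nvidia-smi topo -m` output:
# GPUs joined by PIX links are merged into groups with a union-find.
TOPOLOGY_TEXT = """
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 GPU8 mlx4_0 CPU Affinity
GPU0 X PIX PIX PIX NODE NODE NODE NODE NODE NODE 0-21
GPU1 PIX X PIX PIX NODE NODE NODE NODE NODE NODE 0-21
GPU2 PIX PIX X PIX NODE NODE NODE NODE NODE NODE 0-21
GPU3 PIX PIX PIX X NODE NODE NODE NODE NODE NODE 0-21
GPU4 NODE NODE NODE NODE X PIX PIX PIX PIX NODE 0-21
GPU5 NODE NODE NODE NODE PIX X PIX PIX PIX NODE 0-21
GPU6 NODE NODE NODE NODE PIX PIX X PIX PIX NODE 0-21
GPU7 NODE NODE NODE NODE PIX PIX PIX X PIX NODE 0-21
GPU8 NODE NODE NODE NODE PIX PIX PIX PIX X NODE 0-21
mlx4_0 NODE NODE NODE NODE NODE NODE NODE NODE NODE X
"""


def pix_groups(text):
    """Return sorted lists of GPU indices connected through PIX links."""
    lines = text.strip().splitlines()
    gpus = [name for name in lines[0].split() if name.startswith("GPU")]
    parent = list(range(len(gpus)))

    def find(x):
        # Follow parent pointers to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in lines[1:]:
        tokens = line.split()
        if tokens[0] not in gpus:
            continue  # skip non-GPU rows such as the mlx4_0 NIC
        i = gpus.index(tokens[0])
        # Only the first len(gpus) columns describe GPU-to-GPU links.
        for j, link in enumerate(tokens[1:1 + len(gpus)]):
            if link == "PIX":
                parent[find(i)] = find(j)

    groups = {}
    for g in range(len(gpus)):
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


if __name__ == "__main__":
    print(pix_groups(TOPOLOGY_TEXT))
```

For this server the sketch yields two groups, GPU0-GPU3 and GPU4-GPU8; every pair that spans the two groups is a NODE link.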
Evidently, GPU0-GPU3 are connected to each other by PIX links, so they are much closer to one another than to GPU4-GPU8 (NODE links). However, the bandwidth matrix does not match what I expect. For example, I expected the bandwidth between GPU 0 and GPU 1 to be higher than between GPU 0 and GPU 4, but it is actually lower: 9.79 GB/s versus 10.88 GB/s. I am confused about this.
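The GPU 0 comparison can be extended to every GPU pair. This minimal Python sketch joins each off-diagonal bandwidth entry with the link type from the topology matrix and averages per link type. Both matrices are copied from the output above (the NIC row is omitted), and `bandwidth_by_link` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Unidirectional P2P bandwidth (GB/s), copied from the test output.
BANDWIDTH_TEXT = r"""
D\D 0 1 2 3 4 5 6 7 8
0 258.57 9.79 9.78 9.80 10.88 10.88 10.83 10.92 10.93
1 9.75 258.75 9.78 9.79 10.96 10.88 10.86 10.91 10.89
2 9.75 9.80 259.23 9.81 10.88 10.94 10.90 10.90 10.89
3 9.77 9.79 9.80 258.09 10.88 10.90 10.86 10.91 10.98
4 10.83 10.87 10.90 10.90 260.24 9.72 9.77 9.75 9.74
5 10.92 10.90 10.97 10.90 9.77 248.33 9.76 9.75 9.76
6 10.87 10.90 10.92 10.92 9.73 9.76 260.74 9.77 9.76
7 10.91 10.89 10.92 10.97 9.75 9.74 9.74 258.64 9.75
8 10.86 10.99 10.93 10.88 9.75 9.78 9.74 9.73 251.02
"""

# GPU rows of the topology matrix (header and mlx4_0 row omitted).
TOPOLOGY_TEXT = """
GPU0 X PIX PIX PIX NODE NODE NODE NODE NODE NODE 0-21
GPU1 PIX X PIX PIX NODE NODE NODE NODE NODE NODE 0-21
GPU2 PIX PIX X PIX NODE NODE NODE NODE NODE NODE 0-21
GPU3 PIX PIX PIX X NODE NODE NODE NODE NODE NODE 0-21
GPU4 NODE NODE NODE NODE X PIX PIX PIX PIX NODE 0-21
GPU5 NODE NODE NODE NODE PIX X PIX PIX PIX NODE 0-21
GPU6 NODE NODE NODE NODE PIX PIX X PIX PIX NODE 0-21
GPU7 NODE NODE NODE NODE PIX PIX PIX X PIX NODE 0-21
GPU8 NODE NODE NODE NODE PIX PIX PIX PIX X NODE 0-21
"""


def bandwidth_by_link(bw_text, topo_text):
    """Average off-diagonal P2P bandwidth, grouped by topology link type."""
    bw_rows = [line.split()[1:] for line in bw_text.strip().splitlines()[1:]]
    topo_rows = [line.split()[1:] for line in topo_text.strip().splitlines()]
    samples = defaultdict(list)
    for i, row in enumerate(bw_rows):
        for j, value in enumerate(row):
            if i != j:  # skip same-GPU copies ("X" in the topology)
                samples[topo_rows[i][j]].append(float(value))
    return {link: round(mean(vals), 2) for link, vals in samples.items()}


if __name__ == "__main__":
    print(bandwidth_by_link(BANDWIDTH_TEXT, TOPOLOGY_TEXT))
```

Across all pairs the pattern matches the GPU 0 example: PIX pairs average about 9.76 GB/s, while NODE pairs average about 10.90 GB/s.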