GPU peer-to-peer bandwidth test results are confusing (the farthest card is the fastest)

I used NVIDIA's official CUDA sample code to test the peer-to-peer (P2P) bandwidth on my server. The results are below. My server has no NVLink.

Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7      8 
     0 258.57   9.79   9.78   9.80  10.88  10.88  10.83  10.92  10.93 
     1   9.75 258.75   9.78   9.79  10.96  10.88  10.86  10.91  10.89 
     2   9.75   9.80 259.23   9.81  10.88  10.94  10.90  10.90  10.89 
     3   9.77   9.79   9.80 258.09  10.88  10.90  10.86  10.91  10.98 
     4  10.83  10.87  10.90  10.90 260.24   9.72   9.77   9.75   9.74 
     5  10.92  10.90  10.97  10.90   9.77 248.33   9.76   9.75   9.76 
     6  10.87  10.90  10.92  10.92   9.73   9.76 260.74   9.77   9.76 
     7  10.91  10.89  10.92  10.97   9.75   9.74   9.74 258.64   9.75 
     8  10.86  10.99  10.93  10.88   9.75   9.78   9.74   9.73 251.02 

I also printed the GPU topology:

	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	GPU8	mlx4_0	CPU Affinity
GPU0	 X 	PIX	PIX	PIX	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU1	PIX	 X 	PIX	PIX	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU2	PIX	PIX	 X 	PIX	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU3	PIX	PIX	PIX	 X 	NODE	NODE	NODE	NODE	NODE	NODE	0-21
GPU4	NODE	NODE	NODE	NODE	 X 	PIX	PIX	PIX	PIX	NODE	0-21
GPU5	NODE	NODE	NODE	NODE	PIX	 X 	PIX	PIX	PIX	NODE	0-21
GPU6	NODE	NODE	NODE	NODE	PIX	PIX	 X 	PIX	PIX	NODE	0-21
GPU7	NODE	NODE	NODE	NODE	PIX	PIX	PIX	 X 	PIX	NODE	0-21
GPU8	NODE	NODE	NODE	NODE	PIX	PIX	PIX	PIX	 X 	NODE	0-21
mlx4_0	NODE	NODE	NODE	NODE	NODE	NODE	NODE	NODE	NODE	 X 
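To make the link types explicit, here is a minimal Python sketch, assuming the topology table above has been saved as plain text, that parses it into a mapping from each GPU pair to its link type and counts the pairs of each type. The names `TOPOLOGY` and `parse_topology` are hypothetical, and the parser assumes the GPU columns come first in the header, as they do here.

```python
from collections import Counter

# Topology table from the question; tabs or spaces both work because we split on whitespace.
TOPOLOGY = """
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  GPU8  mlx4_0  CPU Affinity
GPU0    X     PIX   PIX   PIX   NODE  NODE  NODE  NODE  NODE  NODE    0-21
GPU1    PIX   X     PIX   PIX   NODE  NODE  NODE  NODE  NODE  NODE    0-21
GPU2    PIX   PIX   X     PIX   NODE  NODE  NODE  NODE  NODE  NODE    0-21
GPU3    PIX   PIX   PIX   X     NODE  NODE  NODE  NODE  NODE  NODE    0-21
GPU4    NODE  NODE  NODE  NODE  X     PIX   PIX   PIX   PIX   NODE    0-21
GPU5    NODE  NODE  NODE  NODE  PIX   X     PIX   PIX   PIX   NODE    0-21
GPU6    NODE  NODE  NODE  NODE  PIX   PIX   X     PIX   PIX   NODE    0-21
GPU7    NODE  NODE  NODE  NODE  PIX   PIX   PIX   X     PIX   NODE    0-21
GPU8    NODE  NODE  NODE  NODE  PIX   PIX   PIX   PIX   X     NODE    0-21
mlx4_0  NODE  NODE  NODE  NODE  NODE  NODE  NODE  NODE  NODE  X
"""


def parse_topology(text):
    """Return {(row_gpu, col_gpu): link_type} for every ordered pair of distinct GPUs."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # GPU columns are listed first in the header; the NIC column (mlx4_0) follows them.
    gpus = [name for name in header if name.startswith("GPU")]
    links = {}
    for line in lines[1:]:
        cells = line.split()
        row = cells[0]
        if row not in gpus:
            continue  # skip NIC rows such as mlx4_0
        for col, cell in zip(gpus, cells[1:1 + len(gpus)]):
            if col != row:
                links[(row, col)] = cell
    return links


links = parse_topology(TOPOLOGY)
gpus = sorted({a for a, _ in links}, key=lambda name: int(name[3:]))
# Count each unordered pair once.
pair_counts = Counter(links[(a, b)] for i, a in enumerate(gpus) for b in gpus[i + 1:])
print(pair_counts)
```

Of the 36 GPU pairs, 16 are PIX (6 inside GPU0-GPU3, 10 inside GPU4-GPU8) and 20 are NODE (every cross-group pair).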

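According to the topology, GPU0-GPU3 are closely connected to each other (every pair among them is PIX, i.e. it traverses at most a single PCIe bridge), and so are GPU4-GPU8, while every pair across the two groups is NODE (it also traverses the interconnect between PCIe host bridges). However, the bandwidth matrix is not what I expected. For example, for GPU0, I expected the bandwidth between GPU0 and GPU1 (PIX) to be higher than between GPU0 and GPU4 (NODE), yet the test reports 9.79 GB/s versus 10.88 GB/s. I am confused about why the farther card is faster.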
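To check whether this holds for all pairs rather than only GPU0, the following minimal Python sketch groups every off-diagonal entry of the bandwidth matrix by link type and summarizes each group. It assumes, from the topology table, that GPU0-GPU3 form one PIX group and GPU4-GPU8 another, with NODE links between the groups; `SWITCH_GROUPS` and `link_type` are hypothetical names introduced for illustration.

```python
from statistics import mean

# Unidirectional P2P=Enabled bandwidth matrix from the question (GB/s).
# The first number on each line is the row's device index.
BANDWIDTH = """
0 258.57   9.79   9.78   9.80  10.88  10.88  10.83  10.92  10.93
1   9.75 258.75   9.78   9.79  10.96  10.88  10.86  10.91  10.89
2   9.75   9.80 259.23   9.81  10.88  10.94  10.90  10.90  10.89
3   9.77   9.79   9.80 258.09  10.88  10.90  10.86  10.91  10.98
4  10.83  10.87  10.90  10.90 260.24   9.72   9.77   9.75   9.74
5  10.92  10.90  10.97  10.90   9.77 248.33   9.76   9.75   9.76
6  10.87  10.90  10.92  10.92   9.73   9.76 260.74   9.77   9.76
7  10.91  10.89  10.92  10.97   9.75   9.74   9.74 258.64   9.75
8  10.86  10.99  10.93  10.88   9.75   9.78   9.74   9.73 251.02
"""

# Assumption read off the topology table: PIX inside each group, NODE between groups.
SWITCH_GROUPS = [{0, 1, 2, 3}, {4, 5, 6, 7, 8}]


def link_type(i, j):
    """Hypothetical helper: classify the pair (i, j) as PIX or NODE using SWITCH_GROUPS."""
    return "PIX" if any(i in g and j in g for g in SWITCH_GROUPS) else "NODE"


matrix = [[float(x) for x in line.split()[1:]] for line in BANDWIDTH.strip().splitlines()]

by_link = {"PIX": [], "NODE": []}
for i, row in enumerate(matrix):
    for j, gbps in enumerate(row):
        if i != j:  # skip the diagonal (same-device copies)
            by_link[link_type(i, j)].append(gbps)

for kind, values in by_link.items():
    print(f"{kind:4}: {len(values)} entries, mean {mean(values):.2f}, "
          f"min {min(values):.2f}, max {max(values):.2f} GB/s")
```

On the numbers above, every NODE entry (10.83 to 10.99 GB/s) is higher than every PIX entry (9.72 to 9.81 GB/s), so the pattern is not specific to GPU0.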