Questions about p2pBandwidthLatencyTest

Hi,

We tested peer-to-peer (P2P) communication between two V100 GPUs. The results are as follows:

Data size    P2P disabled                          P2P enabled
             GPU latency (us)   CPU latency (us)   GPU latency (us)   CPU latency (us)
4 B          15.45              11.51              1.66               3.13
64 KB        28.75              11.42              7.86               3.11
1 MB         178.43             11.50              107.44             3.08
2.97 MB      362.87             29.53              316.80             3.37
4 MB         457.72             34.45              425.61             3.20
8 MB         822.97             396.65             951.18             3.41

As the table shows, as the transfer size increases, the GPU latency with P2P enabled gradually approaches the latency with P2P disabled, and at 8 MB it exceeds it (951.18 us versus 822.97 us). Is this expected, and if so, why?
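
To make the trend easier to see, here is a small Python sketch that computes the ratio of GPU latency with P2P disabled to GPU latency with P2P enabled for each row. The numbers are copied from the table above; the names `gpu_latency_us` and `p2p_speedup` are only illustrative and are not part of p2pBandwidthLatencyTest.

```python
# GPU latency in microseconds, copied from the table above:
# (data size, P2P disabled, P2P enabled)
gpu_latency_us = [
    ("4 B", 15.45, 1.66),
    ("64 KB", 28.75, 7.86),
    ("1 MB", 178.43, 107.44),
    ("2.97 MB", 362.87, 316.80),
    ("4 MB", 457.72, 425.61),
    ("8 MB", 822.97, 951.18),
]


def p2p_speedup(rows):
    """Return {size: disabled / enabled}; a value above 1 means P2P is faster."""
    return {size: disabled / enabled for size, disabled, enabled in rows}


ratios = p2p_speedup(gpu_latency_us)
for size, ratio in ratios.items():
    verdict = "P2P faster" if ratio > 1 else "P2P slower"
    print(f"{size:>8}: {ratio:.2f}x ({verdict})")
```

The ratio falls steadily from roughly 9.3x at 4 B to below 1 at 8 MB, which is the crossover I am asking about.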
