Hi,
We tested peer-to-peer (P2P) communication between two V100 GPUs. The measured latencies are as follows:
Data size   P2P not used                          P2P used
            GPU latency (us)  CPU latency (us)    GPU latency (us)  CPU latency (us)
4 B         15.45             11.51               1.66              3.13
64 KB       28.75             11.42               7.86              3.11
1 MB        178.43            11.50               107.44            3.08
2.97 MB     362.87            29.53               316.80            3.37
4 MB        457.72            34.45               425.61            3.20
8 MB        822.97            396.65              951.18            3.41
As the data size increases, the GPU latency with P2P enabled gradually approaches the GPU latency without P2P, and at 8 MB it exceeds it (951.18 us vs. 822.97 us). Is this expected? If so, why?
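
To make the trend easier to see, here is a small Python sketch that computes, for each data size, the ratio of GPU latency with P2P to GPU latency without P2P. The numbers are copied from our measurements above; the variable names are only for illustration and are not part of our test program.

```python
# GPU latencies in microseconds, copied from the measurements above.
sizes = ["4 B", "64 KB", "1 MB", "2.97 MB", "4 MB", "8 MB"]
gpu_no_p2p = [15.45, 28.75, 178.43, 362.87, 457.72, 822.97]
gpu_p2p = [1.66, 7.86, 107.44, 316.80, 425.61, 951.18]

# Ratio below 1 means P2P is faster; ratio above 1 means P2P is slower.
ratios = [with_p2p / without for with_p2p, without in zip(gpu_p2p, gpu_no_p2p)]

for size, ratio in zip(sizes, ratios):
    print(f"{size:>8}: P2P / no-P2P GPU latency = {ratio:.2f}")
```

The ratio grows steadily with the data size and only goes above 1 for the 8 MB case, which is the behavior we are asking about.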