I am a little confused about the output of the CUDA sample “p2pBandwidthLatencyTest”.
I installed 3 NVIDIA A40 GPUs without NVLink and ran this example.
The reported bandwidth from device 0 to device 0 is about 640 GB/s. Does this figure correspond to the “GPU Memory Bandwidth” entry in the official specifications? For reference, the A40 specifications are:
GPU Memory: 48 GB GDDR6 with error-correcting code (ECC)
GPU Memory Bandwidth: 696 GB/s
NVIDIA NVLink: 112.5 GB/s (bidirectional)
PCIe Gen4 x16: 31.5 GB/s (bidirectional)
NVLink: 2-way low profile (2-slot)
Display Ports: 3x DisplayPort 1.4*
Max Power Consumption: 300 W
Form Factor: 4.4" (H) x 10.5" (L) Dual Slot
vGPU Software Support: NVIDIA vPC/vApps, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server
vGPU Profiles Supported: See the Virtual GPU Licensing Guide
NVENC | NVDEC: 1x | 2x (includes AV1 decode)
Secure and Measured Boot with Hardware Root of Trust: Yes
NEBS Ready: Level 3
Power Connector: 8-pin CPU
In addition, what should the “0 to 1” and “0 to 2” Bidirectional P2P Bandwidth results be compared against?
Is the relevant reference the interconnect figure, i.e. PCIe Gen4 x16?
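
To make the comparison concrete, here is a rough Python sanity check of the numbers above. It is only a sketch under stated assumptions. The 640 GB/s and 696 GB/s values come from this post. The PCIe Gen4 parameters (16 GT/s per lane, 128b/130b encoding) are the standard link parameters and ignore protocol overhead. The function names are made up for illustration and are not part of the CUDA sample.

```python
# Rough sanity check of the figures quoted in the post.
# Function names are hypothetical helpers, not part of any CUDA sample.

MEASURED_SAME_DEVICE_GBPS = 640.0   # approximate 0 -> 0 result from the test
SPEC_MEMORY_BANDWIDTH_GBPS = 696.0  # "GPU Memory Bandwidth" from the datasheet


def fraction_of_spec(measured_gbps: float, spec_gbps: float) -> float:
    """Return the measured bandwidth as a fraction of the datasheet value."""
    return measured_gbps / spec_gbps


def pcie_per_direction_gbps(gt_per_s: float = 16.0,
                            lanes: int = 16,
                            encoding: float = 128 / 130) -> float:
    """Raw PCIe link bandwidth in one direction, in GB/s.

    Assumes PCIe Gen4 defaults: 16 GT/s per lane and 128b/130b encoding.
    Protocol overhead (packet headers, flow control) is not modeled.
    """
    return gt_per_s * lanes * encoding / 8.0  # 8 bits per byte


if __name__ == "__main__":
    ratio = fraction_of_spec(MEASURED_SAME_DEVICE_GBPS,
                             SPEC_MEMORY_BANDWIDTH_GBPS)
    print(f"0 -> 0 result relative to datasheet memory bandwidth: {ratio:.1%}")
    print(f"PCIe Gen4 x16 raw link rate per direction: "
          f"{pcie_per_direction_gbps():.1f} GB/s")
```

With these inputs the 0 to 0 result is a bit over 90% of the datasheet memory bandwidth, and the raw x16 link rate in one direction is numerically the same as the 31.5 GB/s PCIe entry in the list above. I am not sure whether the P2P results between different GPUs should be judged against that value or against twice that value.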