Hi, I’m trying to get some benchmarking work done on a Dell R7425 server with three V100 GPUs.
System Setup
CUDA 9.1
NVIDIA driver: 396.26
GPUs: 3x Tesla V100
However, when I run the NVIDIA peer-to-peer sample (p2pBandwidthLatencyTest), the P2P-enabled bandwidth and latency results are far worse than the tests where peer-to-peer is disabled. Results below, followed by a rough sketch of what I understand the P2P-enabled path to be doing:
root@R7425-V100-1:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/p2pBandwidthLatencyTest# ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Tesla V100-PCIE-16GB, pciBusID: 21, pciDeviceID: 0, pciDomainID:0
Device: 1, Tesla V100-PCIE-16GB, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device: 2, Tesla V100-PCIE-16GB, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) in those cases.
P2P Connectivity Matrix
D\D 0 1 2
0 1 1 1
1 1 1 1
2 1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1 2
0 735.68 4.17 4.85
1 4.53 732.84 5.78
2 4.58 5.31 735.64
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1 2
0 737.03 0.76 0.76
1 0.72 749.76 0.76
2 0.75 0.76 748.32
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1 2
0 752.65 7.40 7.88
1 7.35 755.56 8.94
2 7.01 9.02 759.23
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1 2
0 742.63 1.46 1.44
1 1.48 760.71 1.47
2 1.46 1.47 762.20
P2P=Disabled Latency Matrix (us)
D\D 0 1 2
0 6.84 23.81 20.20
1 24.50 7.27 20.75
2 20.62 20.01 6.10
P2P=Enabled Latency Matrix (us)
D\D 0 1 2
0 6.53 49360.66 49356.82
1 49357.70 6.81 49357.77
2 49353.03 49353.09 5.75
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
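For reference, my understanding of what the P2P-enabled path of the test measures is roughly the following (a minimal sketch only, not the actual sample code; the buffer size, repetition count, and device pair are arbitrary choices of mine, and error checking is omitted):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;      // 64 MiB per transfer (arbitrary)
    const int    reps  = 100;           // number of copies to average over

    // Check whether the driver reports peer access between devices 0 and 1,
    // mirroring the "CAN Access Peer" lines in the sample output.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    printf("peer access 0->1: %d, 1->0: %d\n", can01, can10);

    void *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // let device 0 address device 1's memory
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&buf1, bytes);

    // Time repeated device-0 -> device-1 copies with CUDA events.
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpyPeerAsync(buf1, 1, buf0, 0, bytes, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU0 -> GPU1: %.2f GB/s\n", (double)bytes * reps / (ms * 1e6));
    return 0;
}

Since the copies should go directly from one GPU's memory to the other's once peer access is enabled, I would have expected the P2P-enabled numbers to be at least as good as the P2P-disabled ones, not dramatically worse.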
I’m using NVIDIA driver 396.26; nvidia-smi output below:
root@R7425-V100-1:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/p2pBandwidthLatencyTest# nvidia-smi
Wed Aug 8 09:48:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:21:00.0 Off | 0 |
| N/A 36C P0 35W / 250W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:41:00.0 Off | 0 |
| N/A 36C P0 35W / 250W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-PCIE... Off | 00000000:81:00.0 Off | 0 |
| N/A 38C P0 36W / 250W | 0MiB / 16160MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
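In case it’s useful, I can also post the PCIe topology as reported by the driver, which as far as I know can be queried with:

nvidia-smi topo -m

The three GPUs are at bus IDs 21, 41, and 81, so I assume they sit behind different PCIe root complexes on this dual-socket EPYC system.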
Please let me know if there’s anything I can do to get the expected P2P performance.
Thanks,
Josh