Low bandwidth and high latency for peer-to-peer transfers between V100 GPUs

Hi, I’m trying to get some benchmarking work done on a Dell R7425 server with three V100 GPUs.

System setup:
CUDA: 9.1
NVIDIA driver: 396.26
GPUs: 3x Tesla V100 (PCIe, 16 GB)

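However, when I run the NVIDIA peer-to-peer test (p2pBandwidthLatencyTest from the CUDA 9.1 samples), the P2P-enabled results are far worse than the P2P-disabled ones: unidirectional bandwidth between GPUs drops from about 4-6 GB/s to about 0.76 GB/s, and latency rises from about 20-25 us to about 49,000 us. Full results below: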

root@R7425-V100-1:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/p2pBandwidthLatencyTest# ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Tesla V100-PCIE-16GB, pciBusID: 21, pciDeviceID: 0, pciDomainID:0
Device: 1, Tesla V100-PCIE-16GB, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device: 2, Tesla V100-PCIE-16GB, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) in those cases.

P2P Connectivity Matrix
     D\D     0     1     2
     0       1     1     1
     1       1     1     1
     2       1     1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2
     0 735.68   4.17   4.85
     1   4.53 732.84   5.78
     2   4.58   5.31 735.64
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2
     0 737.03   0.76   0.76
     1   0.72 749.76   0.76
     2   0.75   0.76 748.32
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2
     0 752.65   7.40   7.88
     1   7.35 755.56   8.94
     2   7.01   9.02 759.23
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2
     0 742.63   1.46   1.44
     1   1.48 760.71   1.47
     2   1.46   1.47 762.20
P2P=Disabled Latency Matrix (us)
   D\D     0      1      2
     0   6.84  23.81  20.20
     1  24.50   7.27  20.75
     2  20.62  20.01   6.10
P2P=Enabled Latency Matrix (us)
   D\D     0      1      2
     0   6.53 49360.66 49356.82
     1 49357.70   6.81 49357.77
     2 49353.03 49353.09   5.75

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
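As a quick sanity check on the numbers above, here is a small Python sketch (not part of the CUDA samples) that quantifies how much worse each GPU pair gets once P2P is enabled. The dictionaries are values hand-copied from the output, keyed by (row, column) of the printed matrices, and the `regressions` helper is a hypothetical name used only for this illustration.

```python
# Off-diagonal values hand-copied from the p2pBandwidthLatencyTest output,
# keyed by (row, column) as printed in the matrices.

# Unidirectional bandwidth, GB/s (higher is better).
p2p_disabled_bw = {(0, 1): 4.17, (0, 2): 4.85, (1, 0): 4.53,
                   (1, 2): 5.78, (2, 0): 4.58, (2, 1): 5.31}
p2p_enabled_bw = {(0, 1): 0.76, (0, 2): 0.76, (1, 0): 0.72,
                  (1, 2): 0.76, (2, 0): 0.75, (2, 1): 0.76}

# Latency, microseconds (lower is better).
p2p_disabled_lat = {(0, 1): 23.81, (0, 2): 20.20, (1, 0): 24.50,
                    (1, 2): 20.75, (2, 0): 20.62, (2, 1): 20.01}
p2p_enabled_lat = {(0, 1): 49360.66, (0, 2): 49356.82, (1, 0): 49357.70,
                   (1, 2): 49357.77, (2, 0): 49353.03, (2, 1): 49353.09}


def regressions(disabled, enabled, higher_is_better):
    """Return {pair: factor} for every pair where enabling P2P made the metric worse.

    factor is how many times worse the P2P-enabled figure is than the
    P2P-disabled one (always > 1 for the pairs returned).
    """
    worse = {}
    for pair, off in disabled.items():
        on = enabled[pair]
        factor = off / on if higher_is_better else on / off
        if factor > 1.0:
            worse[pair] = factor
    return worse


bw_worse = regressions(p2p_disabled_bw, p2p_enabled_bw, higher_is_better=True)
lat_worse = regressions(p2p_disabled_lat, p2p_enabled_lat, higher_is_better=False)

for (row, col), f in sorted(bw_worse.items()):
    print(f"D{row}/D{col}: bandwidth {f:.1f}x lower with P2P enabled")
for (row, col), f in sorted(lat_worse.items()):
    print(f"D{row}/D{col}: latency {f:.0f}x higher with P2P enabled")
```

With these values every off-diagonal pair is flagged: bandwidth is roughly 5x to 8x lower and latency over 2000x higher with P2P enabled, so the regression is uniform across all three GPU pairs rather than confined to one card.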

nvidia-smi output, confirming driver 396.26:

root@R7425-V100-1:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/p2pBandwidthLatencyTest# nvidia-smi
Wed Aug  8 09:48:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:21:00.0 Off |                    0 |
| N/A   36C    P0    35W / 250W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:41:00.0 Off |                    0 |
| N/A   36C    P0    35W / 250W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  Off  | 00000000:81:00.0 Off |                    0 |
| N/A   38C    P0    36W / 250W |      0MiB / 16160MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Please let me know if there’s anything I can do.

Thanks,
Josh

You should probably take this up directly with Dell.

Currently, the Dell online configurator shows no GPU options in the R7425.

I’m not sure that what you have is an OEM-supported configuration. If it is, you should raise your concerns about its behavior with Dell.