Why is d2h slower than h2d on devices 0-3?

I have a server with eight 40 GB A100s connected with NVLink. For devices 0-3, device-to-host (d2h) bandwidth is much lower than host-to-device (h2d) bandwidth, while for devices 4-7 the two are roughly equal.
Does anyone know what causes the slowdown?

driver:
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2

Devices 0-3:
numactl --cpunodebind=1 --membind=1 ./bandwidthTest --dtoh --htod --device=0

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: A100-SXM4-40GB
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     24.4

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     14.9

Result = PASS

Devices 4-7:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 4: A100-SXM4-40GB
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     24.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     26.7

Result = PASS

Topology (from nvidia-smi topo -m):

        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  mlx5_1  CPU Affinity    NUMA Affinity
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PXB     SYS     0-63,128-191    0
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    PXB     SYS     0-63,128-191    0
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    NODE    SYS     0-63,128-191    0
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    NODE    SYS     0-63,128-191    0
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     NODE    64-127,192-255  1
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    SYS     NODE    64-127,192-255  1
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    SYS     PXB     64-127,192-255  1
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      SYS     PXB     64-127,192-255  1
mlx5_0  PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS      X      SYS
mlx5_1  SYS     SYS     SYS     SYS     NODE    NODE    PXB     PXB     SYS      X 

lspci -vvv -t

-+-[0000:d6]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.1-[d7-f0]----00.0-[d8-f0]--+-00.0-[d9-e2]----00.0-[da-e2]--+-00.0-[db]--
 |           |                               |                               +-04.0-[dc]----00.0  Intel(R) NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
 |           |                               |                               +-08.0-[dd]--
 |           |                               |                               +-0c.0-[de]--
 |           |                               |                               \-10.0-[df-e2]----00.0-[e0-e2]--+-00.0-[e1]----00.0  NVIDIA Corporation Device 20b0
 |           |                               |                                                               \-1f.0-[e2]----00.0  LSI Logic / Symbios Logic Device 00b2
 |           |                               +-04.0-[e3-e7]----00.0-[e4-e7]----00.0-[e5-e7]----00.0-[e6-e7]----00.0-[e7]----00.0  NVIDIA Corporation Device 20b0
 |           |                               +-08.0-[e8-eb]----00.0-[e9-eb]--+-00.0-[ea]----00.0  Mellanox Technologies MT28908 Family [ConnectX-6]
 |           |                               |                               \-10.0-[eb]--
 |           |                               +-0c.0-[ec-ef]----00.0-[ed-ef]--+-14.0-[ee]--
 |           |                               |                               \-15.0-[ef]--
 |           |                               \-1c.0-[f0]----00.0  LSI Logic / Symbios Logic Device c010
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[f1]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           \-08.1-[f2]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |                        \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 +-[0000:c1]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.1-[c2]--
 |           +-03.2-[c3]--
 |           +-03.3-[c4]--
 |           +-03.4-[c5]--
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[c6]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-08.1-[c7]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           \-08.2-[c8]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
 +-[0000:97]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.1-[98-b1]----00.0-[99-b1]--+-00.0-[9a-a2]----00.0-[9b-a2]--+-00.0-[9c]--
 |           |                               |                               +-04.0-[9d]--
 |           |                               |                               +-08.0-[9e]--
 |           |                               |                               +-0c.0-[9f]--
 |           |                               |                               \-10.0-[a0-a2]----00.0-[a1-a2]----00.0-[a2]----00.0  NVIDIA Corporation Device 20b0
 |           |                               +-04.0-[a3-a8]----00.0-[a4-a8]----00.0-[a5-a8]----00.0-[a6-a8]--+-00.0-[a7]----00.0  NVIDIA Corporation Device 20b0
 |           |                               |                                                               \-1f.0-[a8]----00.0  LSI Logic / Symbios Logic Device 00b2
 |           |                               +-08.0-[a9-ac]----00.0-[aa-ac]--+-00.0-[ab]--
 |           |                               |                               \-10.0-[ac]--
 |           |                               +-0c.0-[ad-b0]----00.0-[ae-b0]--+-14.0-[af]--
 |           |                               |                               \-15.0-[b0]--
 |           |                               \-1c.0-[b1]----00.0  LSI Logic / Symbios Logic Device c010
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[b2]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           \-08.1-[b3]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |                        +-00.1  Advanced Micro Devices, Inc. [AMD] Device 1486
 |                        +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |                        \-00.3  Advanced Micro Devices, Inc. [AMD] Device 148c
 +-[0000:87]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[88]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           \-08.1-[89]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |                        +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |                        \-00.3  Advanced Micro Devices, Inc. [AMD] Device 148c
 +-[0000:5a]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.1-[5b-74]----00.0-[5c-74]--+-00.0-[5d-65]----00.0-[5e-65]--+-00.0-[5f]--
 |           |                               |                               +-04.0-[60]--
 |           |                               |                               +-08.0-[61]--
 |           |                               |                               +-0c.0-[62]--
 |           |                               |                               \-10.0-[63-65]----00.0-[64-65]----00.0-[65]----00.0  NVIDIA Corporation Device 20b0
 |           |                               +-04.0-[66-6b]----00.0-[67-6b]----00.0-[68-6b]----00.0-[69-6b]--+-00.0-[6a]----00.0  NVIDIA Corporation Device 20b0
 |           |                               |                                                               \-1f.0-[6b]----00.0  LSI Logic / Symbios Logic Device 00b2
 |           |                               +-08.0-[6c-6f]----00.0-[6d-6f]--+-00.0-[6e]--
 |           |                               |                               \-10.0-[6f]--
 |           |                               +-0c.0-[70-73]----00.0-[71-73]--+-14.0-[72]--
 |           |                               |                               \-15.0-[73]--
 |           |                               \-1c.0-[74]----00.0  LSI Logic / Symbios Logic Device c010
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.2-[75-76]----00.0-[76]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[77]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           \-08.1-[78]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |                        \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 +-[0000:46]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-01.1-[47]--
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[48]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-08.1-[49]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.2-[4a]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
 |           \-08.3-[4b]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
 +-[0000:12]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-03.1-[13-36]----00.0-[14-36]--+-00.0-[15-27]----00.0-[16-27]--+-00.0-[17]--
 |           |                               |                               +-04.0-[18]--
 |           |                               |                               +-08.0-[19-22]----00.0-[1a-22]----00.0-[1b-22]----00.0-[1c-22]--+-01.0-[1d]----00.0  NVIDIA Corporation Device 1af1
 |           |                               |                               |                                                               +-02.0-[1e]----00.0  NVIDIA Corporation Device 1af1
 |           |                               |                               |                                                               +-03.0-[1f]----00.0  NVIDIA Corporation Device 1af1
 |           |                               |                               |                                                               +-04.0-[20]----00.0  NVIDIA Corporation Device 1af1
 |           |                               |                               |                                                               +-0b.0-[21]----00.0  NVIDIA Corporation Device 1af1
 |           |                               |                               |                                                               \-0c.0-[22]----00.0  NVIDIA Corporation Device 1af1
 |           |                               |                               +-0c.0-[23]--
 |           |                               |                               \-10.0-[24-27]----00.0-[25-27]--+-00.0-[26]----00.0  NVIDIA Corporation Device 20b0
 |           |                               |                                                               \-1f.0-[27]----00.0  LSI Logic / Symbios Logic Device 00b2
 |           |                               +-04.0-[28-2c]----00.0-[29-2c]----00.0-[2a-2c]----00.0-[2b-2c]----00.0-[2c]----00.0  NVIDIA Corporation Device 20b0
 |           |                               +-08.0-[2d-31]----00.0-[2e-31]--+-00.0-[2f]----00.0  Mellanox Technologies MT28908 Family [ConnectX-6]
 |           |                               |                               \-10.0-[30-31]--+-00.0  Intel(R) Ethernet Controller 10G X550T
 |           |                               |                                               \-00.1  Intel(R) Ethernet Controller 10G X550T
 |           |                               +-0c.0-[32-35]----00.0-[33-35]--+-14.0-[34]--
 |           |                               |                               \-15.0-[35]--
 |           |                               \-1c.0-[36]----00.0  LSI Logic / Symbios Logic Device c010
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           +-07.1-[37]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
 |           |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
 |           \-08.1-[38]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
 |                        +-00.1  Advanced Micro Devices, Inc. [AMD] Device 1486
 |                        +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
 |                        \-00.3  Advanced Micro Devices, Inc. [AMD] Device 148c
 \-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1480
             +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1481
             +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-01.1-[01]----00.0  LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader]
             +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-05.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-07.1-[02]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 148a
             |            \-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
             +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1482
             +-08.1-[03]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1485
             |            +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1498
             |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 148c
             +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
             +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
             +-18.0  Advanced Micro Devices, Inc. [AMD] Device 1490
             +-18.1  Advanced Micro Devices, Inc. [AMD] Device 1491
             +-18.2  Advanced Micro Devices, Inc. [AMD] Device 1492
             +-18.3  Advanced Micro Devices, Inc. [AMD] Device 1493
             +-18.4  Advanced Micro Devices, Inc. [AMD] Device 1494
             +-18.5  Advanced Micro Devices, Inc. [AMD] Device 1495
             +-18.6  Advanced Micro Devices, Inc. [AMD] Device 1496
             +-18.7  Advanced Micro Devices, Inc. [AMD] Device 1497
             +-19.0  Advanced Micro Devices, Inc. [AMD] Device 1490
             +-19.1  Advanced Micro Devices, Inc. [AMD] Device 1491
             +-19.2  Advanced Micro Devices, Inc. [AMD] Device 1492
             +-19.3  Advanced Micro Devices, Inc. [AMD] Device 1493
             +-19.4  Advanced Micro Devices, Inc. [AMD] Device 1494
             +-19.5  Advanced Micro Devices, Inc. [AMD] Device 1495
             +-19.6  Advanced Micro Devices, Inc. [AMD] Device 1496
             \-19.7  Advanced Micro Devices, Inc. [AMD] Device 1497

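The CPU affinity differs between those two groups of devices, as your topology output shows: GPUs 0-3 have NUMA affinity 0, while GPUs 4-7 have NUMA affinity 1. So if you are using the same process placement (the same numactl settings) for both groups, that may be the issue: binding to node 1 while testing device 0 places the process and its host memory on the NUMA node the GPU is not attached to. For H->D or D->H transfers on your system, NVLink isn't involved.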

Perhaps try:

numactl --cpunodebind=0 --membind=0  ...

for devices 0-3
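
If you want to avoid picking the node by hand, here is a minimal Python sketch that reads the NUMA affinity from the topology text and builds the matching numactl command. The function names (gpu_numa_nodes, numactl_command) are hypothetical, and the parser assumes the layout shown in this thread, where NUMA Affinity is the last column of each GPU row. Other driver versions may print extra columns, so treat it as a starting point rather than a general parser.

```python
import re

# Two GPU rows copied from the topology output in this thread.
SAMPLE_TOPO = """\
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PXB     SYS     0-63,128-191    0
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     NODE    64-127,192-255  1
"""


def gpu_numa_nodes(topo_text):
    """Map GPU index -> NUMA node from `nvidia-smi topo -m` text.

    Assumption: NUMA Affinity is the last whitespace-separated field of
    each row that starts with GPU<n>, as in the output posted above.
    """
    nodes = {}
    for line in topo_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        match = re.fullmatch(r"GPU(\d+)", fields[0])
        # The header row also starts with GPU0 but ends in "Affinity",
        # so the isdigit() check skips it.
        if match and fields[-1].isdigit():
            nodes[int(match.group(1))] = int(fields[-1])
    return nodes


def numactl_command(gpu, nodes, binary="./bandwidthTest"):
    """Build an argv list that binds CPU and memory to the GPU's node."""
    node = nodes[gpu]
    return [
        "numactl",
        f"--cpunodebind={node}",
        f"--membind={node}",
        binary,
        "--dtoh",
        "--htod",
        f"--device={gpu}",
    ]


if __name__ == "__main__":
    nodes = gpu_numa_nodes(SAMPLE_TOPO)
    for gpu in sorted(nodes):
        print(" ".join(numactl_command(gpu, nodes)))
```

On a real machine you would feed it the full text of nvidia-smi topo -m instead of SAMPLE_TOPO and pass the resulting list to subprocess.run.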

Yeah, my bad, didn’t notice that!
Thanks!
