P2P is not working

P2P is apparently not working. The machine is a Dell Precision T3500 workstation with an Intel Xeon X5670 and an Intel X58 IOH, running Debian 10. It has two GeForce GTX 1050 graphics cards. According to the X58 datasheet (https://www.intel.com/content/dam/doc/datasheet/x58-express-chipset-datasheet.pdf), local peer-to-peer traffic does not appear to cross the QPI, at least under the given circumstances; see also Figure 7-3 on page 114. The relevant Dell BIOS option appears to be “High Performance IO”, which I have switched on.

Can you kindly advise?

Is there an approved list of systems (boards) for P2P communication?

CUDA and driver version:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1      |
    |-------------------------------+----------------------+----------------------+


Compute capability:
    6.1


$ nvidia-smi topo -m (partial)
    
            GPU0    GPU1    CPU Affinity    NUMA Affinity
    GPU0     X      PHB     0-11            N/A
    GPU1    PHB      X      0-11            N/A


$ ./simpleP2P

    [./simpleP2P] - Starting...
    Checking for multiple GPUs...
    CUDA-capable device count: 2

    Checking GPU(s) for support of peer to peer memory access...
    > Peer access from GeForce GTX 1050 (GPU0) -> GeForce GTX 1050 (GPU1) : No
    > Peer access from GeForce GTX 1050 (GPU1) -> GeForce GTX 1050 (GPU0) : No
    Two or more GPUs with Peer-to-Peer access capability are required for ./simpleP2P.
    Peer to Peer access is not available amongst GPUs in the system, waiving test.


$ lspci -tv (partial)

    -+-[0000:3f]-+-00.0  Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers
     |           +-00.1  Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder
     |           +-02.0  Intel Corporation Xeon 5600 Series QPI Link 0
     |           +-02.1  Intel Corporation Xeon 5600 Series QPI Physical 0
     |           +-02.2  Intel Corporation Xeon 5600 Series Mirror Port Link 0
     |           +-02.3  Intel Corporation Xeon 5600 Series Mirror Port Link 1
     |           +-02.4  Intel Corporation Xeon 5600 Series QPI Link 1
     |           +-02.5  Intel Corporation Xeon 5600 Series QPI Physical 1
     |           +-03.0  Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers
     |           +-03.1  Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder
     |           +-03.2  Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers
     |           +-03.4  Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers
     |           +-04.0  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control
     |           +-04.1  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address
     |           +-04.2  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank
     |           +-04.3  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control
     |           +-05.0  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control
     |           +-05.1  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address
     |           +-05.2  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank
     |           +-05.3  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control
     |           +-06.0  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control
     |           +-06.1  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address
     |           +-06.2  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank
     |           \-06.3  Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control
     \-[0000:00]-+-00.0  Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
                 +-01.0-[01]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet Controller
                 |            \-00.1  Intel Corporation 82571EB Gigabit Ethernet Controller
                 +-03.0-[02]--+-00.0  NVIDIA Corporation GP107 [GeForce GTX 1050]
                 |            \-00.1  NVIDIA Corporation GP107GL High Definition Audio Controller
                 +-07.0-[03]--+-00.0  NVIDIA Corporation GP107 [GeForce GTX 1050]
                 |            \-00.1  NVIDIA Corporation GP107GL High Definition Audio Controller
                 .
                 .
                 .


$ sudo lspci -s 0000:00:03.0 -vvvv | grep -i acs 
    ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

$ sudo lspci -s 0000:00:07.0 -vvvv | grep -i acs 
    ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

There isn’t one. My general advice is that any deployment decision around P2P should rest exclusively on what the tool you are using reports. More specifically, the final determinant of P2P capability is the result of the cudaDeviceCanAccessPeer runtime API call (or its driver API equivalent, cuDeviceCanAccessPeer).

AFAIK the T3500 only existed in single-CPU configurations, so QPI should not be a factor. The T5500 and T7500 were the next siblings up the ladder, and both had dual-CPU configurations.

I believe I read somewhere that an earlier CUDA driver version might work where a later version does not. Is that the case?

I have not witnessed that in my experience, but it would certainly be interesting to find out whether it is true. Bear in mind that going to an older driver also restricts you to older CUDA versions, and if you go back far enough, you will eventually find that the driver does not support your GPU.