Multi-GPU Peer-to-Peer access failing on Tesla K80

Finally got around to updating the BIOS, and the problem still seems to exist:

# dmidecode 2.12
SMBIOS 2.8 present.
135 structures occupying 5973 bytes.
Table at 0x000ED8A0.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 1.0c
        Release Date: 05/20/2015
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 8192 kB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 kB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                ACPI is supported
                USB legacy is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 5.6

Output of a simpleP2P run:

[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla K80 (GPU0) supports UVA: Yes
> Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.12GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Verification error @ element 0: val = nan, ref = 0.000000
Verification error @ element 1: val = nan, ref = 4.000000
Verification error @ element 2: val = nan, ref = 8.000000
Verification error @ element 3: val = nan, ref = 12.000000
Verification error @ element 4: val = nan, ref = 16.000000
Verification error @ element 5: val = nan, ref = 20.000000
Verification error @ element 6: val = nan, ref = 24.000000
Verification error @ element 7: val = nan, ref = 28.000000
Verification error @ element 8: val = nan, ref = 32.000000
Verification error @ element 9: val = nan, ref = 36.000000
Verification error @ element 10: val = nan, ref = 40.000000
Verification error @ element 11: val = nan, ref = 44.000000
Disabling peer access...
Shutting down...
Test failed!

Well, I don’t have any other good ideas about what to check.

Could you run

nvidia-smi -a

and paste the results into this thread. Could you also run the following as root:

dmesg | grep NVRM

and paste the results into this thread.

Also, if it’s not too much trouble, could you try updating to this driver:

http://www.nvidia.com/Download/driverResults.aspx/90279/en-us

I realize the Tesla K80 is not listed among the supported products for that driver, but it will work with your K80.

I’ve run tests just like yours on a Tesla K80 in an OEM-certified box (Dell C4130), and they work correctly. In my case the OS is RHEL 6.2, I’m using CUDA 7.5, and the driver version is 352.41, which I can’t change.

So the remaining possibilities are the difference between your OS (Ubuntu) and mine (RHEL), a platform hardware difference, or a defect of some sort (in the platform or in the K80). I probably won’t be able to narrow it down further by debugging remotely.

Sure thing

nvidia-smi -a

==============NVSMI LOG==============

Timestamp                           : Fri Oct  9 09:41:42 2015
Driver Version                      : 352.39

Attached GPUs                       : 2
GPU 0000:84:00.0
    Product Name                    : Tesla K80
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0321215003232
    GPU UUID                        : GPU-8fcde20c-90e0-e971-b1d5-958f1e653d39
    Minor Number                    : 0
    VBIOS Version                   : 80.21.1B.00.01
    MultiGPU Board                  : Yes
    Board ID                        : 0x8200
    Inforom Version
        Image Version               : 2080.0200.00.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x84
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102D10DE
        Bus Id                      : 0000:84:00.0
        Sub System Id               : 0x106C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : PLX
            Firmware                : 0xF0472900
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 11519 MiB
        Used                        : 22 MiB
        Free                        : 11497 MiB
    BAR1 Memory Usage
        Total                       : 16384 MiB
        Used                        : 2 MiB
        Free                        : 16382 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
        Aggregate
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        GPU Current Temp            : 36 C
        GPU Shutdown Temp           : 93 C
        GPU Slowdown Temp           : 88 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 58.16 W
        Power Limit                 : 149.00 W
        Default Power Limit         : 149.00 W
        Enforced Power Limit        : 149.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 175.00 W
    Clocks
        Graphics                    : 562 MHz
        SM                          : 562 MHz
        Memory                      : 2505 MHz
    Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    Clock Policy
        Auto Boost                  : On
        Auto Boost Default          : On
    Processes                       : None

GPU 0000:85:00.0
    Product Name                    : Tesla K80
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0321215003232
    GPU UUID                        : GPU-5e15df18-c47a-e283-25e7-784aa2930f1d
    Minor Number                    : 1
    VBIOS Version                   : 80.21.1B.00.02
    MultiGPU Board                  : Yes
    Board ID                        : 0x8200
    Inforom Version
        Image Version               : 2080.0200.00.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x85
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102D10DE
        Bus Id                      : 0000:85:00.0
        Sub System Id               : 0x106C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : PLX
            Firmware                : 0xF0472900
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 11519 MiB
        Used                        : 22 MiB
        Free                        : 11497 MiB
    BAR1 Memory Usage
        Total                       : 16384 MiB
        Used                        : 2 MiB
        Free                        : 16382 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 84 %
        Memory                      : 4 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
        Aggregate
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        GPU Current Temp            : 30 C
        GPU Shutdown Temp           : 93 C
        GPU Slowdown Temp           : 88 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 74.39 W
        Power Limit                 : 149.00 W
        Default Power Limit         : 149.00 W
        Enforced Power Limit        : 149.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 175.00 W
    Clocks
        Graphics                    : 692 MHz
        SM                          : 692 MHz
        Memory                      : 2505 MHz
    Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    Clock Policy
        Auto Boost                  : On
        Auto Boost Default          : On
    Processes                       : None

dmesg | grep NVRM

[   10.419771] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015

Thanks in any case. I will try updating the driver and see what happens.

I don’t see any other issues. All of the firmware versions in your K80 match mine. There are no errors reported by the driver in the system logs.

I’m a little suspicious that a Linux kernel setting may be affecting this, but I can’t put my finger on anything specific right now.

OK, one more thing to check.

Can you go into the BIOS setup on that system and see if there is an option to enable/disable PCIe ACS (Access Control Services)?

If so, turn off ACS and see whether it changes the behavior.

I can’t seem to locate that in the BIOS. I’m not sure this system’s PCIe ports have that capability.

If you want to continue, we can do some further investigation, but it will be a stepwise process.

If so, please run the following command as root and post the results here:

lspci | grep -i plx

Based on those results, I should be able to give you the next command to run.

(P.S. This sort of activity generally is not necessary if you purchase a K80 in a qualified OEM config.)

Sure:

lspci | grep -i plx
82:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
83:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
83:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)

Please run the following commands as root, then post the results here:

lspci -s 83:08.0 -vvvv | grep -i acs

and

lspci -s 83:10.0 -vvvv | grep -i acs

sudo lspci -s 83:08.0 -vvvv | grep -i acs
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
sudo lspci -s 83:10.0 -vvvv | grep -i acs
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans

Would it make a difference if I used two Titan Blacks or two Tesla K40c GPUs in this system for P2P transfers and access? Or is this just an issue with using a K80 on this motherboard?

I apologize for dropping the ball here. I’m not sure why I didn’t see your last two updates.

If you are still pursuing this, I would first like to point out that the only permanent fix is an updated BIOS from the system vendor that addresses this issue. Supermicro is certainly aware of the underlying issue, as they have applied the necessary fix via BIOS updates to some of their other GPU-enabled products.

Anyway, to proceed, you would need to disable ACS on the two PLX switch ports identified above (83:08.0 and 83:10.0) and re-run the simpleP2P test (without rebooting). The commands to disable ACS are:

setpci -s 83:08.0 f2a.w=0000
setpci -s 83:10.0 f2a.w=0000

Note that these commands will probably require root privileges. You can then verify the changed settings by re-running the previous lspci commands:

sudo lspci -s 83:08.0 -vvvv | grep -i acs
sudo lspci -s 83:10.0 -vvvv | grep -i acs

For each port, the ACSCtl line should now report all settings as negative:

ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

At this point you should re-run the simpleP2P test to see if the verification errors are still reported.
This change is not permanent: a reboot will restore the machine to the previous state.

If this fixes the issue, you can leave it as-is if you wish (re-applying it after each reboot), or provide this information to your system vendor, who can advise you on the availability of an SBIOS with such a fix.

If it does not fix the issue, I am out of ideas, and it is probably best to refer back to your system vendor.

Regarding your additional question, I suspect that if you used two Tesla K40c GPUs, for example, you would not see this issue and P2P transfers would “just work”, but that is only a guess. You would have to try it to be sure.

A related Supermicro FAQ entry is here:

http://www.supermicro.com/support/faqs/faq.cfm?faq=20732

Hi,

We are facing similar issues with the M60 and the K80, both in Supermicro systems. Your suggested fix worked on our M60 GPUs but not on the K80. On the K80, simpleP2P passes, but p2pBandwidthLatencyTest hangs (a kernel oops appears in dmesg, and only a hard reboot recovers the machine). ACS is turned off and the BIOS is up to date. I’m unsure what else to try.

Any more ideas?

Thanks,
-Florian

Have you updated the system BIOS on the affected platform to the latest available from Supermicro?

If so, then you should probably contact Supermicro to ask for assistance with the K80.

Thanks, txbob. We are renting bare metal. They told me the BIOS is up to date, but I have my doubts. Do you know, by any chance, whether a Supermicro BIOS provides a fix, and if so which one? I’m trying to get the data center provider to update the BIOS to the bleeding-edge version.

Since you haven’t identified which Supermicro system you are referring to, I can’t say anything about BIOS versions.

Find out the model number of your system and the BIOS version currently installed on it. Search the Supermicro site for that model number and the word “BIOS”; you will then be able to see the latest BIOS version.

If your system is not at the latest BIOS version, get it updated. If that does not fix the issue, contact Supermicro with that model number in hand and describe your issue to them.

I faced a similar issue (Testing nccl with a difficult topology · Issue #19 · NVIDIA/nccl · GitHub) with a Supermicro 7048GR-TR. It turns out that after disabling ACS, simpleP2P runs smoothly:

[r1bsl@supermicro simpleP2P]$ ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 6
> GPU0 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU2 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU3 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU4 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU5 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU4) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla K80 (GPU0) supports UVA: Yes
> Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 7.42GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Disabling peer access...
Shutting down...
Test passed

Hi

I am facing a similar problem. I actually came across it while training a deep network with Caffe.
The machine has 4 K80 GPUs, and I can’t make full use of them because P2P access between them fails.

A bit of diagnosis from my end:

Running nvidia-smi topo -m gives:

        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PHB     PHB     PHB     0-15
GPU1    PHB      X      PHB     PHB     0-15
GPU2    PHB     PHB      X      PHB     0-15
GPU3    PHB     PHB     PHB      X      0-15

This looks OK to me, since all the GPUs can reach each other through a PCIe host bridge.

But when I run the simpleP2P test from the CUDA samples, this is what I get:

Checking for multiple GPUs...
CUDA-capable device count: 4
> GPU0 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU2 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU3 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.

Could someone please help me debug this?

What sort of system are the K80s installed in? Did you purchase the system from an OEM that has qualified it for use with K80s?

Are you using a supported CUDA configuration (e.g. OS)?

If so, you should contact the OEM to arrange for technical support.

Hi, I am having a similar problem!

I have two GPUs behind the same PCIe switch (PIX), but P2P is not enabled between them.

nvidia-smi

Thu Nov 24 06:14:19 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:03:00.0     Off |                    0 |
| N/A   38C    P0    57W / 149W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:04:00.0     Off |                    0 |
| N/A   24C    P8    29W / 149W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0002:03:00.0     Off |                    0 |
| N/A   34C    P8    28W / 149W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0002:04:00.0     Off |                    0 |
| N/A   34C    P0    70W / 149W |    867MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
nvidia-smi topo -m

        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PIX     SOC     SOC     0-79
GPU1    PIX      X      SOC     SOC     0-79
GPU2    SOC     SOC      X      PIX     80-159
GPU3    SOC     SOC     PIX      X      80-159

Legend:

X = Self
SOC = Path traverses a socket-level link (e.g. QPI)
PHB = Path traverses a PCIe host bridge
PXB = Path traverses multiple PCIe internal switches
PIX = Path traverses a PCIe internal switch

./testAllP2P

[./testAllP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 4

Access from Tesla K80 (GPU0) -> Tesla K80 (GPU1)
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : No

Access from Tesla K80 (GPU0) -> Tesla K80 (GPU2)
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No

Access from Tesla K80 (GPU0) -> Tesla K80 (GPU3)
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No

Access from Tesla K80 (GPU1) -> Tesla K80 (GPU2)
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No

Access from Tesla K80 (GPU1) -> Tesla K80 (GPU3)
Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No

Access from Tesla K80 (GPU2) -> Tesla K80 (GPU3)
Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : No

uname -a

Linux 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:31:44 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

lscpu

Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0-159
Thread(s) per core: 8
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Model: 8335-GTA
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-79
NUMA node8 CPU(s): 80-159

lspci | grep -i plx

0000:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0000:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0000:02:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0002:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0002:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0002:02:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:01:00.1 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.2 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.3 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.4 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:0a.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:0b.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:0c.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)

lspci -s 0000:02:08.0 -vvvv | grep -i acs

	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP- SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
	ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

lspci -s 0000:02:10.0 -vvvv | grep -i acs

	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP- SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
	ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

All the other PCI devices have the same configuration.

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:31:50_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

Could you please give me any clues?