Quadro RTX 8000 Multi-GPU Performance Issue

Hi,

I have a program that runs identical CUDA code (on identical sets of data
of the same size) on 2 GPUs for multiple iterations.

The hardware is:

  • Intel® Xeon® Silver 4110 CPU @ 2.10GHz (2 processors)
  • 256GB RAM
  • Windows Server 2019
  • 2x Quadro RTX 8000

In each iteration:

  • Copy data from HOST pinned memory to GPU 0 and GPU 1
  • Clear previously used GPU memory with cudaMemsetAsync, then execute a user-written kernel (MapXYScale) to create the maps for image remapping (the other kernels are calls to nppi functions, e.g. nppiWarpAffine_8u_C1R)
  • Re-execute step 2
  • Copy data from GPU to HOST (cudaEventSync)

Attached is a screenshot from NVIDIA Nsight Systems 2020.1.1 showing what happens.

Problem:
When running on 2 Quadro RTX 8000s, performance is good on device 0, but on device 1 it is ~5.5x worse. In addition, on device 0, when the same kernel (1. and 2.) is executed twice with the same data, the second execution takes ~2x as long.

The code has been tested with the GPUs in both WDDM and TCC mode, with no difference.

Performance problem details
From the first iterations onward, most operations on device 1 are slow, including:

  • cudaMemsetAsync
  • user-written kernel execution
  • NVIDIA kernels (e.g. nppiRemap_8u_C1R)

Attached is a screenshot of nvidia-smi.exe during execution, in case it helps.

I tried running the same program on 2 Quadro P6000s. Performance is
consistent across both GPUs and throughout all the iterations on this node.

The application has been compiled both with CUDA 10.2 and CUDA 11.1 without differences in performance.

Has anyone had the same problem? Any insights or suggestions on how to investigate this further would be highly appreciated.

Thank you.

Something to look into, although from the cross-check experiment with the two P6000s this does not seem like a likely underlying cause:

PCIe is a point-to-point interconnect. There is a PCIe root complex in each CPU. In multi-socket systems each PCIe slot is connected to a particular PCIe root complex. Also, each CPU has its own memory controller. You would want to make sure each CPU is communicating with the “near” GPU and the “near” system memory, as otherwise data needs to traverse the CPU-to-CPU interconnect, creating a NUMA effect.

You would therefore want to pay particular attention to processor and memory affinity of your program, and control it with a tool like numactl. Sorry, I don’t know what the equivalent tool under Windows would be, as I have never worked on a multi-socket Windows system.
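
On Linux, one quick way to check which CPU socket a GPU is "near" is to read the NUMA node the kernel reports for its PCI device in sysfs. The following is a minimal sketch under that assumption; the helper name `gpu_numa_node` and its `sysfs_root` parameter are hypothetical, introduced only for illustration, and the bus ID is the one nvidia-smi prints for the device.

```python
from pathlib import Path


def gpu_numa_node(bus_id: str, sysfs_root: str = "/sys/bus/pci/devices") -> int:
    """Return the NUMA node Linux reports for a PCI device, or -1 if unknown.

    bus_id is a PCI bus ID as printed by nvidia-smi, e.g. "00000000:86:00.0".
    """
    # nvidia-smi prints an 8-digit PCI domain; sysfs directory names use
    # 4 lowercase hex digits, so keep only the last 4 digits of the domain.
    domain, rest = bus_id.lower().split(":", 1)
    sysfs_name = f"{domain[-4:]}:{rest}"
    node_file = Path(sysfs_root) / sysfs_name / "numa_node"
    try:
        return int(node_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        # Device not present, or the file did not contain a number.
        return -1


if __name__ == "__main__":
    # The kernel itself reports -1 when it has no NUMA information.
    print(gpu_numa_node("00000000:86:00.0"))
```

If this returns node N, the process could then be pinned with something like `numactl --cpunodebind=N --membind=N ./app` so that both the CPU threads and the pinned host memory sit on the socket closest to that GPU.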

Hi, I was wondering whether you found a solution or received any suggestions. I have exactly the same problem, on the same machine, with the same GPU (RTX 8000), but on Linux (Debian). According to my initial checks, no NUMA effect is involved. nvidia-smi reports that the GPU draws a maximum of 61W, even under stress, and the one CUDA benchmark I tried (mixbench, ekondis/mixbench on GitHub), on single-precision operations, gives 330 GFlops. I mounted the board on another motherboard and tested it there, and the results were, if I am not wrong, around 17K GFlops in single precision, as expected…
So, at least, the board (PNY RTX 8000) is not broken.

We did a lot of tests and operating system re-installations, including a run on Windows 11 Pro, but… nothing changed. My impression is that the power cables from the main power distribution board (micro molex x8, on the custom HP cable) are supplying the voltage (I also tested without them, and the GPU was not detected), but not enough current…

Thank you for any help. In the meantime, I will keep going with the tests and attempts…

That seems indicative of a power-supply issue. According to the PCIe specification, a PCIe slot can provide up to 75W. However, from my observations NVIDIA GPUs typically try to limit that to around 65W and many GPUs are designed to draw no more than 40W from the slot.

We have had a few reports in this forum of GPUs with defective on-board connectors for PCIe auxiliary power. However, you state that you tried the Quadro RTX 8000 in a different machine and it worked with expected performance there. While issues with a power connector could be intermittent, depending on mechanical stress, the more likely explanation is that your power supply unit (PSU) is not able to provide adequate current.

As far as I can determine, the Quadro RTX 8000 sports one 6-pin and one 8-pin connector for PCIe auxiliary power. 6-pin connectors are rated for 75W and 8-pin connectors are rated for 150W. Since the GPU has a nominal power draw of 260W, there should be head room, since (75W + 75W + 150W) > 260W. Looking at the NVIDIA specifications, they state: total graphics power = 260W, total board power = 295W. I am not sure what the difference is between the two power ratings, but either way it is less than the 300W that slot, 6-pin, and 8-pin connectors can provide collectively.
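
The head room arithmetic above can be written out as a small sketch. All wattages are the ones quoted in this post; the function name `power_headroom` is hypothetical and used only for illustration.

```python
# Nominal power each source can deliver, in watts (figures quoted above).
SLOT_W = 75        # PCIe slot
SIX_PIN_W = 75     # 6-pin auxiliary connector
EIGHT_PIN_W = 150  # 8-pin auxiliary connector


def power_headroom(draw_w, sources_w):
    """Return how many watts remain after the card's draw is subtracted
    from the sum of what the listed sources can provide."""
    return sum(sources_w) - draw_w


available = [SLOT_W, SIX_PIN_W, EIGHT_PIN_W]

# The two ratings stated in the NVIDIA specifications.
for label, draw in [("total graphics power", 260), ("total board power", 295)]:
    print(f"{label}: {draw} W, head room {power_headroom(draw, available)} W")
```

With the slot and both connectors properly fed, even the higher 295W rating fits inside the 300W budget, although only by 5W.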

There should be a direct connection from each power connector to the PSU. No daisy chaining, Y-splitters, or converters (in particular: 6-pin to 8-pin) should be used.

My experience with buying pre-configured systems is that unless the purchaser actively intervenes, the installed PSU will be sufficient only for the components configured at the time of initial purchase, making future expansion with additional, or more highly-powered, components difficult. What is the nominal power rating of the installed PSU?

Hi njuffa, thank you a lot for your answer and your time.
The machine has a dual (redundant, hot-swappable) set of PSUs, each rated for 800W. I also tried a single 1600W PSU. As you can imagine, the result was the same. BTW, according to the iLO panel (the remote low-level administration interface these machines often have), the maximum recorded power draw is around 270 W, far below the capacity of a single PSU, and also lower than what the whole system would draw if the GPU were running at full throttle!!

The PSUs are connected to a standard distribution board, which powers the motherboard and has these 4 micro molex connectors (eight poles: they are just smaller than the molex connector on the GPU side).
I tried to find any info about different versions of this distribution board, but… nothing, I just found one part with one part number.

In order to power the GPU, the micro molex needs to be connected with a custom HP cable (very expensive…).
This cable provides the two auxiliary power connectors for the GPU.
I tried also with two cables, connecting each GPU power connector to just one distribution board power connector… nothing changed.

Just to finish with a, I would say, terrible story… take a look at this post, on the HP forum: the machine is slightly different (ML380), the GPU is the lower version (RTX 6000), but… still a PNY GPU. Sorry I was not able to provide the direct link to the HPE expert post, however, here is the link to the thread:

https://community.hpe.com/t5/proliant-servers-ml-dl-sl/nvidia-quadro-rtx-6000-on-dl380-gen10-not-work/td-p/7174845

and then just scroll down to the sixth post (the third from the HPE expert): OK, he mentions possible issues with the cable, which is fair, but then, in the “not to mention” part, he says that the GPU board is a third-party one (PNY), so, more or less, anything can happen and HP will not be responsible. To be honest, I do not agree, but, let's say, “I understand the point”.
I am starting to lose hope of seeing this PNY GPU working inside a super maxi extreme enterprise-grade server from HP.
At least I can see it working in an old ASUS mobo (B250) that I bought new on Amazon in 2019 for less than 40 Euro (including VAT)… and I'll probably switch back to server mobos from Gigabyte, Supermicro, …

Thank you again for your attention and your patience

When you are running a “stress” workload on the “slow” GPU, what is the output of nvidia-smi -a ?

61 Watts is the maximum, with some peaks to 70…
when the gpu is idle, it runs somewhere between 6 and 21 Watts…

Oops… Sorry Robert, I am at work and did not read your post with due attention: you were asking for the output of nvidia-smi -a, not just nvidia-smi. I cannot test it right now; I will do it later. Thank you.


Hi Robert,
OK, here I am. I have to re-edit this post (new users on this forum are limited to three replies).

Below is the output of nvidia-smi -a while the GPU is NOT under the “stress test”.
Sorry for the late answer, but I had to reinstall the drivers.

root@giudittai:/zzDevEnv/drivers/nvidia/video/nv_570# nvidia-smi -a

==============NVSMI LOG==============

Timestamp                                 : Fri Mar  7 19:36:04 2025
Driver Version                            : 570.124.04
CUDA Version                              : 12.8

Attached GPUs                             : 1
GPU 00000000:86:00.0
    Product Name                          : Quadro RTX 8000
    Product Brand                         : Quadro RTX
    Product Architecture                  : Turing
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    Addressing Mode                       : None
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1324021077727
    GPU UUID                              : GPU-3c80ca5c-979c-623a-0428-64e2ef631913
    Minor Number                          : 0
    VBIOS Version                         : 90.02.4A.00.11
    MultiGPU Board                        : No
    Board ID                              : 0x8600
    Board Part Number                     : 900-5G150-1700-000
    GPU Part Number                       : 1E30-875-A1
    FRU Part Number                       : N/A
    Platform Info
        Chassis Serial Number             : N/A
        Slot Number                       : N/A
        Tray Index                        : N/A
        Host ID                           : N/A
        Peer Type                         : N/A
        Module Id                         : 1
        GPU Fabric GUID                   : N/A
    Inforom Version
        Image Version                     : G150.0500.00.03
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU C2C Mode                          : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
        vGPU Heterogeneous Mode           : N/A
    GPU Reset Status
        Reset Required                    : Requested functionality has been deprecated
        Drain and Reset Recommended       : Requested functionality has been deprecated
    GPU Recovery Action                   : None
    GSP Firmware Version                  : 570.124.04
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x86
        Device                            : 0x00
        Domain                            : 0x0000
        Base Classcode                    : 0x3
        Sub Classcode                     : 0x0
        Device Id                         : 0x1E3010DE
        Bus Id                            : 00000000:86:00.0
        Sub System Id                     : 0x129E10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 3
                Host Max                  : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 300 KB/s
        Rx Throughput                     : 300 KB/s
        Atomic Caps Outbound              : N/A
        Atomic Caps Inbound               : N/A
    Fan Speed                             : 33 %
    Performance State                     : P2
    Clocks Event Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Active
        HW Slowdown                       : Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    Sparse Operation Mode                 : N/A
    FB Memory Usage
        Total                             : 46080 MiB
        Reserved                          : 725 MiB
        Used                              : 1 MiB
        Free                              : 45356 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 2 MiB
        Free                              : 254 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        GPU                               : 4 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    DRAM Encryption Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 45 C
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : 94 C
        GPU Slowdown Temp                 : 91 C
        GPU Max Operating Temp            : 89 C
        GPU Target Temperature            : 84 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    GPU Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : 62.58 W
        Current Power Limit               : 260.00 W
        Requested Power Limit             : 260.00 W
        Default Power Limit               : 260.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 260.00 W
    GPU Memory Power Readings 
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
    Module Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Power Smoothing                       : N/A
    Workload Power Profiles
        Requested Profiles                : N/A
        Enforced Profiles                 : N/A
    Clocks
        Graphics                          : 213 MHz
        SM                                : 213 MHz
        Memory                            : 6500 MHz
        Video                             : 795 MHz
    Applications Clocks
        Graphics                          : 1395 MHz
        Memory                            : 7001 MHz
    Default Applications Clocks
        Graphics                          : 1395 MHz
        Memory                            : 7001 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 7001 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health
            Bandwidth                     : N/A
            Route Recovery in progress    : N/A
            Route Unhealthy               : N/A
            Access Timeout Recovery       : N/A
    Processes                             : None
    Capabilities
        EGM                               : disabled

root@giudittai:/zzDevEnv/drivers/nvidia/video/nv_570# 

And this is the output under stress:

$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp                                 : Fri Mar  7 19:46:26 2025
Driver Version                            : 570.124.04
CUDA Version                              : 12.8

Attached GPUs                             : 1
GPU 00000000:86:00.0
    Product Name                          : Quadro RTX 8000
    Product Brand                         : Quadro RTX
    Product Architecture                  : Turing
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    Addressing Mode                       : None
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1324021077727
    GPU UUID                              : GPU-3c80ca5c-979c-623a-0428-64e2ef631913
    Minor Number                          : 0
    VBIOS Version                         : 90.02.4A.00.11
    MultiGPU Board                        : No
    Board ID                              : 0x8600
    Board Part Number                     : 900-5G150-1700-000
    GPU Part Number                       : 1E30-875-A1
    FRU Part Number                       : N/A
    Platform Info
        Chassis Serial Number             : N/A
        Slot Number                       : N/A
        Tray Index                        : N/A
        Host ID                           : N/A
        Peer Type                         : N/A
        Module Id                         : 1
        GPU Fabric GUID                   : N/A
    Inforom Version
        Image Version                     : G150.0500.00.03
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU C2C Mode                          : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
        vGPU Heterogeneous Mode           : N/A
    GPU Reset Status
        Reset Required                    : Requested functionality has been deprecated
        Drain and Reset Recommended       : Requested functionality has been deprecated
    GPU Recovery Action                   : None
    GSP Firmware Version                  : 570.124.04
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x86
        Device                            : 0x00
        Domain                            : 0x0000
        Base Classcode                    : 0x3
        Sub Classcode                     : 0x0
        Device Id                         : 0x1E3010DE
        Bus Id                            : 00000000:86:00.0
        Sub System Id                     : 0x129E10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 3
                Host Max                  : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 400 KB/s
        Rx Throughput                     : 350 KB/s
        Atomic Caps Outbound              : N/A
        Atomic Caps Inbound               : N/A
    Fan Speed                             : 33 %
    Performance State                     : P2
    Clocks Event Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Active
        HW Slowdown                       : Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    Sparse Operation Mode                 : N/A
    FB Memory Usage
        Total                             : 46080 MiB
        Reserved                          : 725 MiB
        Used                              : 421 MiB
        Free                              : 44935 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 5 MiB
        Free                              : 251 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        GPU                               : 100 %
        Memory                            : 1 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    DRAM Encryption Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 40 C
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : 94 C
        GPU Slowdown Temp                 : 91 C
        GPU Max Operating Temp            : 89 C
        GPU Target Temperature            : 84 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    GPU Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : 61.06 W
        Current Power Limit               : 260.00 W
        Requested Power Limit             : 260.00 W
        Default Power Limit               : 260.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 260.00 W
    GPU Memory Power Readings 
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
    Module Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Power Smoothing                       : N/A
    Workload Power Profiles
        Requested Profiles                : N/A
        Enforced Profiles                 : N/A
    Clocks
        Graphics                          : 75 MHz
        SM                                : 75 MHz
        Memory                            : 6500 MHz
        Video                             : 540 MHz
    Applications Clocks
        Graphics                          : 1395 MHz
        Memory                            : 7001 MHz
    Default Applications Clocks
        Graphics                          : 1395 MHz
        Memory                            : 7001 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 7001 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health
            Bandwidth                     : N/A
            Route Recovery in progress    : N/A
            Route Unhealthy               : N/A
            Access Timeout Recovery       : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 39239
            Type                          : C
            Name                          : ./mixbench-cuda
            Used GPU Memory               : 418 MiB
    Capabilities
        EGM                               : disabled


Well… there are a couple of interesting values, but my knowledge is extremely limited, so I would be very happy to hear from you. Thank you for any help.
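
One way to read the clock values in the two dumps: peak single-precision throughput scales linearly with the SM clock. The sketch below is a rough estimate only. It assumes the publicly listed 4608 CUDA cores for the Quadro RTX 8000 and 2 FLOPs per core per cycle (one fused multiply-add); neither figure appears in the output above, and the function names are hypothetical.

```python
CUDA_CORES = 4608         # assumption: public spec for Quadro RTX 8000, not in the log
FLOPS_PER_CORE_CYCLE = 2  # assumption: one fused multiply-add per core per cycle


def peak_fp32_gflops(sm_clock_mhz, cuda_cores=CUDA_CORES):
    """Theoretical FP32 peak in GFLOPS at a given SM clock in MHz."""
    return cuda_cores * FLOPS_PER_CORE_CYCLE * sm_clock_mhz / 1000.0


def implied_min_clock_mhz(measured_gflops, cuda_cores=CUDA_CORES):
    """Lowest SM clock in MHz that could sustain a measured GFLOPS figure."""
    return measured_gflops * 1000.0 / (cuda_cores * FLOPS_PER_CORE_CYCLE)


# SM clocks taken from the two nvidia-smi dumps above.
for label, mhz in [("SM clock under stress", 75),
                   ("SM clock when idle", 213),
                   ("applications clock", 1395),
                   ("max clock", 2100)]:
    print(f"{label}: {mhz} MHz -> peak {peak_fp32_gflops(mhz):.1f} GFLOPS")

# mixbench figures reported earlier in the thread.
for label, gflops in [("this server", 330), ("other motherboard", 17000)]:
    print(f"{label}: {gflops} GFLOPS -> at least {implied_min_clock_mhz(gflops):.1f} MHz")
```

Under these assumptions, a 75 MHz SM clock caps the card at roughly 0.7 TFLOPS, so a mixbench result of 330 GFlops is what one would expect from a heavily throttled card rather than from a broken one.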

My take is that the issue is caused by the motherboard asserting the Power Brake. I would suggest discussing this with the vendor from whom you acquired this machine. Best I know, some vendors use a whitelist mechanism with their system controller: If a PCIe device is not explicitly listed in the database, the PCIe power brake is applied. The Quadro RTX 8000 may not be a “vendor approved” device for this platform.

Did you buy the system with the Quadro RTX 8000 installed, or did you add it yourself later?

The active HW Power Brake Slowdown is the proximal cause. The power brake is a signal from the motherboard to the GPU to limit its power. The GPU limits its power consumption by reducing its performance.
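
To spot this condition without reading the whole dump by eye, the "Clocks Event Reasons" section of `nvidia-smi -a` can be filtered for entries marked Active. This is a minimal sketch that relies only on the text layout shown in the output above; the function name is hypothetical, and accepting the heading "Clocks Throttle Reasons" is an assumption about how other driver versions label the same section.

```python
SECTION_TITLES = ("Clocks Event Reasons", "Clocks Throttle Reasons")

# Excerpt copied from the "under stress" output above.
SAMPLE = """\
    Performance State                     : P2
    Clocks Event Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Active
        HW Slowdown                       : Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    Sparse Operation Mode                 : N/A
"""


def active_clock_event_reasons(smi_text: str) -> list[str]:
    """Return the names of the entries marked 'Active' in the clock
    event reasons section of `nvidia-smi -a` text output."""
    active = []
    in_section = False
    section_indent = 0
    for line in smi_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        name, _, value = line.strip().partition(":")
        name, value = name.strip(), value.strip()
        if not in_section:
            if name in SECTION_TITLES:
                in_section, section_indent = True, indent
            continue
        if indent <= section_indent:
            break  # a less-indented line means the section has ended
        if value == "Active":
            active.append(name)
    return active


if __name__ == "__main__":
    print(active_clock_event_reasons(SAMPLE))
```

For the excerpt above, the entries reported as active are SW Power Cap, HW Slowdown, and HW Power Brake Slowdown, while both thermal slowdown entries are inactive, which points at power limiting rather than overheating.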

The PNY GPU is designed to respect this signal if it is asserted across the PCIE bus. AFAIK it is not part of the formal PCIE bus definition, so it is a “sideband” or “out-of-band” signal.

Anyway, it's an incompatibility between that particular motherboard and that particular GPU. There is nothing you can do to fix this, although updating the system/motherboard firmware/BIOS to the latest available revision may be worth a try.

This is an example of a reason why it is recommended to purchase systems configured by the OEM, not configured by an end-user in the field. There are other reasons and other incompatibilities that can exist, some are discussed in various forum posts.

Probably not guaranteed or in any way officially or unofficially recommended:

Some people tape over the POWERBRK PCIe pin.

Yes, after I said this:

AFAIK it is not part of the formal PCIE bus definition, so it is a “sideband” or “out-of-band” signal.

I checked, and it does seem that it is a formally defined signal/pin. I’ve scratched out my previous statement.

Thank you very much to all of you. This explains the problem very well, and it seems that @Curefab's suggestion is worth an attempt: searching for that keyword turns up a lot of material and suggestions. Here are just a couple of examples:

Just a word about the OEM consideration: this is a low-budget project, so there is not much money and, in exchange, a lot of time to spend, including on learning through errors. I would also have a couple of stories about OEMs, but… OK, not here.

About the RTX8000 and ML350g10: the machine does support a maximum of two of them (with an “HPE” added at the beginning of the name…).
Interestingly, and I cannot be sure, this was probably not clearly stated at the beginning. It is specified in an update of the docs, and probably some “4 GPUs in one machine” accidents happened, maybe even on OEM-prepared machines:

https://support.hpe.com/hpesc/public/docDisplay?docId=a00101748en_us&docLocale=en_US

Thank you, really, to all of you for your help.

Enrico

The PWRBRK# signal is active low, but (some) people have success with taping over the pin. Probably there is a pull-up in the GPU, especially if the signal was introduced in a later PCIe revision.
