GPU downgraded to x8 lanes

I am using a Dell PowerEdge R740 server with a Quadro RTX 6000 as an accelerator. The GPU link is downgraded to PCIe x8 and bandwidth is capped at 6 GB/s. Its PCIe riser is x16 capable…

Any ideas to solve this issue are welcome.

                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)

How was this measured?

The GPU is quite actively power managed and, whilst idle, will reduce bus speed, amongst other things. A true test would be to give the GPU some work and, while it is busy, check the output of nvidia-smi -q.

Look at the GPU Link Info section.
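
If you don't have a real workload handy, a trivial CUDA kernel that just burns cycles is enough to bring the link out of its idle state. A minimal sketch (the kernel body and iteration counts here are arbitrary, chosen only to keep the GPU busy for a few seconds):

    // busy.cu - keep the GPU busy so the PCIe link leaves its idle state
    #include <cstdio>

    __global__ void spin(float *out, int iters) {
        float v = (float)threadIdx.x;
        for (int i = 0; i < iters; ++i)
            v = v * 1.000001f + 0.5f;   // dummy arithmetic to burn cycles
        out[threadIdx.x] = v;           // store so the loop isn't optimized away
    }

    int main() {
        float *d;
        cudaMalloc(&d, 1024 * sizeof(float));
        for (int i = 0; i < 1000; ++i)  // long enough to sample nvidia-smi meanwhile
            spin<<<1024, 1024>>>(d, 100000);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }

Build with nvcc busy.cu -o busy, run it, and check nvidia-smi -q from a second terminal while it is running.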

The CUDA bandwidth test is capped at 6 GB/s. One should reach about 12 GB/s on Gen3 x16.
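
The arithmetic backs that up (assuming the usual rough 80% of theoretical PCIe throughput is achievable in practice):

    Gen3 x16: 8 GT/s × 16 lanes × 128/130 encoding ÷ 8 bits ≈ 15.8 GB/s theoretical → ~12-13 GB/s measured
    Gen3 x8:  8 GT/s ×  8 lanes × 128/130 encoding ÷ 8 bits ≈  7.9 GB/s theoretical →  ~6-6.5 GB/s measured

So a ~6 GB/s cap is exactly what a link stuck at x8 looks like.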

When idle, PCIe speed might be downgraded, but not PCIe width, AFAIK.

The nvidia-smi man page states:

    Current: The current link generation and width. These may be reduced when the GPU is not in use.

although a couple of my cards stay at full width while idle.

Assuming the server is configured as you expect (there are quite a number of riser card combinations that only run at x8; see https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/aa/PowerEdge_R740_R740xd_Technical_Guide.pdf), I have no other ideas apart from faulty components.

Same faulty result (x8) in a Supermicro server… Definitely there is something wrong with this device.
Thanks

Same issue on a 5950X desktop (downgraded to x8):

    PCI
        Bus                               : 0x09
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1E3010DE
        Bus Id                            : 00000000:09:00.0
        Sub System Id                     : 0x12BA1028
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
            Link Width
                Max                       : 16x
                Current                   : 8x

Is there a tool to perform some sanity checks on an NVIDIA GPU?

 Device 0: Quadro RTX 6000
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			6.5

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			6.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			539.6

Result = PASS
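
For context, what bandwidthTest measures boils down to timing cudaMemcpy against a pinned host buffer. A rough sketch of the same idea (not bandwidthTest itself; the buffer size and repetition count are arbitrary):

    // h2d_bw.cu - rough pinned-memory host-to-device bandwidth check
    #include <cstdio>

    int main() {
        const size_t bytes = 32000000;   // same transfer size as the output above
        const int reps = 100;

        float *h, *d;
        cudaMallocHost(&h, bytes);       // pinned (page-locked) host buffer
        cudaMalloc(&d, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        for (int i = 0; i < reps; ++i)
            cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("H2D: %.1f GB/s\n", (bytes * reps) / (ms * 1e6));  // ms -> s, B -> GB

        cudaFreeHost(h);
        cudaFree(d);
        return 0;
    }

On a healthy Gen3 x16 link this should print somewhere around 12 GB/s; ~6.5 GB/s again points at x8.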

The only official Nvidia options I’m aware of, apart from nvidia-smi -q, are listed here, but most apply to Teslas (and older ones at that):

https://docs.nvidia.com/deploy/index.html
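
Beyond those, NVML (the library nvidia-smi is built on) exposes the link state directly, so you can script your own sanity check. A minimal sketch, assuming GPU index 0 and skipping error handling:

    // linkcheck.cpp - query current vs. max PCIe link state via NVML
    // build with: g++ linkcheck.cpp -lnvidia-ml
    #include <cstdio>
    #include <nvml.h>

    int main() {
        nvmlDevice_t dev;
        unsigned int curGen, maxGen, curWidth, maxWidth;

        nvmlInit();
        nvmlDeviceGetHandleByIndex(0, &dev);   // first GPU only, for brevity
        nvmlDeviceGetCurrPcieLinkGeneration(dev, &curGen);
        nvmlDeviceGetMaxPcieLinkGeneration(dev, &maxGen);
        nvmlDeviceGetCurrPcieLinkWidth(dev, &curWidth);
        nvmlDeviceGetMaxPcieLinkWidth(dev, &maxWidth);

        printf("PCIe gen:   %u (max %u)\n", curGen, maxGen);
        printf("Link width: x%u (max x%u)\n", curWidth, maxWidth);

        nvmlShutdown();
        return 0;
    }

Run it while the GPU is busy: Current should match Max under load, and a persistent mismatch across two different machines, as you've seen, does point at the card itself.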