I am using a Dell PowerEdge R740 server with a Quadro RTX 6000 as an accelerator. The GPU link is downgraded to PCIe x8, and bandwidth is capped at about 6 GB/s. Its PCIe riser is x16 capable…
The GPU is quite actively power managed and, whilst idle, will reduce bus speed, amongst other things. A true test would be to give the GPU some work and, while it is running, check the output of nvidia-smi -q.
The nvidia-smi man page states: "Current: The current link generation and width. These may be reduced when the GPU is not in use."
That said, a couple of my cards stay at full width while idle.
PCI
    Bus                   : 0x09
    Device                : 0x00
    Domain                : 0x0000
    Device Id             : 0x1E3010DE
    Bus Id                : 00000000:09:00.0
    Sub System Id         : 0x12BA1028
    GPU Link Info
        PCIe Generation
            Max           : 3
            Current       : 3
        Link Width
            Max           : 16x
            Current       : 8x
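The same link fields can also be read in machine-readable form, which makes it easier to sample them while the GPU is busy. A minimal sketch in Python, assuming nvidia-smi is on the PATH and that the driver accepts the pcie.link.* query fields (check with nvidia-smi --help-query-gpu); the helper names parse_link_line, is_downgraded and query_links are illustrative and not part of any NVIDIA tool:

```python
import subprocess

# Query fields passed to `nvidia-smi --query-gpu`.
FIELDS = [
    "pcie.link.gen.current",
    "pcie.link.gen.max",
    "pcie.link.width.current",
    "pcie.link.width.max",
]


def parse_link_line(line):
    """Turn one CSV row such as '3, 3, 8, 16' into a dict of ints."""
    values = [int(v.strip()) for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_downgraded(link):
    """True if the link currently runs below its maximum generation or width."""
    return (
        link["pcie.link.gen.current"] < link["pcie.link.gen.max"]
        or link["pcie.link.width.current"] < link["pcie.link.width.max"]
    )


def query_links():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_link_line(row) for row in out.splitlines() if row.strip()]
```

Calling query_links() in a loop while a CUDA workload is running should show whether the width stays at 8 under load. If it does, power management is probably not the cause, and the slot, riser or BIOS configuration is worth checking.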
Is there a tool to perform a sanity check on an NVIDIA GPU?
Device 0: Quadro RTX 6000
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
    Transfer Size (Bytes)    Bandwidth(GB/s)
    32000000                 6.5

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
    Transfer Size (Bytes)    Bandwidth(GB/s)
    32000000                 6.8

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
    Transfer Size (Bytes)    Bandwidth(GB/s)
    32000000                 539.6

Result = PASS
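To judge whether these host/device figures match an x8 link, it helps to compare them with the theoretical PCIe peak. A rough sketch in Python, assuming the standard per-lane rates and line encodings (8 GT/s with 128b/130b for Gen3); it ignores packet and protocol overhead, so real transfers always land somewhat below the result. The function name theoretical_gb_per_s is illustrative:

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def theoretical_gb_per_s(generation, lanes):
    """Peak one-direction rate in GB/s, before packet/protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


x8 = theoretical_gb_per_s(3, 8)
x16 = theoretical_gb_per_s(3, 16)
measured_h2d = 6.5  # GB/s, Host to Device figure from the output above

print(f"Gen3 x8  peak: {x8:.2f} GB/s")
print(f"Gen3 x16 peak: {x16:.2f} GB/s")
print(f"Measured H2D is {measured_h2d / x8:.0%} of the x8 peak")
```

The Gen3 peak works out to roughly 7.9 GB/s for x8 and 15.8 GB/s for x16, so 6.5 to 6.8 GB/s is what a healthy Gen3 x8 link would be expected to deliver. In other words, the benchmark confirms the link really is running at x8 rather than pointing to a separate throughput problem.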