NVIDIA dual display issue with RHEL 7.6

We have a system with an Intel Xeon D and an NVIDIA GPGPU (P5000 or RTX5000). There is another on-board GPU based on the SM768. We want display output from both the GPGPU and the on-board GPU. The BIOS has been modified accordingly by the BIOS vendor.
After installing the NVIDIA driver, we do not get any display output from the GPGPU; only the on-board GPU module produces output. We are currently using RHEL 7.6. When I run the nvidia-smi command, it reports ‘no display device found’, even though the driver installation was successful. The NVIDIA bug report is attached. Please let me know if any additional drivers need to be installed or any additional steps need to be followed.
P.S.: The nouveau driver has been added to the blocklist.
nvidia-bug-reportlog.gz (115.8 KB)

What kind of a device is this? A Laptop?
The GPU is a Quadro P5000 Mobile. What kind of mainboard supports an Intel Xeon D, the SM768 (typically a docking station display chip) AND a mobile GPU? Interesting.

For one thing, you seem to have a rather broken setup. The timestamps are from December 1999 and they repeat, making it difficult to put messages in context.
You have/had a very old NVIDIA driver installed (450.66), which was most likely not cleaned up correctly before you installed the 525.116.04 driver. This is also reflected in the Xorg log files, which throw "No devices detected" errors only for the older driver:

[    22.385] (II) NVIDIA dlloader X Driver  450.66  Wed Aug 12 19:44:12 UTC 2020
[    22.385] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    22.385] (++) using VT number 1

[    22.388] (EE) No devices detected.
[    22.389] (EE) 
Fatal server error:
[    22.389] (EE) no screens found(EE) 
[    22.389] (EE)
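
To check whether more than one driver version shows up in the logs, something like the following could help. This is only a minimal sketch: the function name `driver_versions` and the regular expression are my own, and the pattern is assumed from the two banner formats quoted in this thread (the X driver banner and the kernel module banner), not from any documented log format.

```python
import re

# Assumed banner formats, taken from the two log lines quoted in this thread:
#   "(II) NVIDIA dlloader X Driver  450.66  Wed Aug 12 ..."
#   "NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.116.04  Thu Apr 27 ..."
VERSION_RE = re.compile(
    r"NVIDIA (?:dlloader X Driver|UNIX x86_64 Kernel Module)\s+(\d+(?:\.\d+)+)"
)

def driver_versions(lines):
    """Return the set of NVIDIA driver versions found in the given log lines."""
    found = set()
    for line in lines:
        match = VERSION_RE.search(line)
        if match:
            found.add(match.group(1))
    return found

# Lines copied from the bug report excerpts in this thread.
sample_lines = [
    "[    22.385] (II) NVIDIA dlloader X Driver  450.66  Wed Aug 12 19:44:12 UTC 2020",
    "Dec 31 19:05:13 localhost.localdomain kernel: NVRM: loading NVIDIA UNIX x86_64 "
    "Kernel Module  525.116.04  Thu Apr 27 17:56:37 UTC 2023",
]

versions = driver_versions(sample_lines)
if len(versions) > 1:
    # More than one version suggests leftovers from an earlier installation.
    print("mixed driver versions in logs:", sorted(versions))
```

In practice you would feed it the lines of the Xorg log and the kernel log from the bug report instead of `sample_lines`. Note that old log files can contain old banners, so a mixed result is a hint to look closer, not proof.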

Last but not least, the GPU has “fallen off the bus”:

Dec 31 19:05:13 localhost.localdomain kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.116.04  Thu Apr 27 17:56:37 UTC 2023
Dec 31 19:05:13 localhost.localdomain kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.116.04  Thu Apr 27 17:57:02 UTC 2023
Dec 31 19:05:13 localhost.localdomain kernel: [drm] [nvidia-drm] [GPU ID 0x00000500] Loading driver
Dec 31 19:05:13 localhost.localdomain kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:05:00.0 on minor 0
Dec 31 19:05:27 localhost.localdomain kernel: NVRM: GPU at PCI:0000:05:00: GPU-1eaacfc4-af32-6b39-4050-21ae9f7ef7e0
Dec 31 19:05:27 localhost.localdomain kernel: NVRM: Xid (PCI:0000:05:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Dec 31 19:05:27 localhost.localdomain kernel: NVRM: GPU 0000:05:00.0: GPU has fallen off the bus.

This can be caused by many different issues, for example insufficient power to the GPU, bad seating in the PCIe slot (or, in this case, whatever the socket is in your system), a defective PCIe bus or GPU, a BIOS version that does not correctly support this PCIe device, high GPU temperatures, …
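
If you want to see how often and for which device this happens, the Xid lines can be pulled out of the kernel log. The sketch below is only an illustration: `xid_events` is a hypothetical helper, and the pattern is assumed from the single Xid line quoted above, so other driver versions may format it differently.

```python
import re

# Pattern assumed from the Xid line quoted in this thread:
#   "NVRM: Xid (PCI:0000:05:00): 79, pid='<unknown>', ..."
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<code>\d+),")

def xid_events(lines):
    """Return a list of (pci_address, xid_code) tuples found in kernel log lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group("bus"), int(match.group("code"))))
    return events

# Lines copied from the kernel log excerpt in this thread.
kernel_lines = [
    "Dec 31 19:05:27 localhost.localdomain kernel: NVRM: GPU at PCI:0000:05:00: "
    "GPU-1eaacfc4-af32-6b39-4050-21ae9f7ef7e0",
    "Dec 31 19:05:27 localhost.localdomain kernel: NVRM: Xid (PCI:0000:05:00): 79, "
    "pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
    "Dec 31 19:05:27 localhost.localdomain kernel: NVRM: GPU 0000:05:00.0: "
    "GPU has fallen off the bus.",
]

for bus, code in xid_events(kernel_lines):
    print(f"Xid {code} reported for PCI device {bus}")
```

Running it over the full kernel log after each boot would show whether Xid 79 appears every time or only occasionally, which may help to tell a seating or power problem from a consistent BIOS or firmware incompatibility.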

My suggestion would be to first verify that all hardware is installed correctly and guaranteed to work properly together, that there is ample power supplied, and that cooling is sufficient for the GPU and other components.
After that, do a completely fresh re-install of both the OS and the latest supported NVIDIA driver, closely following the instructions in the README if you cannot install it through the distro.

Hi Markus,
This is a rugged 3U VPX-based server with an Intel Xeon D 1559 as the main processor and two GPUs installed:
1) NVIDIA P5000
2) SM768

The SM768 serves as the primary display and the NVIDIA P5000 is for high-end computing. The primary display works fine until we install the NVIDIA drivers. After installation the screen goes blank; neither the SM768 nor the NVIDIA P5000 produces any display output.

We have installed the latest NVIDIA drivers, 525.116.04, but the system behaviour is the same as before: no display output from either GPU.

We want display output from both GPUs, or at least from one GPU, after installing the NVIDIA drivers (display and CUDA).

Hi again,

I understand what you want to achieve. What I am saying is that your hardware setup might be flawed.

For one thing, you have the mobile version of the Quadro P5000. I don’t know how you “installed” it, but usually this is an ISV or OEM product, in which case you should contact your service provider.
Otherwise, as stated above, you might have a defective GPU and should consider doing an RMA.

/sbin/lspci -d "10de:*" -v -xxx

05:00.0 VGA compatible controller: NVIDIA Corporation GP104GLM [Quadro P5000 Mobile] (rev a1) (prog-if 00 [VGA controller])
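
The model string in square brackets is what identifies the card as the mobile variant. As a rough sketch, a line like this can be split into its parts as shown below. `parse_lspci_line` is a hypothetical helper, the pattern is assumed from this one line of output, and the "ends with Mobile" check is only a naming heuristic, not an official way to classify GPUs.

```python
import re

# Pattern assumed from the single lspci line quoted in this thread:
#   "05:00.0 VGA compatible controller: NVIDIA Corporation GP104GLM [Quadro P5000 Mobile] (rev a1) ..."
LSPCI_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) "
    r"(?P<device_class>[^:]+): NVIDIA Corporation "
    r"(?P<chip>\S+) \[(?P<model>[^\]]+)\]"
)

def parse_lspci_line(line):
    """Split an NVIDIA lspci summary line into address, class, chip and model.

    Returns None if the line does not match the assumed format.
    """
    match = LSPCI_RE.match(line)
    if match is None:
        return None
    info = match.groupdict()
    # Heuristic only: treat a model name ending in "Mobile" as a mobile part.
    info["is_mobile"] = info["model"].endswith("Mobile")
    return info

# Line copied from the lspci output in this thread.
lspci_line = (
    "05:00.0 VGA compatible controller: NVIDIA Corporation GP104GLM "
    "[Quadro P5000 Mobile] (rev a1) (prog-if 00 [VGA controller])"
)

gpu = parse_lspci_line(lspci_line)
if gpu is not None:
    print(gpu["addr"], gpu["chip"], gpu["model"], "mobile" if gpu["is_mobile"] else "desktop")
```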

Besides that, I cannot add anything to what I said in my previous post.