Not able to update Tesla P100 driver from 384 to 418

Thanks a million!!! You have been way more helpful than RHEL support. The server finally booted into the GUI, and I was able to log in to the GUI on the server itself, but not through the remote Dell iDRAC.

One thing I want to know: when I run nvidia-smi, I see the NVIDIA 430 driver and CUDA 10.1, but should something be listed in the Processes section? It shows ‘No running processes found’, but I remember there was something there before. Do I need to add or install anything else?

nvidia-smi
Tue Aug 20 13:42:53 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
| N/A   36C    P0    25W / 250W |      4MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
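On the Processes question: that table only lists processes that currently hold a context on the GPU (type C for compute, G for graphics), so an empty list just means nothing is using the card at that moment. An Xorg server or a CUDA job running on the P100 would show up there. As a minimal sketch (assuming Python 3 and nvidia-smi on PATH; the helper names are made up for this example), compute processes can also be queried in machine-readable form via nvidia-smi's --query-compute-apps option. Note that this option covers compute (CUDA) processes only, not graphics ones.

```python
import csv
import io
import subprocess

# Query fields accepted by nvidia-smi (see `nvidia-smi --help-query-compute-apps`).
FIELDS = "pid,process_name,used_memory"


def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=... --format=csv,noheader` output."""
    apps = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or informational lines
        pid, name, mem = (field.strip() for field in row)
        apps.append({"pid": int(pid), "name": name, "used_memory": mem})
    return apps


def list_compute_apps():
    """Return the processes that currently hold a compute context on the GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=" + FIELDS, "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)
```

On the server, calling list_compute_apps() returns an empty list in the situation shown above, and one entry per process once something like a CUDA job is running.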

If the xorg.conf was deleted and the monitor is connected to the onboard Matrox server graphics, the X server might now be running on that instead. Connect the monitor to the NVIDIA outputs and use a minimal xorg.conf like this:

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:3:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection
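The BusID line has to match the GPU's PCI address, but xorg.conf expects decimal numbers in the form PCI:bus:device:function, while nvidia-smi (00000000:03:00.0 above) and lspci print hexadecimal. For bus 03 both notations happen to agree; a bus printed as 0a would have to be written as 10. A minimal Python sketch of the conversion (the function name is invented for this example):

```python
def xorg_busid(pci_address):
    """Convert a hex PCI address into the decimal BusID string xorg.conf expects.

    Accepts 'domain:bus:device.function' as printed by nvidia-smi
    ("00000000:03:00.0") or lspci -D ("0000:03:00.0"), and
    'bus:device.function' as printed by plain lspci ("03:00.0").
    """
    parts = pci_address.strip().split(":")
    if len(parts) == 2:  # plain lspci omits the domain
        parts.insert(0, "0")
    if len(parts) != 3:
        raise ValueError("unexpected PCI address: " + pci_address)
    domain_hex, bus_hex, devfn = parts
    device_hex, function_hex = devfn.split(".")
    domain, bus, device, function = (
        int(x, 16) for x in (domain_hex, bus_hex, device_hex, function_hex)
    )
    # xorg.conf syntax is PCI:bus@domain:device:function; "@domain" may be
    # left out when the domain is 0.
    bus_part = str(bus) if domain == 0 else "{}@{}".format(bus, domain)
    return "PCI:{}:{}:{}".format(bus_part, device, function)
```

For the card above, xorg_busid("00000000:03:00.0") gives "PCI:3:0:0", which is the BusID used in the config.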

Otherwise, please create a new nvidia-bug-report.log and attach it.

I don’t see an xorg.conf in /etc/X11, but there is one in the xrdp subdirectory.

ls -la /etc/X11/
total 36
drwxr-xr-x. 7 root root 265 Aug 20 09:25 .
drwxr-xr-x. 173 root root 12288 Aug 20 10:03 ..
drwxr-xr-x. 2 root root 6 Dec 14 2017 applnk
drwxr-xr-x. 2 root root 111 Aug 20 09:31 fontpath.d
drwxr-xr-x. 5 root root 142 Aug 20 09:25 xinit
-rw-r--r--. 1 root root 547 Aug 25 2017 Xmodmap
-rw-r--r--. 1 root root 1136 Jan 12 2018 xorg.conf.backup
drwxr-xr-x. 2 root root 30 Jul 15 07:11 xorg.conf.d
-rw-r--r--. 1 root root 1376 Jan 12 2018 xorg.conf.nvidia_uninstalled
-rw-r--r--. 1 root root 0 Dec 15 2017 xorg.conf.nvidia-xconfig-original
-rw-r--r--. 1 root root 111 Dec 15 2017 xorg.conf.xorg-x11-drv-nvidia_uninstalled
drwxr-xr-x. 2 root root 23 Aug 20 09:27 xrdp
-rw-r--r--. 1 root root 493 Aug 25 2017 Xresources

more /etc/X11/xrdp/xorg.conf

Section "ServerLayout"
    Identifier "X11 Server"
    Screen "Screen (xrdpdev)"
    InputDevice "xrdpMouse" "CorePointer"
    InputDevice "xrdpKeyboard" "CoreKeyboard"
EndSection

Section "ServerFlags"
    Option "DontVTSwitch" "on"
    Option "AutoAddDevices" "off"
EndSection

Section "Module"
    Load "dbe"
    Load "ddc"
    Load "extmod"
    Load "glx"
    Load "int10"
    Load "record"
    Load "vbe"
    Load "xorgxrdp"
    Load "fb"
EndSection

Section "InputDevice"
    Identifier "xrdpKeyboard"
    Driver "xrdpkeyb"
EndSection

Section "InputDevice"
    Identifier "xrdpMouse"
    Driver "xrdpmouse"
EndSection

Section "Monitor"
    Identifier "Monitor"
    Option "DPMS"
    HorizSync 30-80
    VertRefresh 60-75
    ModeLine "1920x1080" 138.500 1920 1968 2000 2080 1080 1083 1088 1111 +hsync -vsync
    ModeLine "1280x720" 74.25 1280 1720 1760 1980 720 725 730 750 +HSync +VSync
    Modeline "1368x768" 72.25 1368 1416 1448 1528 768 771 781 790 +hsync -vsync
    Modeline "1600x900" 119.00 1600 1696 1864 2128 900 901 904 932 -hsync +vsync
EndSection

Section "Device"
    Identifier "Video Card (xrdpdev)"
    Driver "xrdpdev"
EndSection

Section "Screen"
    Identifier "Screen (xrdpdev)"
    Device "Video Card (xrdpdev)"
    Monitor "Monitor"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "640x480" "800x600" "1024x768" "1280x720" "1280x1024" "1600x900" "1920x1080"
    EndSubSection
EndSection

I have attached the bug report for your review. Thanks!
nvidia-bug-report.log.gz (1.24 MB)

The xorg.conf was removed during the update, so the X server is now running on the Matrox. I don’t know your use case for the NVIDIA GPU; it sounds like you’re connecting only over xrdp, which is a virtual X server? If you want or need an X server running on the NVIDIA GPU, just use the /etc/X11/xorg.conf from post #22.

I created /etc/X11/xorg.conf as described in post #22, and the entire GUI was gone. I removed it and the GUI came back. I will check with Dell to see if they can help in any way, because I cannot even access the boot options when I restart the server. I will provide an update once I hear from them. Thank you very much.

Does that system have a monitor connected?
Is that monitor connected to the NVIDIA card and not to the on-board VGA connector?

The server is connected through a KVM to the VGA output, and it has always been that way. The KVM has no option to connect to the NVIDIA card. The issue is that I used to be able to access the BIOS through iDRAC, but now I can’t do that either. The Dell tech support engineer suggested reseating the NVRAM; we plan to do that and see if it works. I appreciate your help very much and will keep you posted. Thanks!

I have scheduled downtime to reseat the NVRAM tomorrow morning. I think the primary card is coming up as Intel and not NVIDIA. I installed Mayavi for the users, and when I launch it, we get the following message:

ERROR: In /work/standalone-x64-build/VTK-source/Rendering/OpenGL2/vtkOpenGLRenderWindow.cxx, line 797
vtkXOpenGLRenderWindow (0x55e1822a25b0): GL version 2.1 with the gpu_shader4 extension is not supported by your graphics driver but is required for the new OpenGL rendering backend. Please update your OpenGL driver. If you are using Mesa please make sure you have version 10.6.5 or later and make sure your driver in Mesa supports OpenGL 3.2.

ERROR: In /work/standalone-x64-build/VTK-source/Rendering/OpenGL2/vtkShaderProgram.cxx, line 445
vtkShaderProgram (0x55e17d111710): 1: #version 120

I will post the output of glxinfo in the morning once I have physical access to the server. Thanks!

With your current setup, I suspect this is expected. Let me put this straight:
Local access:
The monitor is connected to the onboard VGA connector, which is driven by a “Matrox mga200 server graphics” chip. This is a simple framebuffer device that only gets Mesa software rendering. I suspect RHEL 7 ships Mesa 18, whose software renderer reports only OpenGL 3.1, while the new VTK OpenGL2 backend asks for 3.2 (see the error above). This can be overridden, but that’s another story.
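To check what a given session actually offers, the "OpenGL version string:" line that glxinfo prints can be compared against the 3.2 that the VTK error asks for. A minimal Python sketch, assuming glxinfo is installed; the function names are made up, and it only reads that one line (glxinfo may also print a separate "OpenGL core profile version string:" line, which this ignores):

```python
import re

# The VTK error message asks for OpenGL 3.2.
REQUIRED_GL = (3, 2)

VERSION_RE = re.compile(r"^OpenGL version string:\s*(\d+)\.(\d+)", re.MULTILINE)


def gl_version(glxinfo_text):
    """Return (major, minor) from the 'OpenGL version string:' line of glxinfo."""
    match = VERSION_RE.search(glxinfo_text)
    if match is None:
        raise ValueError("no 'OpenGL version string:' line found")
    return int(match.group(1)), int(match.group(2))


def meets_requirement(glxinfo_text, required=REQUIRED_GL):
    """True if the reported version is at least `required` (tuple comparison)."""
    return gl_version(glxinfo_text) >= required
```

The text can come from subprocess.run(["glxinfo"], stdout=subprocess.PIPE, universal_newlines=True).stdout run inside the session in question; with the software renderer described above it should come out below the requirement.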
Since this simple Matrox device (or rather, its driver) doesn’t support output redirection, the NVIDIA card isn’t used for graphics in the current setup. It could be used for CUDA, but at a glance Mayavi doesn’t seem to support that. Which leaves me a bit puzzled: why is there an NVIDIA card in that box if it isn’t used anyway? Is that box only used locally, or is there a remote graphics use case, e.g. over xrdp + TigerVNC + VirtualGL or NoMachine?

The iDRAC access is now fixed, and I can control the server remotely. You are correct: the server is used extensively over xrdp, with the session set to Xorg. When they had RDP issues, they connected using SSH tunneling with X11 forwarding. The users wanted a good graphics card since they use MATLAB for their analysis, and my Dell rep suggested going with NVIDIA. Which additional drivers do I need to install to get rid of these errors? Thanks!

Locally: connect the monitor to the NVIDIA card and use the xorg.conf I posted.
Remotely: do the same, then install and configure VirtualGL and run applications with ‘vglrun’ inside the xrdp session.

Excuse my delay… I went to connect the monitor to the NVIDIA card, but the Tesla P100 has no graphics output. I reached out to the vendor (Dell, in my case), and this is what they said:

“The Tesla P100 GPGPU (General Processing GPU) does not have a graphics port. These cards tend to not have direct attach graphics unlike their consumer counter parts (GTX series cards). These cards are compute units that are optimized for data/calculation processing.”

I think that is why the server didn’t boot into the GUI when I used the xorg.conf you suggested. Once I removed it, the server came back up in graphical mode. VGA works fine, so I want to ask: should I install and configure VirtualGL and run applications with ‘vglrun’ inside the xrdp session? Thank you.

Sorry, my bad: Teslas of course have no outputs. Despite the thread title, I somehow thought you were running a Quadro.
To get hardware-accelerated graphics from the Tesla, you will have to run a second X server on it and configure VirtualGL to use that. Then you can use vglrun both locally and remotely to get hardware acceleration.
Just as a note: if your remote users use X11 forwarding over SSH, this won’t work, since indirect GL is then used and rendering happens on the client machine. So this will only work over RDP/VNC.
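A quick way to tell which case a shell session is in is the DISPLAY variable: SSH X11 forwarding normally sets a host-prefixed value such as localhost:10.0, while a local X server, including the Xorg session xrdp starts, normally uses a bare :N. A minimal Python sketch of that check (function names are hypothetical, and the classification is a heuristic based on the DISPLAY syntax only):

```python
import os


def display_kind(display):
    """Classify an X11 DISPLAY value as 'unset', 'local' or 'network'.

    A bare ':N' (or 'unix:N') points at a local X server socket. A
    host-prefixed value such as 'localhost:10.0' is a TCP display, which is
    what SSH X11 forwarding normally sets.
    """
    if not display:
        return "unset"
    host, sep, _screen = display.rpartition(":")
    if not sep:
        raise ValueError("not an X display string: %r" % display)
    return "local" if host in ("", "unix") else "network"


def current_display_kind():
    """Classify the DISPLAY of the current process."""
    return display_kind(os.environ.get("DISPLAY", ""))
```

Run inside an xrdp terminal, current_display_kind() would be expected to return "local"; "network" suggests the user came in over ssh -X.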

Actually, they are not using X11 forwarding, just xrdp. They only used X11 forwarding while the GUI was not showing up, which you helped me fix. I’m not sure why this is an issue on the server; when we installed Mayavi on a laptop, it opened fine. Thanks!

I spoke to the hardware vendor (Dell) support, and they suggested downgrading the driver from 430 to 418, which is the version recommended for my P100.

He also suggested that I follow the instructions in the following documentation and purchase a perpetual license to use the NVIDIA card for GL rendering.

I wanted to get your opinion before proceeding. What do you suggest?

Thanks!

IMHO, you should stay on the latest stable driver (430) unless you have problems with it and want support from Dell. In that case, they’ll probably ask you to downgrade.
vGPU is for providing virtual GPUs to virtual machines, i.e. so that every user can have their own virtual workstation. I don’t know your exact use cases, number of users, etc., but I suspect this would be overkill, since it also requires you to set up a VM for each user. If you’re looking for a commercial product, maybe take a look at NoMachine, which also supports VirtualGL to make use of the Tesla.
Neither vGPU nor NoMachine are click-and-run solutions, though.

The Red Hat support engineer also said that 430 is the current driver. I have attached my conversation from 8/12, where he said the same thing you mentioned. I spoke to my Dell sales guy, and he said that graphics rendering should work both when users connect to the server locally and when end users connect using RDP.

What is driving me nuts is that the end users claim Mayavi rendered graphics with Python 2.7, but now that I upgraded to mayavi2 and Python 3.7, OpenGL is not rendering. These end users updated some drivers and broke the entire GUI connectivity. Red Hat didn’t help much; that’s when I posted my question on this forum, and you helped me get the GUI working again.
RHSupport8-12.txt (3.02 KB)

I guess that upgrading Mayavi/VTK raised the required OpenGL level, which the Mesa software renderer doesn’t provide. As mentioned before, you can use an override to (probably) make it run. Use
MESA_GL_VERSION_OVERRIDE=3.3 mayavi2
to run it using software GL.
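If Mayavi is started from a Python wrapper or launcher rather than typed into a shell, the same override can be passed through the child process's environment. A minimal sketch (helper names are hypothetical); like the shell form, it only changes the OpenGL version Mesa reports and does not bring in the Tesla:

```python
import os
import subprocess


def software_gl_env(version="3.3", base=None):
    """Return a copy of `base` (default: os.environ) with the Mesa override set."""
    env = dict(os.environ if base is None else base)
    # Only changes the OpenGL version Mesa *reports*; rendering stays in software.
    env["MESA_GL_VERSION_OVERRIDE"] = version
    return env


def run_with_override(cmd, version="3.3"):
    """Run `cmd` (a list of arguments) with the override; return its exit code."""
    return subprocess.run(cmd, env=software_gl_env(version)).returncode
```

For example, run_with_override(["mayavi2"]) is the equivalent of the command line above. Copying the environment instead of modifying os.environ keeps the override limited to the child process.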
The Dell rep’s claim that you can just put in a Tesla and it’ll magically work is plain wrong. xrdp in particular will always need a VirtualGL setup.
Red Hat support’s claims are closer to the mark, but outdated. It’s not that complicated either.
The question is why you need a GUI locally at all; that is the only thing complicating matters, since the local GUI will always run on the Matrox.
I don’t know the GL library layout on RHEL 7; please post the output of
ls -l /usr/lib/libGL* /usr/lib64/libGL*

You are correct that upgrading Mayavi/VTK raised the required OpenGL level, and Mesa has deprecated support for Matrox. The command MESA_GL_VERSION_OVERRIDE=3.3 mayavi2 did open Mayavi fine. I’m waiting for the end users to confirm that they can plot.

Here is the output. Thank you very much.

ls -l /usr/lib/libGL* /usr/lib64/libGL*
ls: cannot access /usr/lib/libGL*: No such file or directory
lrwxrwxrwx. 1 root root 22 Jul 11 14:15 /usr/lib64/libGLdispatch.so -> libGLdispatch.so.0.0.0
lrwxrwxrwx. 1 root root 22 Jul 11 14:14 /usr/lib64/libGLdispatch.so.0 -> libGLdispatch.so.0.0.0
-rwxr-xr-x. 1 root root 640944 Jul 23 2018 /usr/lib64/libGLdispatch.so.0.0.0
lrwxrwxrwx. 1 root root 29 Aug 20 09:39 /usr/lib64/libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.430.40
-rwxr-xr-x. 1 root root 61136 Jul 21 04:58 /usr/lib64/libGLESv1_CM_nvidia.so.430.40
lrwxrwxrwx. 1 root root 21 Jul 11 14:15 /usr/lib64/libGLESv1_CM.so -> libGLESv1_CM.so.1.2.0
lrwxrwxrwx. 1 root root 21 Jul 11 14:14 /usr/lib64/libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.2.0
-rwxr-xr-x. 1 root root 45848 Jul 23 2018 /usr/lib64/libGLESv1_CM.so.1.2.0
lrwxrwxrwx. 1 root root 26 Aug 20 09:39 /usr/lib64/libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.430.40
-rwxr-xr-x. 1 root root 110904 Jul 21 04:58 /usr/lib64/libGLESv2_nvidia.so.430.40
lrwxrwxrwx. 1 root root 18 Jul 11 14:15 /usr/lib64/libGLESv2.so -> libGLESv2.so.2.1.0
lrwxrwxrwx. 1 root root 18 Jul 11 14:14 /usr/lib64/libGLESv2.so.2 -> libGLESv2.so.2.1.0
-rwxr-xr-x. 1 root root 75312 Jul 23 2018 /usr/lib64/libGLESv2.so.2.1.0
lrwxrwxrwx. 1 root root 14 Jul 11 14:15 /usr/lib64/libGL.so -> libGL.so.1.7.0
lrwxrwxrwx. 1 root root 14 Jul 11 14:14 /usr/lib64/libGL.so.1 -> libGL.so.1.7.0
-rwxr-xr-x. 1 root root 582264 Jul 23 2018 /usr/lib64/libGL.so.1.7.0
lrwxrwxrwx. 1 root root 15 Dec 5 2017 /usr/lib64/libGLU.so -> libGLU.so.1.3.1
lrwxrwxrwx. 1 root root 15 Nov 29 2017 /usr/lib64/libGLU.so.1 -> libGLU.so.1.3.1
-rwxr-xr-x. 1 root root 524464 Jan 26 2014 /usr/lib64/libGLU.so.1.3.1
lrwxrwxrwx. 1 root root 20 Aug 20 09:27 /usr/lib64/libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rwxr-xr-x. 1 root root 502224 Apr 3 19:10 /usr/lib64/libGLX_mesa.so.0.0.0
lrwxrwxrwx. 1 root root 23 Aug 20 09:39 /usr/lib64/libGLX_nvidia.so.0 -> libGLX_nvidia.so.430.40
-rwxr-xr-x. 1 root root 1142976 Jul 21 04:56 /usr/lib64/libGLX_nvidia.so.430.40
lrwxrwxrwx. 1 root root 15 Jul 11 14:15 /usr/lib64/libGLX.so -> libGLX.so.0.0.0
lrwxrwxrwx. 1 root root 15 Jul 11 14:14 /usr/lib64/libGLX.so.0 -> libGLX.so.0.0.0
-rwxr-xr-x. 1 root root 75040 Jul 23 2018 /usr/lib64/libGLX.so.0.0.0
lrwxrwxrwx. 1 root root 27 Aug 20 09:27 /usr/lib64/libGLX_system.so.0 -> /usr/lib64/libGLX_mesa.so.0
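This is a libglvnd layout: libGL.so.1 and libGLX.so.0 are dispatch libraries, and the actual GLX implementations are the vendor libraries libGLX_mesa.so.0 and libGLX_nvidia.so.0 (libGLX_system.so.0 here is a symlink to the Mesa one). libglvnd normally asks the X server which vendor to use for each screen, and the __GLX_VENDOR_LIBRARY_NAME environment variable can force one. A minimal Python sketch that lists the installed vendor libraries (the function name is made up for this example):

```python
import glob
import os
import re

# libglvnd GLX vendor libraries are named libGLX_<vendor>.so.0
VENDOR_RE = re.compile(r"^libGLX_(.+)\.so\.0$")


def glx_vendors(libdir="/usr/lib64"):
    """Map each GLX vendor name in libdir to the file its .so.0 link resolves to."""
    vendors = {}
    for path in glob.glob(os.path.join(libdir, "libGLX_*.so.0")):
        match = VENDOR_RE.match(os.path.basename(path))
        if match:
            vendors[match.group(1)] = os.path.realpath(path)
    return vendors
```

On the listing above this would report mesa, nvidia and system, with system resolving to the same file as mesa.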

To get the NVIDIA GPU to render through VirtualGL, try this.
Use this as /etc/X11/xorg.conf:

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:3:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection

and reboot. Afterwards, you’ll have no GUI on the local monitor, but it’s there, running in the NVIDIA video memory with a virtual monitor. Please check whether the X server is running:
ps aux | grep X
and inspect /var/log/Xorg.0.log
Please post/attach both.
Check if you can still connect over xrdp.
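When going through /var/log/Xorg.0.log, the interesting parts are the (EE) error and (WW) warning lines and whether the nvidia driver logged anything for a screen (lines such as "NVIDIA(0): ..."). A minimal Python sketch that pulls those out, assuming the usual "[   uptime] (XX) message" line format of current Xorg logs; the function names are hypothetical:

```python
import re

# Log lines look like "[    25.118] (EE) message"; the marker legend near the
# top of the log ("\t(WW) warning, (EE) error, ...") has no timestamp.
PROBLEM_RE = re.compile(r"^\[\s*[\d.]+\]\s*\((EE|WW)\)\s*(.*)$")
NVIDIA_RE = re.compile(r"\bNVIDIA\((?:G|GPU-)?\d+\):")
SEVERITY = {"EE": "error", "WW": "warning"}


def xorg_problems(log_text):
    """Return (severity, message) for every (EE)/(WW) line of an Xorg log."""
    problems = []
    for line in log_text.splitlines():
        match = PROBLEM_RE.match(line)
        if match:
            problems.append((SEVERITY[match.group(1)], match.group(2).strip()))
    return problems


def nvidia_driver_active(log_text):
    """True if the nvidia driver logged messages for a screen or GPU."""
    return NVIDIA_RE.search(log_text) is not None
```

Feeding it open("/var/log/Xorg.0.log").read() gives a short list to post instead of the whole file, although attaching the full log is still the safer option.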
If this works, install and configure VirtualGL. It’s in the EPEL repo you have already added, so just install it with yum.
I don’t know whether the configuration is started automatically; if not, run
vglserver_config
Just use the default values, then restart the display-manager or reboot.
Put your users into the vglusers group, connect over xrdp, open a terminal, and run
glxgears
Stop it, then run
vglrun glxgears
This should yield much higher FPS.
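To compare the two runs with a number rather than by eye, the lines glxgears prints every few seconds (of the form "300 frames in 5.0 seconds = 59.940 FPS") can be averaged. A minimal Python sketch, with hypothetical helper names; paste what each run printed into the two arguments:

```python
import re

# glxgears prints lines like "300 frames in 5.0 seconds = 59.940 FPS"
# (the FPS value may be padded with extra spaces).
FPS_RE = re.compile(r"(\d+) frames in\s*([\d.]+) seconds =\s*([\d.]+) FPS")


def mean_fps(glxgears_output):
    """Average of all FPS values reported in one captured glxgears run."""
    rates = [float(m.group(3)) for m in FPS_RE.finditer(glxgears_output)]
    if not rates:
        raise ValueError("no FPS lines found")
    return sum(rates) / len(rates)


def speedup(plain_output, vglrun_output):
    """How many times faster the vglrun run was than the plain run."""
    return mean_fps(vglrun_output) / mean_fps(plain_output)
```

A speedup well above 1 indicates that vglrun is rendering on the Tesla rather than in Mesa's software renderer. glxgears is only a rough smoke test, not a benchmark.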

To get an on-demand GUI on the local monitor, create /usr/local/etc/xorg-matrox.conf:

Section "Device"
    Identifier     "matrox"
    Driver         "modesetting"
    BusID          "PCI:10:0:0"
EndSection

then log in as a user on a text console and run
startx -- :8 vt8 -config /usr/local/etc/xorg-matrox.conf