CentOS 7 crashes after CUDA 10.1 installation. Please help!

Hello Techs,
Good day!

I have an NVIDIA GeForce RTX 2080 SUPER graphics card that I’m trying to set up for a machine learning project. The driver installed successfully, but when I install CUDA 10.1 afterwards, the CUDA toolkit gets installed and nvidia-smi immediately starts failing with the error below.

“Failed to initialize NVML: Driver/library version mismatch”

Before the CUDA installation, nvidia-smi detects the GPU properly, as the output below shows.

[root@localhost ~]# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)

[root@localhost ~]# nvidia-smi
Sun Sep 15 19:42:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 00000000:01:00.0  On |                  N/A |
| 32%   32C    P8     9W / 250W |    137MiB /  7981MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2500      G   /usr/bin/X                                    58MiB |
|    0      3617      G   /usr/bin/gnome-shell                          70MiB |
|    0      4179      G   /usr/lib64/firefox/firefox                     3MiB |
|    0      4778      G   /usr/lib64/firefox/firefox                     3MiB |
+-----------------------------------------------------------------------------+

I’ve tried several versions of RHEL (7, 7.6, 8) and CentOS (7, 7.2, 7.7), but the problem is the same everywhere and I have to rebuild the machine from scratch each time. Everything works perfectly until I install CUDA 10.1. I guess it installs the CUDA drivers as well, which conflict with the existing ones and cause a kernel panic: the screen goes blank and the boot stops at “Starting Gnome display”, “starting switch root” and so on. The system never boots to the GUI, and I need to do a fresh install.

I’ve even tried to install the older CUDA 10.0, but when I run yum install cuda, it picks up the latest CUDA 10.1 and reinstalls everything, and then the system reboots and crashes. This has been happening for a long time. A request was also raised 5 days ago, but there has been no response from the team.

https://nvidia.custhelp.com/app/account/questions/detail/i_id/1818157

I have followed the procedure explained below, but no luck.
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#redhat-installation

I need your support to get this working, as this project is very important for my career. We are in the process of purchasing at least 20 more cards for our ML project setup, but I need to demonstrate the complete setup first before we can proceed.

Your time and support are highly appreciated.
Awaiting your quick reply.

Thank You!
Regards.

By installing the full cuda bundle you’re installing an older, incompatible driver over your already working one. Uninstall all cuda/nvidia packages to have a clean slate, then install the driver from a repo like rpmfusion. To install the cuda toolkit, follow the docs you linked to in your post but don’t do the last step (sudo yum install cuda). Instead run

sudo yum install cuda-toolkit-10-1

to install just the toolkit and not the bundled driver.
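
For reference, here is a rough sketch of that sequence as a small bash helper. It is not an official procedure: the function names (run, first_version, toolkit_only_install) and the DRY_RUN variable are made up for illustration, the remove globs are an assumption about how the packages are named on your system (check with rpm -qa first), and the driver install step is left as a comment because the exact package name depends on the repo you choose. By default it only prints the commands.

```shell
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Print the first dotted version number (e.g. 430.34) found on stdin.
# Assumes the driver version is the first such number in the text,
# as it is in the NVRM line of /proc/driver/nvidia/version.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Clean slate, then toolkit only (no bundled driver).
toolkit_only_install() {
    # Assumed globs; adjust to what `rpm -qa | grep -iE 'cuda|nvidia'` reports.
    run sudo yum remove 'cuda*' '*nvidia*'
    # Install the driver from your chosen repo here (package name not shown).
    run sudo yum install cuda-toolkit-10-1
}
```

After sourcing the file, toolkit_only_install previews the commands and DRY_RUN=0 toolkit_only_install runs them. Once the driver is back and the machine has been rebooted, first_version < /proc/driver/nvidia/version should print the version of the loaded kernel module, which you can compare against the Driver Version shown in the nvidia-smi header.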

Hello Team,
Thanks for the reply. Sure, I will give it a try with a fresh install, as my CentOS system has already crashed, probably for the reason you gave; I had suspected it but was unsure. Let me rebuild the server, follow the steps you’ve mentioned, and get back to you.

Thanks in advance for all your time and support. I appreciate it!