Hello Techs,
Good day!
I have an NVIDIA GeForce RTX 2080 SUPER graphics card that I'm trying to set up for a machine learning project. The driver installs successfully, but when I then install CUDA 10.1, the CUDA toolkit installs and immediately afterwards nvidia-smi fails with the error below.
"Failed to initialize NVML: Driver/library version mismatch"
Before the CUDA installation, nvidia-smi detects the GPU properly. The output is below.
[root@localhost ~]# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)
[root@localhost ~]# nvidia-smi
Sun Sep 15 19:42:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 00000000:01:00.0  On |                  N/A |
| 32%   32C    P8     9W / 250W |    137MiB /  7981MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2500      G   /usr/bin/X                                    58MiB |
|    0      3617      G   /usr/bin/gnome-shell                          70MiB |
|    0      4179      G   /usr/lib64/firefox/firefox                     3MiB |
|    0      4778      G   /usr/lib64/firefox/firefox                     3MiB |
+-----------------------------------------------------------------------------+
I've tried several versions of RHEL (7, 7.6, 8) and CentOS (7, 7.2, 7.7), but the problem is the same everywhere, and each time I have to rebuild the machine from scratch. Everything works perfectly until I install CUDA 10.1. My guess is that it also installs its own CUDA drivers, which conflict with the driver already installed. The kernel then panics, the screen goes blank, and boot stops at messages such as "Starting Gnome display", "starting switch root" and so on. The system never boots into GUI mode, and I need to do a fresh install.
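For reference, the mismatch named in the error can be checked roughly as follows. This is only a minimal sketch, assuming the loaded kernel module reports its version in /proc/driver/nvidia/version and that the user-space NVML library is installed as /usr/lib64/libnvidia-ml.so.<version>. The helper names kernel_module_version and lib_version are hypothetical, and the paths may differ on other setups.

```shell
#!/bin/sh
# Sketch: compare the loaded NVIDIA kernel module version with the
# installed user-space NVML library version(s).

# Hypothetical helper: read the text of /proc/driver/nvidia/version on stdin
# and print the kernel module version (for example 430.34).
kernel_module_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: print the version suffix of a
# libnvidia-ml.so.<version> path.
lib_version() {
    name=${1##*/}
    printf '%s\n' "${name#libnvidia-ml.so.}"
}

# Assumed locations; adjust if the driver was installed elsewhere.
PROC_FILE=/proc/driver/nvidia/version
LIB_GLOB='/usr/lib64/libnvidia-ml.so.*'

if [ -r "$PROC_FILE" ]; then
    loaded=$(kernel_module_version < "$PROC_FILE")
    echo "loaded kernel module: $loaded"
    for lib in $LIB_GLOB; do
        # Skip the unexpanded pattern when no library is installed.
        [ -e "$lib" ] || continue
        ver=$(lib_version "$lib")
        # Skip the soname symlink (libnvidia-ml.so.1).
        [ "$ver" = "1" ] && continue
        if [ "$ver" = "$loaded" ]; then
            echo "match:    $lib"
        else
            echo "MISMATCH: $lib"
        fi
    done
else
    echo "NVIDIA kernel module not loaded ($PROC_FILE missing)"
fi
```

If a line is reported as MISMATCH, the user-space libraries and the loaded kernel module come from different driver versions, which would be consistent with the CUDA packages replacing part of the 430.34 driver.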
I've even tried to install the older CUDA 10.0, but when I run yum install cuda, it picks up the latest CUDA 10.1 and reinstalls everything, and the system reboots and crashes. This has been happening for a long time. I also raised a request 5 days ago, but have received no response from the team.
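One thing that may be worth trying instead of the unversioned cuda package is a versioned, toolkit-only package. The sketch below only prints the commands and does not run them. It assumes the NVIDIA yum repository offers packages named like cuda-toolkit-10-0 that do not pull in a driver. The helper cuda_toolkit_pkg is hypothetical, and I have not confirmed that this avoids the crash.

```shell
#!/bin/sh
# Hypothetical helper: build a versioned toolkit package name from a CUDA
# version, e.g. 10.0 -> cuda-toolkit-10-0 (package naming is an assumption).
cuda_toolkit_pkg() {
    printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

pkg=$(cuda_toolkit_pkg 10.0)

# Print the commands for review rather than executing them.
echo "yum list available $pkg"
echo "yum install $pkg"
```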
https://nvidia.custhelp.com/app/account/questions/detail/i_id/1818157
I have followed the procedure described at the link below, but no luck.
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#redhat-installation
I need your support to get this working, as this project is very important for my career. We are in the process of purchasing at least 20 more cards for our ML project setup, but I need to demonstrate the complete setup first before we can proceed.
Your time and support are highly appreciated.
Awaiting your quick response.
Thank You!
Regards.