NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

Running System: Ubuntu 18.04 LTS (bionic beaver)
Hardware: GeForce RTX 2080Ti

Trying to run command: nvidia-smi

Returns with:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

It was working previously, but after I rebooted my PC one day, my dual-monitor setup stopped working and nvidia-smi began failing with the error above.

I ran:

lspci | grep -i nvidia

02:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
02:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
02:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)

Seems to recognize the devices.

I looked through other posts and tried the following:

  1. Erase and do a clean install of the NVIDIA driver (purge → autoremove → reinstall). I tried both autoinstall and the specific driver nvidia-driver-450. The CUDA version I want is 10.1.

  2. Disable SecureBoot (it was already disabled)

  3. Reselect the NVIDIA option with Query? (the command that is meant to display nvidia and intel by default). Again, I didn’t have intel available in the list, but I entered the switching command manually to switch to intel and back to nvidia.

  4. Check for any blacklist files… which there weren’t any.
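Steps 2 to 4 can be sketched roughly as the read-only checks below. This assumes that the "Query" command in step 3 is `prime-select query` from the nvidia-prime package and that `mokutil` is installed; the helper name `find_nvidia_blacklist` is made up for this example. Nothing here modifies the system.

```shell
#!/bin/sh
# Read-only checks; nothing here changes the system.

# Print modprobe config lines that blacklist a module whose name starts
# with "nvidia". (Hypothetical helper name, used only in this sketch.)
# Usage: find_nvidia_blacklist DIR...
find_nvidia_blacklist() {
    grep -rnsiE '^[[:space:]]*blacklist[[:space:]]+nvidia' "$@"
}

# Step 2: Secure Boot state (mokutil may not be installed).
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || echo "could not read Secure Boot state"
else
    echo "mokutil not installed; skipping Secure Boot check"
fi

# Step 3: which GPU profile is selected (assumes nvidia-prime is in use).
if command -v prime-select >/dev/null 2>&1; then
    prime-select query || echo "prime-select query failed"
else
    echo "prime-select not installed; skipping"
fi

# Step 4: blacklist entries. A stock "blacklist nvidiafb" line refers to the
# old framebuffer module, not the driver, and can be ignored.
find_nvidia_blacklist /etc/modprobe.d /lib/modprobe.d \
    || echo "no nvidia blacklist entries found"
```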

At some point, when I repeated step 1, I managed to get nvidia-smi working on nvidia-driver-460, but when I installed CUDA 10.1 on top of it, the driver seemed to crash again, giving the original “couldn’t communicate” message.

Above is what my software and updates look like now… No detected graphics card…

I tried physically unplugging and reseating the graphics card, then repeated the above process… The driver seemed to crash again…

I’m not sure what to do… please help :'(

Installing the full ‘cuda’ package installs an incompatible driver. Purge everything nvidia and cuda, reinstall the driver using Software & Updates, then install ‘cuda-toolkit’ instead of ‘cuda’.
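As a rough sketch of that sequence: the script below only prints the commands unless `APPLY=1` is set. The `run` wrapper and the `APPLY` and `CUDA_PKG` variables are names invented for this example. The versioned package name `cuda-toolkit-10-1` is an assumption based on the CUDA 10.1 target mentioned above, and it is only installable if NVIDIA's CUDA apt repository is already configured.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 in the environment to actually run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Assumed package name for the CUDA 10.1 toolkit without a bundled driver.
CUDA_PKG="${CUDA_PKG:-cuda-toolkit-10-1}"

# 1. Purge every NVIDIA driver and CUDA package. The patterns are quoted so
#    the shell passes them to apt-get instead of expanding them itself.
run sudo apt-get purge 'nvidia*' 'cuda*'
run sudo apt-get autoremove

# 2. Reboot, then reinstall the driver from
#    Software & Updates > Additional Drivers (a manual GUI step).

# 3. Install only the toolkit, not the 'cuda' package that pulls in a driver.
run sudo apt-get install "$CUDA_PKG"
```

When running this for real, review the package list that apt-get prints before confirming the purge.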

I ran the following commands to uninstall all CUDA and NVIDIA driver packages:

sudo apt-get remove nvidia*
sudo apt-get purge nvidia*
sudo apt-get remove cuda*
sudo apt-get purge cuda*
sudo apt-get autoremove

Then I rebooted.

From the Software and Updates > additional drivers icon, I then selected Nvidia driver metapackage from nvidia-driver-460 (I didn’t see any ‘tested’ mark on any drivers).

After doing so, I rebooted to check whether nvidia-smi works, but it still doesn’t (I can’t even get to the CUDA installation stage since nvidia-smi doesn’t work).

What should I do??? :'(

I am attaching a copy of ‘nvidia-bug-report’ and the result of "sudo journalctl -b0 _COMM=gdm-x-session --no-pager > journal.txt", just in case I am missing anything.
nvidia-bug-report.log.gz (145.3 KB) journal.txt (108.6 KB)

You have set your system gcc to 6.5 but 7.5 is needed to compile the driver. Please set the standard gcc and cc to gcc-7.5 using update-alternatives.
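A minimal sketch of that check and fix, assuming gcc 7.5 is installed from Ubuntu 18.04's gcc-7 package at /usr/bin/gcc-7 and that `cc` is already managed by update-alternatives. The priority value 70 is arbitrary, and `major_of` is a helper made up for this example. The script only inspects the current default and prints the suggested commands; it does not change anything itself.

```shell
#!/bin/sh
# Extract the major version from a string such as "7.5.0" or "6".
major_of() { printf '%s\n' "$1" | cut -d. -f1; }

want=7
have=$(major_of "$(gcc -dumpversion 2>/dev/null)")

if [ "$have" = "$want" ]; then
    echo "gcc $want is already the default"
else
    echo "default gcc major version is '${have:-none}', expected $want; to switch, run:"
    # Paths and the priority (70) are assumptions; adjust to the local install.
    cat <<'EOF'
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70
sudo update-alternatives --set gcc /usr/bin/gcc-7
sudo update-alternatives --set cc /usr/bin/gcc
EOF
fi
```

After switching, the driver package would likely need to be reinstalled so that its kernel module is rebuilt with the new compiler.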