We are facing an issue with graphics card drivers and CUDA 11.7

Hi,

We are using multiple GPUs across our 3 machines:
Machine 1 - 4x GPU Nvidia Quadro RTX 6000 24 GB
Machine 2 - 4x GPU MSI RTX 2080 Ti Gaming X Trio 11 GB GDDR6 352-bit Triple Fan
Machine 3 - 2x GPU NVIDIA Tesla P100

We are running Ubuntu Server 20.04 LTS (legacy version) and downloaded the graphics card drivers that Nvidia suggests for each machine:
Machine 1 - 4x GPU Nvidia Quadro RTX 6000 24 GB: https://us.download.nvidia.com/XFree86/Linux-x86_64/535.104.05/NVIDIA-Linux-x86_64-535.104.05.run
Machine 2 - 4x GPU MSI RTX 2080 Ti Gaming X Trio 11 GB GDDR6 352-bit Triple Fan: https://us.download.nvidia.com/XFree86/Linux-x86_64/535.104.05/NVIDIA-Linux-x86_64-535.104.05.run
Machine 3 - 2x GPU NVIDIA Tesla P100: https://us.download.nvidia.com/tesla/515.105.01/NVIDIA-Linux-x86_64-515.105.01.run

After installing the driver, we installed CUDA 11.7 and the OS on the machine became corrupted. If we skip the graphics driver and install CUDA directly, the machines restart on their own and no graphics card is detected. What should we do? We want to create a Kubernetes cluster.

Please help us

Without a bug report, it’s difficult to say.
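As a starting point for such a report, here is a minimal sketch that gathers basic driver state into one text file. The file name gpu-diagnostics.txt and the helper names `section` and `collect_diagnostics` are made up for illustration; the commands it calls are read-only and are skipped if they are not installed. It does not replace nvidia-bug-report.sh, which ships with the driver and, when run as root, writes nvidia-bug-report.log.gz.

```shell
#!/usr/bin/env bash
# Collect basic GPU/driver state into one text file (read-only commands).
OUT="${OUT:-gpu-diagnostics.txt}"   # hypothetical output file name

section() {
  # Run a command if it exists; otherwise record that it is missing.
  echo "== $* =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "(command failed with exit code $?)"
  else
    echo "($1 not installed)"
  fi
  echo
}

collect_diagnostics() {
  section uname -r        # running kernel version
  section lspci -nn       # PCI devices, including any GPUs the system sees
  section nvidia-smi      # driver version and GPUs visible to the driver
  section dkms status     # whether a kernel module was built via DKMS
}

collect_diagnostics > "$OUT"
echo "Wrote $OUT"
```

Attaching that file, or the nvidia-bug-report.log.gz archive if the driver is installed, gives people something concrete to look at.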

Generally you should use the driver supplied by your distribution, not the .run file installer.
You could also use the graphics-drivers PPA (Proprietary GPU Drivers : “Graphics Drivers” team).

First uninstall the .run file driver (run the installer with --uninstall).
Then, after installing the distro driver, add the CUDA repository and install the cuda-toolkit package (not the full cuda metapackage, which also pulls in its own driver).
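The order of those steps could be sketched roughly as below. This is a dry-run sketch under stated assumptions, not a tested procedure: it only prints the commands unless APPLY=1 is set. The package names nvidia-driver-535 and cuda-toolkit-11-7 are assumptions chosen to match the versions mentioned in the question, the `run` and `reinstall_plan` helpers are hypothetical, and NVIDIA's CUDA apt repository is assumed to be configured already.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints each command unless APPLY=1 is set.
set -eu

# Assumption: the .run installer is still in the current directory.
RUN_FILE="${RUN_FILE:-./NVIDIA-Linux-x86_64-535.104.05.run}"
# Assumption: the distribution packages the 535 driver branch under this name.
DRIVER_PKG="${DRIVER_PKG:-nvidia-driver-535}"
# Assumption: toolkit-only package for CUDA 11.7 from NVIDIA's apt repository.
TOOLKIT_PKG="${TOOLKIT_PKG:-cuda-toolkit-11-7}"

run() {
  # Execute the command only when APPLY=1; otherwise just show it.
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

reinstall_plan() {
  # 1. Remove the driver that was installed from the .run file.
  run sudo sh "$RUN_FILE" --uninstall
  # 2. Install the driver packaged for the distribution.
  run sudo apt-get update
  run sudo apt-get install -y "$DRIVER_PKG"
  # 3. Install only the toolkit, so no second driver is pulled in.
  run sudo apt-get install -y "$TOOLKIT_PKG"
  # 4. Reboot so the new kernel module is loaded.
  run sudo reboot
}

reinstall_plan
```

Review the printed commands first, then rerun with APPLY=1 on one machine before touching the others. After the reboot, nvidia-smi should list all GPUs in that machine.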

Driver 515 is completely outdated. I guess it’s a big-corporation, different-department thing that it’s still listed as the current Tesla driver.


Thank you very much @Mart Mark, your help will be appreciated.

@Mart Thank you very much

We want to use this Kubernetes cluster for machine learning. Should we install the CUDA Toolkit or the full CUDA package? Please guide me.