Recommended driver install process for a Dell PowerEdge R740 server with an NVIDIA Tesla P100 GPU and SSH-only access

Hi

I only have SSH access to a Dell PowerEdge R740 server with an NVIDIA Tesla P100 GPU running Ubuntu 18.04.

I would like to know the recommended process for installing the drivers and toolkits needed for TensorFlow without risking a login loop or a black screen. (This already happened once with the driver from the Ubuntu repo: SSH stopped responding and the IT department reinstalled everything, so no nvidia-bug-report is available. Since I do not have direct access, it is not an easy problem to solve without asking IT to do the whole process again.)

I have seen other people run into difficulties, for example here: https://devtalk.nvidia.com/default/topic/1046157/linux/ubuntu-16-04-gui-login-loop-after-installing-nvidia-driver/1

Thank you to anyone kind enough to take the time to help me.

It is questionable why there’s a GUI running on a probably headless server, and why installing a graphics driver would keep sshd from starting, but for the least intrusive install:

  • download the 418.56 .run installer from here: https://http.download.nvidia.com/XFree86/Linux-x86_64/
  • run it with options --dkms --no-opengl-files --no-x-check -Z
  • reboot
  • download cuda 10.0 .deb
  • do first three steps of install instructions
  • don’t install cuda
  • instead, run sudo apt install cuda-toolkit-10-0
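
The steps above can be sketched as a small shell script. This is only a sketch under assumptions: the installer file name NVIDIA-Linux-x86_64-418.56.run and its 418.56/ subdirectory are assumed from the usual layout of that download server, the helper names `run`, `install_driver` and `install_toolkit` are made up for illustration, and the repo-setup steps from the CUDA download page are deliberately not reproduced. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only. Prints each command unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

DRIVER_VERSION="418.56"
BASE_URL="https://http.download.nvidia.com/XFree86/Linux-x86_64"
# Assumption: file name and per-version subdirectory follow the usual layout.
RUN_FILE="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_driver() {
    run wget "${BASE_URL}/${DRIVER_VERSION}/${RUN_FILE}"
    # Installer options exactly as given in the list above.
    run sudo sh "./${RUN_FILE}" --dkms --no-opengl-files --no-x-check -Z
    run sudo reboot
}

install_toolkit() {
    # Run after the reboot and after the repo-setup steps from the CUDA
    # download page (not reproduced here). Installs the toolkit package
    # only, not the "cuda" metapackage.
    run sudo apt install cuda-toolkit-10-0
}

# Select a stage by argument; with no argument nothing is run.
case "${1:-}" in
    driver)  install_driver ;;
    toolkit) install_toolkit ;;
esac
```

Saved under a hypothetical name such as install_gpu.sh, `sh install_gpu.sh driver` would print the driver steps for review, and `DRY_RUN=0 sh install_gpu.sh driver` would actually execute them. The toolkit stage is kept separate because it has to happen after the reboot.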

Thank you very much.
When you say “do first three steps of install instructions”, where can I find these instructions? Do you mean the first three items of your list, or something else entirely?

Sorry, forgot three words:

  • do the first three steps of the install instructions on the download page

Addendum: according to other users’ reports, TensorFlow seems to be linked against CUDA 10.0, so it doesn’t work with CUDA 10.1. Download the 10.0 .deb (network) from the archives:
https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork
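
Since the CUDA release matters here, it may help to check which toolkit release is actually installed before trying TensorFlow. A minimal sketch, assuming `nvcc --version` prints a line of the form `Cuda compilation tools, release 10.0, V10.0.130`; the function name `cuda_release` is made up for illustration, and the /usr/local/cuda-10.0 path in the usage comment is the usual default location, not something confirmed in this thread.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number (for example "10.0").
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Typical use (path is an assumption, adjust to the actual install):
#   /usr/local/cuda-10.0/bin/nvcc --version | cuda_release
```

If this prints 10.1 rather than 10.0, the TensorFlow build described above would likely not find the libraries it was linked against.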

Thank you very much for your time, I will try that as soon as I can.

Thank you very much, it worked for me, but I had to remove and purge the nouveau driver first.
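
The post does not say how nouveau was removed. One common approach, shown here purely as an assumption about what such a step might look like, is to blacklist the module through a modprobe configuration file and then rebuild the initramfs. The helper name `write_nouveau_blacklist` and the file name blacklist-nouveau.conf are illustrative choices, not taken from this thread.

```shell
# Hypothetical helper: write a modprobe blacklist for nouveau to the
# path given as the first argument.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Typical use as root, followed by an initramfs rebuild and a reboot:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u
# Afterwards, `lsmod | grep nouveau` should print nothing.
```

Taking the target path as an argument makes it possible to write the file somewhere harmless first and inspect it before touching /etc/modprobe.d.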