Installing CUDA for K80 without/with Display GPU

Hello,

My config is :

Intel Xeon processors

4 x Nvidia K80

1 x Nvidia GT610 (though I have unplugged it)

OS : Ubuntu 16.04

What I’m trying to do now is install the K80s without any display GPU.
I’ve followed the steps below, as suggested by the installation guide:

  1. Blacklisted nouveau
  2. Downloaded the latest NVIDIA K80 driver (.deb file): Tesla Driver for Ubuntu 16.04 | 384.81 | Linux 64-bit Ubuntu 16.04 | NVIDIA
  3. Installed the driver
  4. Rebooted
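
For reference, step 1 (blacklisting nouveau) usually comes down to a two-line modprobe configuration file. The sketch below writes such a file into a scratch directory so that nothing on the system is touched. The file name `blacklist-nouveau.conf` and the helper name `write_nouveau_blacklist` are assumptions chosen for illustration, not something stated in the post.

```shell
#!/bin/sh
# Hypothetical helper: write a nouveau blacklist file into the given directory.
# The file name is an assumption; modprobe reads any *.conf file in /etc/modprobe.d.
write_nouveau_blacklist() {
    target_dir="$1"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$target_dir/blacklist-nouveau.conf"
}

# Demonstrate on a scratch directory instead of the real /etc/modprobe.d.
scratch_dir="$(mktemp -d)"
write_nouveau_blacklist "$scratch_dir"
cat "$scratch_dir/blacklist-nouveau.conf"
```

On a real Ubuntu system the target would be /etc/modprobe.d (written as root), typically followed by `sudo update-initramfs -u` and a reboot so the change takes effect.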

After rebooting, I get stuck in a login loop, meaning the X server is broken.

So what I did was this:

I installed the CUDA 9.0 drivers and samples from a text console (tty1). The installation went smoothly,
and nvidia-smi shows all the K80 GPUs.

But is it possible to fix the X server?

I should mention that I tried the following to reinstall the X server, and it didn’t work:

→ 1st method: ‘sudo dpkg-reconfigure xserver-xorg’
or
→ 2nd method: ‘sudo apt-get install --reinstall xserver-xorg’

Both of these failed.

Moreover, the “xorg.conf” file is missing from ‘/etc/X11’.

The login loop still persists, but I can access everything from tty1.

One more thing: I have a spare GPU (Nvidia GT610), which I assume I can use as the display GPU.

Can anyone suggest a way to use it so that it doesn’t conflict with the NVIDIA driver for the K80s?

Since you installed via deb, you probably could have restored X server operation simply by uninstalling the driver package. Now that you’ve done those other things, I’m not sure that will be enough, but you could still try it.
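
Before uninstalling, it can help to see which driver-related packages the deb install actually put on the system. This is a minimal sketch, assuming the packages of interest have names starting with `nvidia` or `cuda` (an assumption, so check the list before removing anything). The helper name `filter_gpu_packages` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only package names that start with "nvidia" or "cuda".
# Reads one package name per line on stdin.
filter_gpu_packages() {
    grep -E '^(nvidia|cuda)' || true   # no match is not treated as an error
}

# On a Debian/Ubuntu system, list the installed package names and filter them.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package}\n' | filter_gpu_packages
fi
```

Packages that show up in that list can then be removed with `sudo apt-get purge <package>`, followed by a reboot to see whether the graphical login comes back.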

If you were to start over (e.g. if an uninstall restores your X display, or if you reinstall the OS), the correct way to avoid disrupting the X server (presumably running on some other built-in graphics on the server) is to reinstall CUDA and the GPU driver with runfile installers (not deb), and to specify the --no-opengl-libs command-line option for the CUDA runfile (or --no-opengl-files if you are using a standalone driver installer).
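
As a rough sketch of what those runfile invocations look like: to my knowledge the CUDA toolkit runfile accepts `--no-opengl-libs`, while the standalone NVIDIA driver runfile accepts `--no-opengl-files`. The script below only prints the commands (a dry run) and picks the flag from the installer's file name. The file names shown are placeholders, not real download names, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: choose the "skip OpenGL" flag from the installer file name.
no_opengl_flag() {
    case "$(basename "$1")" in
        cuda_*.run)         echo "--no-opengl-libs" ;;   # CUDA toolkit runfile
        NVIDIA-Linux-*.run) echo "--no-opengl-files" ;;  # standalone driver runfile
        *)                  return 1 ;;
    esac
}

# Print the install command instead of running it (dry run).
print_install_cmd() {
    flag="$(no_opengl_flag "$1")" || {
        echo "unrecognised installer: $1" >&2
        return 1
    }
    echo "sudo sh $1 $flag"
}

# Placeholder file names; substitute the runfiles you actually downloaded.
print_install_cmd ./cuda_9.0.x_linux.run
print_install_cmd ./NVIDIA-Linux-x86_64-384.xx.run
```

Running the printed commands from a text console, with the graphical session stopped, is the usual setting for a runfile install.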

This is documented with full instructions elsewhere on this forum if you do a bit of searching (just search for “login loop”), for example here:

[Solved] Titan X for CUDA 7.5 login-loop error [Ubuntu 14.04] - CUDA Setup and Installation - NVIDIA Developer Forums

If you want to use the GT610, be aware that it is a Fermi device and not directly supported by CUDA 9. However, you can probably still install CUDA 9 and use it on the K80s, because the 384.xx driver branch for CUDA 9 still supports Fermi GPUs. The GT610 itself won’t show up, or won’t be usable, in CUDA. Probably you don’t care about that.

In that case, the approach I would recommend is to install only the GT610 first, and then install an appropriate R384 driver for it, using a runfile installer from the Official Drivers | NVIDIA page (384.90 appears to be the latest there currently). During the driver runfile install, select “yes” when prompted about OpenGL libs/files and “yes” if asked to modify your xorg.conf. After a reboot, your main display should switch to the GT610. After that, install the K80s and reinstall CUDA 9 from the runfile installer (not deb), but deselect the option to install the driver. The R384 driver you already installed should work with CUDA 9.

Thanks for the answer.

I’ll definitely try out both sets of instructions.

Cheers !