[Solved] Titan X for CUDA 7.5 login-loop error [Ubuntu 14.04]

I have an error similar to the one described in this thread:
https://devtalk.nvidia.com/default/topic/872967/?comment=4656571

I do, however, have what looks like a perfect install: I run all the post-install tests and everything looks fine, up until I reboot. Then I’m faced with the login loop, and from what I read it seems it’s conflicting with the Xorg system. I blacklisted the nouveau display driver (and ran $update-initramfs -u) post-install, rebooted, and still have the same problem. I’m not really sure what’s going on. Do I have to reinstall a program? I’ve already had to do a fresh install on my system, and since this is a new computer cluster it’s imperative to have CUDA working, but the Titan X should only be used for GPU processing and not for rendering (rendering will be done by the integrated graphics on the motherboard).

Also: $nvidia-smi works fine, but doesn’t show any processes running.
But: $nvidia-settings shows an error message:

ERROR: The control display is undefined; please run ‘nvidia-settings --help’ for usage information.

In this thread I also found that it is a good idea to answer ‘no’ to the OpenGL part of the CUDA driver install: https://devtalk.nvidia.com/default/topic/864024/cuda-setup-and-installation/howto-ubuntu-14-04-intel-igpu-for-display-and-gtx-960-for-cuda/ . Perhaps that is my problem and I should do a new install like in the first link I posted? Any other suggestions? I now recall that while installing CUDA, it asked me if I wanted to install the Nvidia driver too, and I believe I said yes. Maybe this was the mistake?

Other questions:

  1. If I am only installing CUDA for parallel processing with my Titan X, am I still supposed to blacklist the nouveau driver? (It seems as if that is only for display, but I would need it if I’m using an integrated graphics card for that)
  2. Do I still need to install an Nvidia driver if I am not going to use the driver for visualization? Only installing the CUDA driver makes more sense to me.

I used this guide to perform my install:
https://www.quantstart.com/articles/Installing-Nvidia-CUDA-on-Ubuntu-14-04-for-Linux-GPU-Computing

Other specs & details:
Linux Kernel version: 3.19.0-25-generic 64-bit machine.
I also don’t happen to have an xorg.conf file
$sudo apt-get install ubuntu-desktop ; outputs ‘ubuntu-desktop is already the newest version’.
$nvcc -V ; seemed to work perfectly after install, but it stopped working after reboot.
I log in to my system through Ctrl + Alt + F1

Yes, the installation of the openGL libs by the driver is affecting your GUI and login.

  1. Do a new, clean install, without the NVIDIA GPU installed
  2. Establish an xorg.conf file so that the X server does not use display autodetection (please google for how to do this based on your linux distro, it’s not an NVIDIA issue). Get everything working the way you want, GUI-wise.
  3. Remove the nouveau driver. The method is given in the CUDA linux getting started guide.
  4. Power down, install the NVIDIA GPU.
  5. Power up again, make sure nothing changes regarding login and GUI through reboots.
  6. Download and install the appropriate driver runfile installer for your GPU. You get this by downloading it from www.nvidia.com. During the install of the runfile, select the command line option to not install OpenGL libs. If you’re not sure what that is, use --help on the runfile install command line to see the command line options, then re-run with the command line option to disable installation of the OpenGL libs. Select “no” if prompted to modify your xorg.conf file.
  7. At this point you should be able to reboot and login. If so, you are past the trouble.
  8. Now download the CUDA runfile installer (do not use package install method), but answer “no” when asked if you want to install the driver. Follow the other directions in the cuda linux getting started guide:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#abstract
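
Steps 6 and 8 above can be sketched as a dry run that only prints the commands. This is a rough sketch under assumptions, not an official procedure: the two runfile names are placeholders for whatever you downloaded, and the flag spelling differs between the two installers (the driver runfile uses --no-opengl-files, the CUDA runfile uses --no-opengl-libs), so confirm both with --help on your own files before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 6 and 8 instead of running them.
# ASSUMPTION: the file names below are placeholders; substitute your own runfiles.
DRIVER_RUNFILE="${DRIVER_RUNFILE:-NVIDIA-Linux-x86_64-352.30.run}"
CUDA_RUNFILE="${CUDA_RUNFILE:-cuda_7.5.18_linux.run}"

print_install_plan() {
    # The display manager should not be running while the driver installs.
    echo "sudo service lightdm stop"
    # Step 6: driver only, without NVIDIA's OpenGL files.
    echo "sudo sh $DRIVER_RUNFILE --no-opengl-files"
    # Step 8: CUDA runfile; answer "no" when it offers to install the driver.
    echo "sudo sh $CUDA_RUNFILE --no-opengl-libs"
    echo "sudo service lightdm start"
}

print_install_plan
```

Review the printed lines, then run them one at a time from a text console (Ctrl + Alt + F1).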

Will the above install still work if I don’t remove my NVIDIA graphics card? (I think this is what you mean when you say “Without the NVIDIA GPU installed”.) This is an expensive computer and I don’t want to accidentally short-circuit the Titan X or any other component by taking the computer apart.

It should, yes, assuming that you create a proper xorg.conf and the X autodetect feature doesn’t trip you up.

Ok, I’m about to start from scratch by uninstalling CUDA and deleting the nouveau ban, but in the manual you sent me it says to run:
$ sudo /usr/local/cuda-7.5/bin/uninstall_cuda_7.5.pl
to uninstall, but such file doesn’t exist! (The /usr/local/cuda-7.5/bin/ folder does exist though).

I used a .deb file for installation.

Never mind, I did this for a clean nvidia uninstall:
$sudo apt-get remove --purge nvidia- && sudo apt-get autoremove
$sudo apt-get install ubuntu-desktop
$sudo rm /etc/X11/xorg.conf
$echo 'nouveau' | sudo tee -a /etc/modules

I am still stuck on how to create a proper xorg.conf file. (I see that some people leave it blank)
Do I need to run $sudo nvidia-xconfig? (The answer seems to be no, from here: https://elementaryforums.com/index.php?threads/howto-install-latest-nvidia-driver-on-linux-without-getting-black-screen.7/ )

I’m also still not sure why I’m supposed to blacklist the nouveau driver if I don’t want to use the Titan X for rendering. If I do blacklist it, when am I supposed to re-enable it to get my GUI working? I’m stumped here.

That isn’t what I was suggesting at all.

Clean install of the operating system, i.e. reload the operating system.

Then use the runfile installer.
Obviously the uninstall .pl script won’t be there before you install it (the driver runfile installer) the first time.

There are quite a few questions here. Perhaps you should try installing a few times. I’m just offering suggestions. You’re welcome to do whatever you wish.

The nouveau driver only pertains to the NVIDIA GPU. It is not required to “get your GUI working” if the GUI is hosted by another (non-NVIDIA) GPU.
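
To see this on a given machine, one option is to list which kernel driver is bound to each display adapter. The helper below is a small sketch (the function name gpu_drivers is made up for illustration); it reads the output of lspci -k on stdin and prints the PCI address and bound driver for each VGA or 3D controller.

```shell
#!/bin/sh
# Reads `lspci -k` output on stdin and prints "<PCI address> <kernel driver>"
# for every display adapter. gpu_drivers is a hypothetical helper name.
gpu_drivers() {
    awk '
        /^[^ \t]/ {
            # An unindented line starts a new device record.
            is_gpu = ($0 ~ /VGA compatible controller|3D controller/)
            addr = $1
        }
        is_gpu && /Kernel driver in use:/ {
            print addr, $NF
        }
    '
}

# Usage:
#   lspci -k | gpu_drivers
```

On a setup like the one in this thread, you would hope to see an Intel driver such as i915 next to the integrated graphics, and either nouveau or nvidia next to the Titan X depending on how far the install has got.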

No worries, this has been very helpful! Since the computer is new, a fresh OS install is not a problem, but I’d prefer not to overdo it.

Ok, perfect. Thanks for clarifying things! I will give it another try and keep you posted.

Update: I get stuck in step 5. After installing the NVIDIA Driver, I do
$sudo service lightdm start
and it takes me back to the login menu with GUI activated, but when I try to log in, I encounter the login-loop again. After reboot, the same error persists.

How exactly did you install the nvidia driver?

  1. I start off with the regular GUI and Ubuntu working with no login problems.

  2. I create an empty xorg.conf file

  3. I create the /etc/modprobe.d/blacklist-nouveau.conf file with :
    blacklist nouveau
    option nouveau modeset=0

    Then $sudo update-initramfs -u

  4. I reboot the system

  5. Before logging in, I do Ctrl + Alt + F1 and I login to my username.

  6. I go to the directory where I have the NVIDIA driver and CUDA installers (both are .run files) and I do
    $chmod a+x .

  7. I run $sudo service lightdm stop

  8. I run the NVIDIA driver file:
    $sudo bash NVIDIA-Linux-x86_64-352.30.run

  9. I accept the license and agree to what the installer asks. I say “no” to running nvidia-xconfig, which might overwrite the xorg.conf file or X server settings.

  10. Installation is complete.

  11. I do $sudo service lightdm start

  12. It takes me back to login screen. I reboot computer.

  13. The login-loop problem appears and I can’t log in again. My best guess now is that the NVIDIA driver did set up the Titan X, but since my monitor is connected to the integrated Intel graphics, it won’t let me log in because it wants the monitor connected to the Titan X. But I want to use the graphics card for CUDA and parallel processing jobs, not for rendering.

Maybe I missed something?

Update: I will re-install adding the --driver and --no-opengl-libs flags (although my .run file never prompted me about this…)

Yes, you missed the part where I said:

“During the install of the runfile, select the command line option to not install OpenGL libs. If you’re not sure what that is, use --help on the runfile install command line to see the command line options, then re-run with the command line option to disable installation of the OpenGL libs.”

This is a (the) critical step to avoid the “login loop”.

And if you’ve already installed this way, a reinstall may not fix anything. You may need to start over with a clean load of the OS.

True

I reinstalled the OS and re-ran everything the same as before, adding the --no-opengl-libs flag. This is the critical part. I would like to add something else though! I did not have to install the NVIDIA driver explicitly first. I ran the cuda.run file instead, and said yes to installing the NVIDIA driver when it prompted me to. I feel like doing this in one straight run made everything way easier. Thanks for everything!

Post of final script and solution

  0. Download your relevant CUDA .run file; mine was cuda_7.0.28_linux.run.
    Note that, once again, this install is for when you purely want to use your graphics card (Titan X) for GPU/CUDA purposes and not for rendering.
    Also run:
    $ sudo apt-get install build-essential

  1. I start off with the regular GUI and Ubuntu working with no login problems.

  2. No need to create an xorg.conf file. If you have one, remove it (assuming you have a fresh OS install):
    $ sudo rm /etc/X11/xorg.conf

  3. Create the /etc/modprobe.d/blacklist-nouveau.conf file with:
    blacklist nouveau
    option nouveau modeset=0

    Then run:
    $ sudo update-initramfs -u

  4. Reboot the computer. Nothing should have changed during boot, and you should be taken to the login screen. Once there, press Ctrl + Alt + F1 and log in as your user.

  5. Go to the directory where you have the CUDA run file, and run
    $chmod a+x .

  6. Now run:
    $ sudo service lightdm stop
    Stopping the display manager is a necessary step before installing the driver.

  7. Run the CUDA run file. Notice that I explicitly don’t want the OpenGL libraries to be installed:
    $ sudo bash cuda_7.0.28_linux.run --no-opengl-libs

  8. During the install:
    Accept the EULA conditions
    Say YES to installing the NVIDIA driver
    Say YES to installing the CUDA Toolkit + Driver
    Say YES to installing the CUDA Samples
    Say NO to rebuilding any X server configurations with NVIDIA.

  9. Installation should be complete. Now check whether the device nodes are present, i.e. whether the /dev/nvidia* files exist. If they don’t, run:
    $ sudo modprobe nvidia

  10. Set the environment path variables (change the paths depending on your CUDA version):
    $ export PATH=/usr/local/cuda-7.0/bin:$PATH
    $ export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH

  11. Verify the driver version:
    $ cat /proc/driver/nvidia/version

  12. Check the CUDA compiler (toolkit) version:
    $ nvcc -V

[Optional] At this point you can switch lightdm back on again:
$ sudo service lightdm start

You should be able to log in to your session through the GUI without any problems or login loops.

  13. Build the CUDA Samples. Go to your NVIDIA_CUDA-7.5_Samples folder (the version in the folder name matches your CUDA version) and type $make.

  14. Go to NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release/ for the demos, and do the two standard checks:
    ./deviceQuery
    to see your graphics card specs and
    ./bandwidthTest
    to check that it’s operating correctly.

Both tests should ultimately output a ‘PASS’ in your terminal.

  15. Reboot. Everything should be OK.
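
One caveat about the environment-variable step: exports typed at a prompt only last for that shell session, so after a reboot nvcc may no longer be found on the PATH. A possible way to make them persistent is to append them to ~/.bashrc. The helper below is only a sketch (cuda_env_lines is a made-up name, and the /usr/local/cuda-<version> layout is the default install location assumed throughout this thread).

```shell
#!/bin/sh
# Prints the two export lines for a given CUDA version (for example 7.0).
# cuda_env_lines is a hypothetical helper; /usr/local/cuda-<ver> is assumed
# to be the install prefix. Single quotes keep $PATH and $LD_LIBRARY_PATH
# literal, so they are expanded when the shell starts, not when this runs.
cuda_env_lines() {
    ver="$1"
    printf 'export PATH=/usr/local/cuda-%s/bin:$PATH\n' "$ver"
    printf 'export LD_LIBRARY_PATH=/usr/local/cuda-%s/lib64:$LD_LIBRARY_PATH\n' "$ver"
}

# Usage (appends to your shell startup file; adjust the version):
#   cuda_env_lines 7.0 >> ~/.bashrc
```

Open a new terminal afterwards and run nvcc -V again to confirm the compiler is still found.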

old post but THANK YOU for posting these complete instructions.

Thanks for this great solution, it is lifesaving!

Here is a small correction for step 3: the modprobe keyword is “options”, not “option”, so the line should read:
options nouveau modeset=0
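
With that correction applied, the blacklist file from step 3 could be written like this. It is a minimal sketch: the function name is made up, and the commented usage lines need root and are left for you to run by hand.

```shell
#!/bin/sh
# Prints the blacklist file contents with the corrected "options" keyword.
# nouveau_blacklist_conf is a hypothetical helper name.
nouveau_blacklist_conf() {
    cat <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Usage (writes the file, then rebuilds the initramfs so it takes effect at boot):
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
```

After the next reboot, lsmod | grep nouveau should print nothing if the blacklist took effect.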

Thank you very much. Was struggling for 2 days till I found this thread.

Hey!

First of all, thank you for the great post! I have used your post, and am facing a few problems.

I had gone through the entire installation a couple of weeks back, and everything seemed to be working fine. It gave a PASS for both the tests too. But when I run the tests now, it says FAIL, and I don’t have any clue why. Could you please help me out with this? Could it be because I regularly install the Ubuntu updates?

Background: I also checked cat /proc/driver/nvidia/version and it says cat: /proc/driver/nvidia/version: No such file or directory. Running sudo modprobe nvidia gives modprobe: FATAL: Module nvidia not found. It also says nvcc isn’t installed. I am new to this, so I don’t know where to look for errors!

Thanks!

Yes, it could be because you regularly install Ubuntu updates.
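
One plausible explanation, consistent with the “Module nvidia not found” message: a runfile install builds the nvidia kernel module for the kernel that was running at the time, so after an update that brings in a newer kernel there may be no module for it. A quick check along those lines is sketched below; has_nvidia_module is a made-up helper name, and the search simply looks for a file named nvidia*.ko* under the running kernel’s module tree.

```shell
#!/bin/sh
# Succeeds if an nvidia kernel module file exists under the given modules
# directory (default: the module tree of the currently running kernel).
# has_nvidia_module is a hypothetical helper name.
has_nvidia_module() {
    moddir="${1:-/lib/modules/$(uname -r)}"
    [ -n "$(find "$moddir" -name 'nvidia*.ko*' 2>/dev/null | head -n 1)" ]
}

if has_nvidia_module; then
    echo "nvidia module present for kernel $(uname -r)"
else
    echo "no nvidia module for kernel $(uname -r); the driver likely needs reinstalling"
fi
```

If the module is missing, comparing uname -r with the directories listed under /lib/modules shows whether the kernel changed since the driver was installed.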

Okay! I have 2 questions :

a) Should I not install the Ubuntu updates?
b) How can I get it working back again? TIA!

There are probably many possible methods to get it working again.

One method would be to clean out all old installations and reinstall the GPU driver and CUDA.

Another method would be to start over with a fresh install of Ubuntu 14.04 and install the GPU driver and CUDA.

Both methods are covered in the CUDA installation guide.

If you don’t know what is going on in the background of Linux, don’t update anything. You will have to deal with driver issues, access rights…

I use Ubuntu for some software tools, and I have started to use Docker images as much as possible. If you have this option too, just do it. You will not have to deal with driver issues, Python issues, or anything else…

In Docker images, everything comes installed as a package. You just use it, with no installation or setup issues…

For example, if you need DIGITS, pick up the DIGITS Docker image; if you need MySQL, pick up the MySQL Docker image. Check Docker Hub to see whether there is an image for what you need:

https://hub.docker.com/explore/