CUDA and Nvidia GPU Driver Installation Problem

Hi everyone, I am about to lose my mind!

I installed Ubuntu 16.04 LTS. My computer has integrated Intel graphics and an Nvidia GTX 1070 GPU. I installed the latest Nvidia GPU driver (390.48) first and then CUDA 9.1. I also tried the reverse order. Both sequences ended in problems. (I tried almost 10 methods from the internet, in particular the NVIDIA CUDA Installation Guide for Linux.) Here are the steps I followed:

1) I installed Nvidia GPU driver version 390.48 from nvidia.com (via the .run file), following https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07?_pjax=%23gist-pjax-container. That installation completed without any problem and I was able to use my GPU. Afterwards, I installed CUDA 9.1 following the Installation Guide Linux :: CUDA Toolkit Documentation. However, after installing CUDA, I ran into a few problems, listed below.

By the way, I repeated this installation sequence several times, so the problems listed below come from different attempts. (For each attempt, I formatted the disk and reinstalled Ubuntu.)

  • The installation failed because of Nvidia driver version 387, although I had installed version 390.
  • After I installed CUDA 9.1 on top of the GPU driver, the driver version reverted from 390.48 to 388.11.
  • and so on…

Again, I reinstalled Ubuntu, formatting the disk and deleting everything on it.

2) I installed CUDA 9.1 first and then GPU driver version 390.48 (the reverse of the above). This time, again, the Nvidia driver installation failed during the CUDA 9.1 installation. So I removed the pre-installed Nvidia files and installed Nvidia driver version 390.48 myself. Still, I got nothing but failures.

Please give me some advice or a solution for this problem. I need to install GPU driver 390.48 and CUDA 9.1. I have tried almost every solution on the internet, including the Nvidia guides. What am I doing wrong?

The Nvidia docs are wrong; you have to install CUDA first, then the current driver. Another user put the steps together in the right order here:
https://devtalk.nvidia.com/default/topic/1031213/linux/problem-installing-nvidia-390-42-driver-on-ubuntu-16-04/post/5247202/#5247202

I have just started installing the GPU driver the way the moderator described: Nvidia Driver and CUDA Installation Sequence ! - CUDA Setup and Installation - NVIDIA Developer Forums

If it goes wrong again, I will do what you said. Thank you anyway; the solution you recommended looks right.

That should work alright, too.

GPU driver is installed now.

  • ~$ nvidia-smi

    gave the result below. Does that look right to you? The Processes part looks different to me.

    Fri Apr 20 00:35:54 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
    | N/A   44C    P0    35W /  N/A |    190MiB /  8117MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1018      G   /usr/lib/xorg/Xorg                           113MiB |
    |    0      1862      G   compiz                                        40MiB |
    |    0      2164      G   …-token=D8FA5E9F73AAA98BA5296F5B713C6DB3      33MiB |
    +-----------------------------------------------------------------------------+

  • Also, under the Additional Drivers tab in System Settings, it says "No proprietary drivers in use" although the latest proprietary driver is installed. Is that a problem?
  • Looks normal.
    The message is displayed only because you are not using the driver from the standard repository.
    If you used the .run installer, make sure DKMS is enabled.

    How can I check whether DKMS is enabled or not?

    Reinstall the driver and watch for the prompt asking whether to register the kernel module with DKMS.
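A possible way to check without reinstalling, as a rough sketch: `dkms status` lists the modules registered with DKMS, so a line starting with `nvidia` suggests the driver was registered. The helper name `nvidia_in_dkms` is made up for illustration, and the sample line in the comment assumes the usual `dkms status` layout, which can differ between DKMS versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given `dkms status` text contains a
# line that starts with "nvidia" (for example "nvidia, 390.48, ...: installed").
nvidia_in_dkms() {
    printf '%s\n' "$1" | grep -qi '^nvidia'
}

if command -v dkms >/dev/null 2>&1; then
    # dkms is available, so inspect its list of registered modules.
    if nvidia_in_dkms "$(dkms status 2>/dev/null)"; then
        echo "nvidia module is registered with DKMS"
    else
        echo "no nvidia module is registered with DKMS"
    fi
else
    echo "dkms is not installed"
fi
```

If no nvidia line shows up, the module was probably built without DKMS and would need to be rebuilt by hand after a kernel update.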

    I am almost done. If you don't mind, would you please explain this step to me:

    “Reboot into runlevel 3 by temporarily adding the number “3” and the word “nomodeset” to the end of the system’s kernel boot parameters.” It is from this link: Quick Start Guide 12.3 documentation

    Runlevel 3 means the X server is not started; nomodeset blocks the nouveau module from loading. This is so the nvidia module can be loaded after it is built.
    Both have to be added to the kernel command line in GRUB: hold down Shift during boot to show the GRUB menu, then press ‘e’ to edit. Alternatively, stop the X server and unload any nouveau or nvidia module.
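After rebooting, one way to confirm that the temporary parameters took effect is to look at `/proc/cmdline` and at the loaded modules. This is only a sketch: the helper name `has_param` is made up for illustration, and it assumes a Linux system where `/proc/cmdline` and `/proc/modules` are readable.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $2 appears as a whole word in the
# kernel command line passed as $1.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Kernel command line of the running system (empty if /proc is unavailable).
cmdline="$(cat /proc/cmdline 2>/dev/null || true)"

for p in 3 nomodeset; do
    if has_param "$cmdline" "$p"; then
        echo "boot parameter present: $p"
    else
        echo "boot parameter missing: $p"
    fi
done

# Show any nouveau or nvidia modules that are currently loaded.
if [ -r /proc/modules ]; then
    grep -E '^(nouveau|nvidia)' /proc/modules || echo "no nouveau or nvidia module loaded"
fi
```

Matching whole words keeps the "3" check from being fooled by digits inside other parameters such as the kernel image name.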

    The problem is solved. I really appreciate your support and help. Thanks also for the clarification; now I understand what those mean.