CUDA 10 installation problems on Ubuntu 18.04

I installed the NVIDIA driver for a Quadro M500M as follows:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-390
sudo reboot

I checked the installation with

nvidia-smi

and got:

Mon Dec 17 10:13:49 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M500M        Off  | 00000000:06:00.0 Off |                  N/A |
| N/A   46C    P0    N/A /  N/A |    688MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1305      G   /usr/lib/xorg/Xorg                           359MiB |
|    0      1496      G   /usr/bin/gnome-shell                         166MiB |
|    0      1984      G   ...uest-channel-token=18290818570900219022   160MiB |
+-----------------------------------------------------------------------------+

So everything seems fine with the drivers.
Just in case, I restarted the system.

Followed instructions for CUDA installation from https://www.tensorflow.org/install/gpu.
Selected:

Linux > x86_64 > Ubuntu > 18.04 > deb (local)

and followed the instructions listed underneath:

sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-libraries-10-0

Instead of the 4th line above I have also tried

sudo apt-get install cuda

Running

nvcc --version

did not give me info about the version of CUDA:

Command 'nvcc' not found, but can be installed with:

sudo apt install nvidia-cuda-toolkit

So I tried

sudo apt install nvidia-cuda-toolkit
nvcc --version

only to get

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

  1. Why is CUDA 10.0 not detected? I thought I was following the instructions to the letter.
  2. Why is there a different method of installing CUDA that is not listed on the CUDA web page, and why does it install an older version of CUDA?
  3. What is a good WORKING method for installing CUDA 10.0?

Follow the instructions in the Linux install guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Get your installers from http://www.nvidia.com/getcuda

You won’t be able to use that PPA driver, nvidia-390, with CUDA 10; it is older than the 410.xx driver that CUDA 10 requires. Use the driver bundled with the CUDA 10 installers instead.

Now that you’ve already installed the wrong driver, read the Linux install guide carefully. Failure to follow it carefully will result in more trouble.

OK, assuming I did something wrong before, here are my steps (skipping the verification steps):

My ‘/usr/local/’ contains ‘cuda’ and ‘cuda-10.0’.
Uninstalling a Toolkit runfile installation:

sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl

results in

command not found

There is no ‘uninstall_cuda*’ in ‘/usr/local/cuda-10.0/bin/’.
Also there is no ‘uninstall_cuda*’ in ‘/usr/local/cuda/bin/’

Continuing with the next line:

sudo /usr/bin/nvidia-uninstall

Same result. No file.

Continuing with the next line:

sudo apt-get --purge remove cuda

I got

... Removing cuda (10.0.130-1) ...

Looking in ‘/usr/local/’ I still see non-empty ‘cuda’ and ‘cuda-10.0’.

Doing

nvcc --version

I get

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85


So, did I successfully uninstall a Toolkit runfile installation?

This does not make sense to me. Are ‘Cuda compilation tools’ the same as the ‘CUDA Toolkit’?
Judging by the messages, the system found version 10.0 when it was time to uninstall it, but it did not touch version 9.1, which is still there. What’s going on? What am I missing?

You didn’t perform a runfile installation to begin with. Please re-read the install guide. This:

sudo apt-get install cuda

is not a runfile installation.

You’ll need to get the nvidia-390 driver package off your system. Since those instructions didn’t come from NVIDIA, and the driver was not bundled by NVIDIA, but instead by a 3rd-party, you may need to check elsewhere for removal instructions. But something like:

sudo apt-get --purge remove nvidia-390

should work, I think.
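
Before purging, it may help to see which driver and toolkit packages are actually installed. A minimal sketch, assuming the usual `dpkg -l` column layout (status, name, version); the function name `nvidia_cuda_packages` is made up for this illustration:

```shell
# Hypothetical helper: filter `dpkg -l` output (read from stdin) down to
# installed ("ii") packages whose names start with nvidia, cuda or libcuda.
nvidia_cuda_packages() {
    awk '$1 == "ii" && $2 ~ /^(nvidia|cuda|libcuda)/ { print $2, $3 }'
}

# Only query the package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | nvidia_cuda_packages
fi
```

Anything this prints is a candidate for removal; an empty result suggests the packaged driver and toolkit are already gone.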

Then reinstall cuda:

sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

That should install the 410.48 driver for you. Verify that it was successful after a reboot with

nvidia-smi

The reported driver version should be 410.48.

If it is not, your system was not properly cleaned up. I won’t be able to guide you through other clean up steps. A generally working option is to re-install the OS.

If the driver is installed correctly, cuda should be installed as well.

Now follow the steps in section 7 of the Linux install guide to:

  • perform the mandatory post-install steps that set PATH and LD_LIBRARY_PATH
  • verify the CUDA install by building and running a few of the samples, such as deviceQuery and vectorAdd
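
The PATH and LD_LIBRARY_PATH steps above can be sketched as follows, assuming the toolkit landed in /usr/local/cuda-10.0 with its 64-bit libraries under lib64 (the variable name CUDA_DIR is just for this sketch):

```shell
# Assumed install prefix; adjust if the toolkit lives elsewhere.
CUDA_DIR=/usr/local/cuda-10.0

# ${VAR:+:${VAR}} expands to ":$VAR" only when VAR is set and non-empty,
# so no stray leading or trailing colon ends up in the result.
export PATH="${CUDA_DIR}/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="${CUDA_DIR}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

These exports only last for the current shell; appending them to a startup file such as ~/.bashrc is one common way to keep them.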

I did

sudo apt-get purge nvidia*

Now

nvcc --version

returns

Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

Then followed your directions:

sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

Restarted. Checking

nvidia-smi

I get

Mon Dec 17 13:44:11 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M500M        Off  | 00000000:06:00.0 Off |                  N/A |
| N/A   48C    P0    N/A /  N/A |    386MiB /  2004MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1270      G   /usr/lib/xorg/Xorg                           230MiB |
|    0      1492      G   /usr/bin/gnome-shell                         112MiB |
|    0      1883      G   ...uest-channel-token=15365897714315109728    41MiB |
+-----------------------------------------------------------------------------+

So, all seems good for now.

Following the steps starting from section 7:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}

Verifying driver versions

cat /proc/driver/nvidia/version

I am getting

NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.78  Sat Nov 10 22:09:04 CST 2018
GCC version:  gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)

Running

nvcc --version

I am getting

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

So the problem I had before was solved already.

Now I am trying to do “7.2.3.2. Compiling the Examples”, where I ran into issues.
What is this path:

~/NVIDIA_CUDA-10.0_Samples

It does not exist. Instead I see

/usr/local/cuda-10.0/samples

There is a “Makefile” there. But when I do

make

I get an error:

make[1]: Entering directory '/usr/local/cuda-10.0/samples/0_Simple/fp16ScalarProduct'
/usr/local/cuda-10.0/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o fp16ScalarProduct.o -c fp16ScalarProduct.cu
Assembler messages:
Fatal error: can't create fp16ScalarProduct.o: Permission denied
Makefile:288: recipe for target 'fp16ScalarProduct.o' failed
make[1]: *** [fp16ScalarProduct.o] Error 1
make[1]: Leaving directory '/usr/local/cuda-10.0/samples/0_Simple/fp16ScalarProduct'
Makefile:51: recipe for target '0_Simple/fp16ScalarProduct/Makefile.ph_build' failed
make: *** [0_Simple/fp16ScalarProduct/Makefile.ph_build] Error 2

Am I running the right samples? What does this mean?

You don’t have write access to the directories where those sample codes are located.

Do something like:

sudo make -k
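
An alternative that avoids building as root is to copy the samples somewhere writable first. A rough sketch, assuming the samples sit in /usr/local/cuda-10.0/samples as in the error output above; `copy_samples` and its default destination are names chosen for this example:

```shell
# Hypothetical helper: copy the samples tree to a directory the current
# user can write to. $1 = source, $2 = destination (both optional).
copy_samples() {
    src="${1:-/usr/local/cuda-10.0/samples}"
    dest="${2:-$HOME/NVIDIA_CUDA-10.0_Samples}"
    if [ ! -d "$src" ]; then
        echo "samples directory not found: $src" >&2
        return 1
    fi
    # "$src"/. copies the directory contents, including hidden files.
    mkdir -p "$dest" && cp -r "$src"/. "$dest"/
}

# Example use (not run here):
#   copy_samples && make -k -C "$HOME/NVIDIA_CUDA-10.0_Samples"
```

The -k flag tells make to keep going after a failed target, so one sample that cannot be built does not stop the rest.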

This worked:

sudo make -k

It has not finished yet, but it has been running for more than five minutes, so I assume all is well now. Thank you for your help! I very much appreciate it.

Yes, it takes a while to build all the sample codes.

CUDA 10 should be installed. However, there is no sign of it in /usr/local/cuda, where I find two CUDA 9 versions.
What happened?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.18       Driver Version: 415.18       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:08:00.0 Off |                  N/A |
| 35%   39C    P2    58W / 260W |    613MiB / 10989MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
| 35%   34C    P8     5W / 260W |    579MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7082      C   ...ce/embeddings/embeddingsenv/bin/python3   325MiB |
|    0     10137      C   ...ce/embeddings/embeddingsenv/bin/python3   277MiB |
|    1      1486      G   /usr/lib/xorg/Xorg                            16MiB |
|    1      7082      C   ...ce/embeddings/embeddingsenv/bin/python3   275MiB |
|    1     10137      C   ...ce/embeddings/embeddingsenv/bin/python3   275MiB |
+-----------------------------------------------------------------------------+

I don’t know what “2 cuda-9” means.

Do you mean that you previously had CUDA 9 installed and you want to know where it is now?

Look in /usr/local, not /usr/local/cuda.

Sorry for my poor explanation.
In /usr/local I have two cuda folders.

:/usr/local$ ls
bin  cuda  cuda-9.0  etc  games  include  lib  man  sbin  share  src

Within each folder, according to version.txt, I have the same version of CUDA:

CUDA Version 9.0.176

My questions are:

  • Where is CUDA 10?
  • Why is it in the kernel, but I can’t find it anywhere?
  • Is there a way of finding it without reinstalling everything?

Thanks in advance for the help!

Did you install CUDA 10?

It looks to me like you simply haven’t installed CUDA 10. You have an updated GPU driver (415.18). However, the fact that nvidia-smi indicates: CUDA Version: 10.0 doesn’t actually mean you have CUDA 10 installed.
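
One way to see which toolkits are really on disk, independent of what nvidia-smi prints, is to look for versioned cuda-* directories. A small sketch, assuming the default /usr/local prefix and that /usr/local/cuda is usually a symlink to one of them; `list_cuda_toolkits` is a name invented here:

```shell
# Hypothetical helper: list cuda-* directories under a prefix (default
# /usr/local) and say whether each one contains an nvcc binary.
list_cuda_toolkits() {
    root="${1:-/usr/local}"
    for dir in "$root"/cuda-*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        dir="${dir%/}"
        if [ -x "$dir/bin/nvcc" ]; then
            echo "${dir##*/}: nvcc present"
        else
            echo "${dir##*/}: no nvcc"
        fi
    done
    # Show where the unversioned "cuda" link points, if it is a symlink.
    if [ -L "$root/cuda" ]; then
        echo "cuda -> $(readlink "$root/cuda")"
    fi
}

list_cuda_toolkits
```

If no cuda-10.0 entry shows up, the CUDA 10 toolkit is not installed under that prefix, whatever the driver reports.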

I checked the Pre-installation Actions (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
I don’t understand how to apply section 2.7, “Handle Conflicting Installation Methods”, to what I have:

  • CUDA Version 9.0.176
  • NVIDIA-SMI 415.18 Driver Version: 415.18 CUDA Version: 10.0

Is there any conflict?

I performed steps 1 to 4 of section 3.6, “Ubuntu”.

  • If I run step 5 (sudo apt-get install cuda), will I keep both CUDA 9.0 and 10.1, or will 9.0 disappear?
  • Since the driver is already up to date, should I run “sudo apt-get install cuda-toolkit-10-1” instead?

Thanks again for your help!

I followed your instructions and found that they installed this driver:

NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0

And the minimum driver for CUDA 10 is 410.42.

Now I do not even know how to remove these, and you just wasted a few hours of my time.

May I know why you ship the wrong version of the drivers with CUDA?

410.104 is a newer driver than 410.42. So it satisfies the minimum driver requirement for CUDA 10.

104 > 42
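
The same comparison can be done mechanically. A sketch using version sort (`sort -V`, a GNU coreutils option, so this assumes a GNU userland such as Ubuntu's); `version_ge` is a helper name made up for this example:

```shell
# Succeeds when dotted version $1 is greater than or equal to $2.
# sort -V orders the components numerically, so the smaller version
# comes out on the first line.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_ge 410.104 410.42; then
    echo "410.104 satisfies a 410.42 minimum"
fi
```

Note that comparing the two as decimal numbers would give the opposite answer, which may be where the confusion comes from.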

I did a fresh install of Ubuntu after trying to install CUDA and cuDNN yesterday. I installed the NVIDIA 410 driver (ONLY THE DRIVER AND NOTHING ELSE) and rebooted my system.

The problem I have is that if I run “nvidia-smi” in a terminal, it shows “CUDA Version: 10.0” even though I have never installed CUDA. Should I still follow your guide to install CUDA on Linux? The last time I did that, I ended up with multiple CUDA versions on my system and had to reinstall Ubuntu.

NVIDIA driver version: 410.104
nvidia-smi command: working
OS: Ubuntu 18.04

PS: When I install nvidia-driver-396, it doesn’t install CUDA on its own. But all the versions above 410 install CUDA on their own (or at least that’s what the nvidia-smi command shows).

Any help would be appreciated.

The CUDA version shown by the nvidia-smi command (on newer driver versions) does not indicate that CUDA is actually installed, or what version of CUDA is installed. It indicates the highest version of CUDA that the driver is compatible with.
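
To compare the two numbers side by side, the header of nvidia-smi and the last line of nvcc --version can be parsed. A minimal sketch based on the output formats quoted in this thread; both function names are illustrative, and older drivers (such as the 390.87 output earlier) print no CUDA Version field at all, in which case the first function prints nothing:

```shell
# Read nvidia-smi output on stdin; print the highest CUDA version the
# driver supports (the "CUDA Version:" header field), if present.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Read `nvcc --version` output on stdin; print the toolkit release.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only call the real tools when they exist on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver supports up to CUDA: $(nvidia-smi | smi_cuda_version)"
fi
if command -v nvcc >/dev/null 2>&1; then
    echo "installed toolkit (nvcc):   $(nvcc --version | nvcc_release)"
fi
```

If the second line is missing or shows a lower number than the first, the driver is ahead of the installed toolkit, which is the situation described above.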

Dear NVIDIA,

My nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
| 35%   53C    P2    41W / 180W |   1627MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1119      G   /usr/lib/xorg/Xorg                           863MiB |
|    0      1259      G   /usr/bin/gnome-shell                         395MiB |
|    0      2642      G   ...equest-channel-token=643813161121753532   256MiB |
|    0      3693      C   /usr/lib/libreoffice/program/soffice.bin     105MiB |
|    0      7966      G   gnome-control-center                           2MiB |
+-----------------------------------------------------------------------------+

My nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I need to have CUDA 10.0.
Would you mind adding cuda_10.0.130_410.48_linux.run at https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ ?

Is there any workaround to get a CUDA version higher than 9.1?

Thank you very much in advance.

Warmest Regards,
Suryadi

If you use the package manager install method, just do:

sudo apt-get install cuda-toolkit-10-0

If you use the runfile install method, download the runfile installer you already mentioned (cuda_10.0.130_410.48_linux.run), run it, and select no when prompted to install the driver (your 430.xx driver is fine for all this).

If you have no idea what any of this means, please read the Linux install guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

You can get older installers here:

https://www.nvidia.com/getcuda

Use the legacy release button to access the toolkit installer archive. You’ll note that older documentation versions are available there online as well.
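
After installing by either method, it can be worth checking which nvcc the shell picks up first, since an older copy earlier on PATH (like the 9.1 one reported above) would still answer nvcc --version. A sketch that lists every match in PATH order; `find_on_path` is a hypothetical helper, roughly what `type -a nvcc` does in bash:

```shell
# Print every executable file named $1 found in the PATH directories,
# in the order the shell would search them.
find_on_path() {
    name="$1"
    old_ifs="$IFS"
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$name"    # an empty PATH entry means "."
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
    IFS="$old_ifs"
}

find_on_path nvcc
```

The first line printed is the one that runs when you type nvcc; if it is not the copy under the CUDA 10.0 directory, the PATH order needs adjusting.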