Installation failures(?) despite instructions

Hello all,

Brand new poster here, so I’m sure that this will be a little short on the information needed to fully diagnose and fix the problem.

I’ve tried both the runfile and the .deb installation methods outlined in the CUDA installation guide, and neither works. I’m trying to install CUDA 8.0 on Ubuntu 16.04 with a GTX 980 graphics card.

In the best case, everything appears to install, but when I run deviceQuery I get a syntax error: “)” not expected (as I recall; the system is down right now). Other issues I’ve encountered include a login loop where I can’t get past the login screen (caused, I think, by not fully purging the NVIDIA drivers) and the installer failing to find the kernel headers (yes, I’d updated them, and yes, I checked). The fix for the headers problem was to downgrade Ubuntu using sudo apt-get purge && sudo apt-get purge, both of which seem like dangerous things to do.

Steps taken include (skipping the checking if ubuntu and the gfx card are supported):

  1. clean install of ubuntu + sudo apt-get update && sudo apt-get upgrade
  2. download the files
  3. check the md5sum
  4. reboot into mode 3
  5. follow the instructions to purge nvidia-, nvidia-cuda, and blacklist nouveau
  6. sudo sh <runfile_name>
  7. yes to drivers, openGL libraries (doesn’t make a difference-- same results either way), toolkit, samples, etc.
  8. default paths
  9. wait for install to finish
    9a) keep waiting
  10. run the export statements
  11. make sure the install was successful with nvcc -V and nvidia-smi
  12. make and attempt to run the samples
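
For anyone following along, steps 10 and 11 can be folded into one check. This is only a minimal sketch: the function names check_tool and check_cuda_install are made up for illustration, and it only tells you whether nvcc and nvidia-smi resolve on the current PATH, not whether the driver itself is healthy.

```shell
#!/bin/sh
# check_tool NAME: report where NAME resolves on PATH, or that it is missing.
# (Hypothetical helper, not part of any NVIDIA tooling.)
check_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        printf '%s: found at %s\n' "$1" "$tool_path"
    else
        printf '%s: not on PATH\n' "$1"
        return 1
    fi
}

# check_cuda_install: look for both tools from step 11 and report on each,
# without stopping at the first one that is missing.
check_cuda_install() {
    status=0
    for tool in nvcc nvidia-smi; do
        check_tool "$tool" || status=1
    done
    return "$status"
}
```

Run check_cuda_install after the export statements. If nvcc is reported as not on PATH even though the toolkit directory exists, the export most likely did not take effect in the current shell.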

I’m about at my wits’ end, here. By everything I’ve done, it seems like this /should/ be working. Can anyone (NVIDIA devs, looking at you in particular) help, please? Feel free to be (extremely) detailed-- I’m inexperienced enough it’s not possible to be too much so.

Do you have more than 1 GPU in the system?

What is the output of

lspci | grep -i VGA

No, I only have the GTX980 as the GPU…unless maybe the integrated graphics on the motherboard counts. That’s not Nvidia, though.

lspci | grep -i VGA gives:

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)

What next?

Yes, the integrated graphics on the motherboard counts, and it affects the installation process if it is being used. However, I don’t see it showing up in the lspci output in your case. Do you have integrated motherboard graphics? Are you using it?

Hi, I have a similar setup with Ubuntu 16.04 and a GeForce GTX 980M, and I am unable to get TensorFlow running.

lspci | grep -i VGA gives:

00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)

I mistakenly installed CUDA 9.1 in addition to CUDA 8.0; however, CUDA 8.0 is the one that is linked (checked via nvcc -V).

CuDNN v6 is installed.

When I try to import tensorflow, I get the error below:


ImportError: libnvidia-fatbinaryloader.so.390.12: cannot open shared object file: No such file or directory

So now I’ve narrowed it down to having two NVIDIA driver versions installed, 384 and 390, of which the latter is active (and wrong). How can I switch to using nvidia-384? Modifying LD_LIBRARY_PATH doesn’t help here.

Expert help is much appreciated!

T

That’s a problem. Your NVIDIA GPU is not showing up in lspci.

What is the result of:

lspci |grep -i nvidia

?

Hi txbob,

output of lspci |grep -i nvidia is:

01:00.0 3D controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1)

~S
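
The two outputs above differ because the GTX 980M is listed as a "3D controller" rather than a "VGA compatible controller", so a grep for VGA alone misses it. A minimal sketch of a broader filter follows; list_gpus is a made-up helper name, and the three class strings are the ones lspci commonly prints for display devices.

```shell
#!/bin/sh
# list_gpus: read `lspci` output on stdin and keep every display-class device,
# whether it is reported as a VGA, 3D, or display controller.
list_gpus() {
    grep -Ei 'vga compatible controller|3d controller|display controller'
}

# Typical use on a real machine:
#   lspci | list_gpus
```

On the laptop above this would be expected to show both the Intel integrated graphics and the GTX 980M in one listing.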

It looks like your 390.12 driver install is broken. Depending on your Ubuntu kernel version, switching back to the 384 driver may not be a good idea. Also I would need to know exactly how you installed the 384 and 390 drivers.

Perhaps the best thing to do is just follow the instructions in the CUDA Linux install guide for removing things:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation

and then start over with your driver and CUDA install. Get your CUDA installers here: http://www.nvidia.com/getcuda

and read the whole install guide before starting.

Ubuntu kernel version is 4.13.0-26.

I followed the installation procedure here --> http://nathan.vertile.com/blog/2017/08/31/installing-tensorflow-on-ubuntu-16-04/#install-cuda-and-drivers

When installing CUDA, I did not install the driver (384), but installed it separately later using:

sudo apt install nvidia-384

I had to disable Secure Boot in the UEFI settings after rebooting.

Then I installed tensorflow using the official instructions --> https://www.tensorflow.org/install/install_linux

That’s when I got the “ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory” error.

So I proceeded to install cuDNN v6 from the NVIDIA website. After that, I accidentally installed CUDA 9.1, and the nvidia-390 driver was pulled in automatically by apt-get.

Also, running sudo /usr/bin/nvidia-uninstall gives me a command not found error.

With kernel version 4.13 you need to use the 390 driver.

Also, /usr/bin/nvidia-uninstall is only available with the runfile installer. You are using the package manager install. I suggest you read the Linux install guide again.
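
To make the distinction concrete, here is a minimal sketch that picks a removal route based on whether the runfile uninstaller exists. The function name suggest_driver_removal is hypothetical, and the check is only a heuristic: a missing uninstaller means there is no runfile-installed driver, not that a packaged driver is definitely present. The apt-get form is the one the install guide gives for Ubuntu, with the package name left as a placeholder.

```shell
#!/bin/sh
# suggest_driver_removal [UNINSTALLER]: print the removal route that matches
# how the driver was installed. UNINSTALLER defaults to the runfile location.
suggest_driver_removal() {
    uninstaller="${1:-/usr/bin/nvidia-uninstall}"
    if [ -x "$uninstaller" ]; then
        echo "runfile driver detected; remove it with: sudo $uninstaller"
    else
        echo "no runfile uninstaller; list packages with: dpkg -l | grep -i nvidia"
        echo "then remove each with: sudo apt-get --purge remove <package_name>"
    fi
}
```

The function only prints suggestions and never removes anything itself, so it is safe to run before deciding what to purge.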

All, original poster here. Sorry for the delay. My motherboard does have an onboard graphics card, but it does not support VGA output, only DVI-D and HDMI. What next?

Start over with a clean load of the operating system, or else remove all previous installations using the methods described here:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation

Then get your installers from here: http://www.nvidia.com/getcuda
Read the linux install guide carefully: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Those are the locations for the most recent CUDA version. If you want to install an older version (e.g. CUDA 8), then search for “cuda toolkit archive”. You will find the installers, and also the link to the install guide, on the download page for the specific installer.

If you intend to install using a runfile installer on a system with integrated (motherboard) graphics, and you want to use the motherboard graphics for display, then be careful to run the runfile installer with the command line switch called out in the install guide. Otherwise you will run into the login loop.
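
As far as I can tell from the CUDA 8.0 install guide, the switch in question is --no-opengl-libs. Treat that as an assumption and confirm it against the guide for the version you are installing. A minimal sketch that only prints the command to run, with runfile_command as a made-up helper name and the runfile name left as a parameter:

```shell
#!/bin/sh
# runfile_command RUNFILE [DISPLAY_GPU]: print the installer command line.
# DISPLAY_GPU is "nvidia" (default) when the NVIDIA card drives the display,
# anything else (e.g. "other") when integrated graphics drives it.
# The --no-opengl-libs flag is an assumption; check the install guide.
runfile_command() {
    runfile="$1"
    display_gpu="${2:-nvidia}"
    if [ "$display_gpu" = "nvidia" ]; then
        printf 'sudo sh %s\n' "$runfile"
    else
        printf 'sudo sh %s --no-opengl-libs\n' "$runfile"
    fi
}
```

For example, runfile_command "<runfile_name>" other prints the command with the switch appended, which you can then review and run by hand from the text console.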

Documenting my progress as I go. I’m typing this from a clean system. Results of uname -m && cat /etc/*release are:

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

results for the linux headers are:

sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-4.13.0-31-generic is already the newest version (4.13.0-31.34~16.04.1).

Using the debfile method, and running the dpkg command:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
Selecting previously unselected package cuda-repo-ubuntu1604-8-0-local-ga2.
(Reading database ... 209154 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1604-8-0-local-ga2 (8.0.61-1) ...
Setting up cuda-repo-ubuntu1604-8-0-local-ga2 (8.0.61-1) ...
OK

Ran sudo apt-get update

Ran sudo apt-get install cuda, meant to output to a logfile but instead got several thousand lines on stdout

Ran the specified export PATH command, and had it echo to make sure it was behaving. Seems to, but I know I need to do something to make the change permanent. Did the same for LD_LIBRARY_PATH

At this point, nvcc -V reports that nvcc is not installed. This is true despite:

/usr/local/cuda-8.0/samples$ ls
0_Simple     2_Graphics  4_Finance      6_Advanced       common    Makefile
1_Utilities  3_Imaging   5_Simulations  7_CUDALibraries  EULA.txt

What gives? What should I do now?

Update: nvcc not found despite the following packages being installed:

apt list --installed | grep cuda

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda/unknown,now 8.0.61-1 amd64 [installed]
cuda-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-command-line-tools-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-core-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cublas-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cublas-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cudart-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cudart-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cufft-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cufft-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-curand-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-curand-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusolver-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusolver-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusparse-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusparse-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-demo-suite-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-documentation-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-driver-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-drivers/unknown,now 375.26-1 amd64 [installed,automatic]
cuda-license-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-misc-headers-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-npp-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-npp-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvgraph-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvgraph-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvml-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvrtc-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvrtc-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-repo-ubuntu1604-8-0-local-ga2/now 8.0.61-1 amd64 [installed,local]
cuda-runtime-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-samples-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-toolkit-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-visual-tools-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
libcuda1-375/xenial-updates,xenial-security,now 384.111-0ubuntu0.16.04.1 amd64 [installed,automatic]
libcuda1-384/xenial-updates,xenial-security,now 384.111-0ubuntu0.16.04.1 amd64 [installed,automatic]

Second update, and first apology for the giant wall of text (I was hoping that the code tag would allow them to collapse):

Thought I’d check, and nvcc is there. Looks like $PATH didn’t update like I thought it had, must have typo’d.

How can I make the PATH and LD_LIBRARY_PATH changes permanent?
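
For reference, one common way to do this is to append the two export lines to a shell startup file. A minimal sketch, assuming a bash login, the default /usr/local/cuda-8.0 prefix, and ~/.bashrc as the target file; append_once and persist_cuda_env are made-up helper names, and the export lines follow the form given in the CUDA 8.0 install guide.

```shell
#!/bin/sh
# The two export lines, single-quoted so they are stored literally
# and only expanded later, when the startup file is sourced.
CUDA_PATH_LINE='export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}'
CUDA_LIB_LINE='export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}'

# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# persist_cuda_env [RC_FILE]: write both lines to RC_FILE (default ~/.bashrc).
# Safe to call more than once; it does not duplicate lines.
persist_cuda_env() {
    rc_file="${1:-$HOME/.bashrc}"
    append_once "$rc_file" "$CUDA_PATH_LINE"
    append_once "$rc_file" "$CUDA_LIB_LINE"
}
```

After calling persist_cuda_env, open a new terminal (or source the file) and check that nvcc -V works without running the exports by hand.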

Attempting to make the samples to test everything, I got this:

make[1]: Entering directory '/home/joe/NVIDIA_CUDA-8.0_Samples/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL'
/usr/local/cuda-8.0/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=compute_20 -o cudaDecodeGL FrameQueue.o ImageGL.o VideoDecoder.o VideoParser.o VideoSource.o cudaModuleMgr.o cudaProcessFrame.o videoDecodeGL.o  -L/usr/lib/"nvidia-367" -lGL -lGLU -lX11 -lglut -lcuda -lcudart -lnvcuvid
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile:381: recipe for target 'cudaDecodeGL' failed
make[1]: *** [cudaDecodeGL] Error 1
make[1]: Leaving directory '/home/joe/NVIDIA_CUDA-8.0_Samples/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL'
Makefile:52: recipe for target '3_Imaging/cudaDecodeGL/Makefile.ph_build' failed
make: *** [3_Imaging/cudaDecodeGL/Makefile.ph_build] Error 2

I don’t understand. Can anyone shed some light on this?
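
One possible reading of the linker line above: the samples Makefile passes -L/usr/lib/"nvidia-367", while the package list shows 375 and 384 driver packages, so ld may simply be searching a directory that does not exist on this machine. A minimal sketch for checking that, assuming the driver libraries live under /usr/lib/nvidia-<version> (an assumption about Ubuntu 16.04 packaging, not something confirmed in this thread). Both function names are made up for illustration.

```shell
#!/bin/sh
# find_nvcuvid_dir [ROOT]: print the directory holding libnvcuvid.so, if any.
# ROOT defaults to /usr/lib; prints nothing when the library is not found.
find_nvcuvid_dir() {
    root="${1:-/usr/lib}"
    lib=$(find "$root" -maxdepth 2 -name 'libnvcuvid.so*' 2>/dev/null | head -n 1)
    if [ -n "$lib" ]; then
        dirname "$lib"
    fi
}

# retarget_driver_dir NEW_NAME: rewrite the hard-coded nvidia-367 in Makefile
# text read from stdin, writing the result to stdout.
retarget_driver_dir() {
    sed "s/nvidia-367/$1/g"
}
```

If find_nvcuvid_dir prints something like /usr/lib/nvidia-384, the same substitution could be applied to the Makefiles in your copy of the samples under your home directory. Alternatively, make -k keeps building past the failing sample, and deviceQuery in 1_Utilities does not link against libnvcuvid, so it can still be used to test the install.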