Installation failures(?) despite instructions

whitman.joseph.r · January 17, 2018, 3:50am

Hello all,

Brand new poster here, so I’m sure that this will be a little short on the information needed to fully diagnose and fix the problem.

I’ve tried both the runfile and the .deb installation methods outlined in the CUDA installation guide, and nothing seems to work. I’ve been attempting to install CUDA 8.0 on Ubuntu 16.04 with a GTX 980 graphics card.

In the best case, everything seems to install, but when I run deviceQuery I get: syntax error, “)” not expected (as I recall; the system is down right now). Other issues I’ve encountered include a login-screen loop where I can’t get past the login screen (something about not fully purging the NVIDIA drivers caused that, I think), and an issue where the installer couldn’t find the kernel headers (yes, I’d updated them; yes, I checked). The solution was to downgrade Ubuntu using sudo apt-get purge && sudo apt-get purge, both of which seem like dangerous things to do.

Steps taken include (skipping the checking if ubuntu and the gfx card are supported):

1. clean install of ubuntu + sudo apt-get update && sudo apt-get upgrade
2. download the files
3. check the md5sum
4. reboot into mode 3
5. follow the instructions to purge nvidia-, nvidia-cuda, and blacklist nouveau
6. sudo sh <runfile_name>
7. yes to drivers, openGL libraries (doesn’t make a difference-- same results either way), toolkit, samples, etc.
8. default paths
9. wait for install to finish
9a. keep waiting
10. run the export statements
11. make sure the install was successful with nvcc -V and nvidia-smi
12. make and attempt to run the samples
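
For reference, the nouveau blacklisting in the steps above amounts to roughly the following. This is only a minimal sketch of what the install guide describes for Ubuntu, as far as I understand it; write_nouveau_blacklist is a made-up helper name, and the scratch path under /tmp is an assumption.

```shell
# Sketch only: writes the two-line modprobe config that disables nouveau.
# write_nouveau_blacklist is a hypothetical helper name; the target path is a
# parameter so the file can be generated in a scratch location first.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Typical use (the last two steps need root, then reboot):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo install -m 644 /tmp/blacklist-nouveau.conf /etc/modprobe.d/
#   sudo update-initramfs -u
# After rebooting, `lsmod | grep nouveau` should print nothing.
```

Writing to a scratch file first makes it easy to inspect the result before anything under /etc is touched.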

I’m about at my wits’ end, here. By everything I’ve done, it seems like this /should/ be working. Can anyone (NVIDIA devs, looking at you in particular) help, please? Feel free to be (extremely) detailed-- I’m inexperienced enough it’s not possible to be too much so.

Robert_Crovella · January 17, 2018, 5:39am

Do you have more than 1 GPU in the system?

What is the output of

lspci | grep -i VGA

whitman.joseph.r · January 17, 2018, 10:59pm

No, I only have the GTX980 as the GPU…unless maybe the integrated graphics on the motherboard counts. That’s not Nvidia, though.

lspci | grep -i VGA gives:

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)

What next?

Robert_Crovella · January 18, 2018, 3:05pm

Yes, the integrated graphics on the motherboard counts, and it affects the installation process if it is being used. However, I don’t see it showing up in your lspci output. Do you have integrated motherboard graphics? Are you using it?

bobsaccamano · January 21, 2018, 4:32am

Hi, I have a similar setup with Ubuntu 16.04 and a GeForce GTX 980M, and I am unable to get TensorFlow running.

lspci | grep -i VGA gives:

00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)

I mistakenly installed CUDA 9.1 in addition to CUDA 8.0; however, CUDA 8.0 is the one linked (checked via nvcc -V).

CuDNN v6 is installed.

When I try to import tensorflow, I get the error below:

…
ImportError: libnvidia-fatbinaryloader.so.390.12: cannot open shared object file: No such file or directory

So, now I’ve narrowed it down to having two NVIDIA driver versions 384 and 390, of which the latter is active (and wrong). How can I switch to using nvidia-384? Modifying the LD_LIBRARY_PATH doesn’t help here.
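
A quick way to compare the driver packages that are installed with the driver the kernel actually loaded might look like the sketch below. installed_nvidia_drivers is a made-up helper name, and the awk filter assumes Ubuntu's nvidia-NNN package naming.

```shell
# Sketch: filter `dpkg -l` output down to fully installed ("ii") driver
# packages named nvidia-<number>, printing package name and version.
# installed_nvidia_drivers is a hypothetical helper, not an NVIDIA tool.
installed_nvidia_drivers() {
    awk '$1 == "ii" && $2 ~ /^nvidia-[0-9]+$/ { print $2, $3 }'
}

# Usage:
#   dpkg -l | installed_nvidia_drivers
#   cat /proc/driver/nvidia/version    # version of the loaded kernel module
```

If the two disagree, or /proc/driver/nvidia/version does not exist, the user-space libraries and the kernel module are out of step.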

Expert help is much appreciated!

T

Robert_Crovella · January 21, 2018, 4:44am

That’s a problem. Your NVIDIA GPU is not showing up in lspci.

what is the result of:

lspci |grep -i nvidia

?

bobsaccamano · January 21, 2018, 11:23am

Hi txbob,

output of lspci |grep -i nvidia is:

01:00.0 3D controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1)

~S

Robert_Crovella · January 21, 2018, 12:34pm

It looks like your 390.12 driver install is broken. Depending on your Ubuntu kernel version, switching back to the 384 driver may not be a good idea. Also I would need to know exactly how you installed the 384 and 390 drivers.

Perhaps the best thing to do is just follow the instructions in the CUDA Linux install guide for removing things:

Installation Guide Linux :: CUDA Toolkit Documentation

and then start over with your driver and CUDA install. Get your CUDA installers here: http://www.nvidia.com/getcuda

and read the whole install guide before starting.

bobsaccamano · January 21, 2018, 2:21pm

Ubuntu kernel version is 4.13.0-26.

I followed the installation procedure here → http://nathan.vertile.com/blog/2017/08/31/installing-tensorflow-on-ubuntu-16-04/#install-cuda-and-drivers

When installing CUDA, I did not install the driver (384), but installed it separately later using:

sudo apt install nvidia-384

I had to disable Secure Boot in the UEFI settings after rebooting.

Then I installed tensorflow using the official instructions → Install TensorFlow with pip

That’s when I got the “ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory” error.

So I proceeded to install CuDNNv6 from the NVIDIA website. After that, I accidentally installed CUDA 9.1 and the nvidia-390 driver automatically using apt-get.

bobsaccamano · January 21, 2018, 2:27pm

Also, running sudo /usr/bin/nvidia-uninstall gives me a command not found error.

Robert_Crovella · January 21, 2018, 9:03pm

With kernel version 4.13 you need to use the 390 driver.

Also, /usr/bin/nvidia-uninstall is only available with the runfile installer. You are using the package manager install. I suggest you read the Linux install guide again.
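
For a package-manager install, removal goes through apt-get --purge remove rather than an uninstall script. A minimal sketch follows, assuming the patterns 'cuda*' and 'nvidia-*' cover what was installed (review the list apt prints before confirming). run and remove_packaged_cuda are hypothetical helper names, and by default nothing is executed; the commands are only printed.

```shell
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of a package-manager removal. The package patterns are assumptions;
# adjust them to what `apt list --installed` shows on your system.
remove_packaged_cuda() {
    run sudo apt-get --purge remove 'cuda*'
    run sudo apt-get --purge remove 'nvidia-*'
    run sudo apt-get autoremove
}

# Preview:  remove_packaged_cuda
# Execute:  DRY_RUN=0 remove_packaged_cuda
```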

whitman.joseph.r · January 22, 2018, 6:27pm

All, original poster here. Sorry for the delay. My motherboard does have an onboard graphics card, but it does not support VGA output, only DVI-D and HDMI. What next?

Robert_Crovella · January 22, 2018, 7:10pm

Start over with a clean load of the operating system, or else remove all previous installations using the methods described here:

Installation Guide Linux :: CUDA Toolkit Documentation

Then get your installers from here: http://www.nvidia.com/getcuda
Read the Linux install guide carefully: Installation Guide Linux :: CUDA Toolkit Documentation

Those are the locations for the most recent CUDA version. If you want to install an older version (e.g. CUDA 8), then google “cuda toolkit archive”. You will find the installers and also the link to the install guide from the download page for the specific installer.

If you intend to install using a runfile installer on a system with integrated (motherboard) graphics, and you want to use the motherboard graphics for display, then be careful to run the runfile installer with the command line switch called out in the install guide. Otherwise you will run into the login-loop.
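
For reference, the switch in question is, as far as the install guide goes, --no-opengl-libs, which keeps the runfile installer from replacing the OpenGL libraries that the integrated graphics rely on. A small sketch, where cuda_runfile_cmd is a hypothetical helper and the runfile name is whatever you downloaded:

```shell
# Sketch: build the runfile command line depending on which GPU drives the
# display. cuda_runfile_cmd is a made-up helper; it only prints the command.
cuda_runfile_cmd() {
    runfile="$1"        # path to the downloaded runfile
    display_gpu="$2"    # "integrated" or "nvidia"
    if [ "$display_gpu" = "integrated" ]; then
        # Display runs on motherboard graphics: skip NVIDIA's OpenGL libraries.
        echo "sudo sh $runfile --no-opengl-libs"
    else
        echo "sudo sh $runfile"
    fi
}

# Example:
#   cuda_runfile_cmd ./<runfile_name> integrated
```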

whitman.joseph.r · January 26, 2018, 3:10am

Documenting my progress as I go. I’m typing this from a clean system. Results of uname -m && cat /etc/*release are:

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Results for the Linux headers are:

sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-4.13.0-31-generic is already the newest version (4.13.0-31.34~16.04.1).

Using the debfile method, and running the dpkg command:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
Selecting previously unselected package cuda-repo-ubuntu1604-8-0-local-ga2.
(Reading database ... 209154 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1604-8-0-local-ga2 (8.0.61-1) ...
Setting up cuda-repo-ubuntu1604-8-0-local-ga2 (8.0.61-1) ...
OK

Ran sudo apt-get update

Ran sudo apt-get install cuda. I meant to send the output to a logfile, but instead got several thousand lines on stdout.
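
For next time, one way to keep a copy of that output is to pipe it through tee. A minimal sketch; log_run and the log file name are my own choices:

```shell
# log_run is a hypothetical helper: run a command, show its output on the
# terminal, and keep a copy (stdout and stderr together) in a log file.
log_run() {
    logfile="$1"
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Example:
#   log_run cuda-install.log sudo apt-get install cuda
```

Note that the pipeline reports tee's exit status, not the command's, so check the log itself for errors.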

Ran the specified export PATH command, and had it echo to make sure it was behaving. It seems to, but I know I need to do something to make the change permanent. Did the same for LD_LIBRARY_PATH.

At this point nvcc -V is not found, as if nothing were installed. This is true despite:

/usr/local/cuda-8.0/samples$ ls
0_Simple     2_Graphics  4_Finance      6_Advanced       common    Makefile
1_Utilities  3_Imaging   5_Simulations  7_CUDALibraries  EULA.txt

What gives? What should I do now?

whitman.joseph.r · January 26, 2018, 3:11am

Update: nvcc not found despite the following packages being installed:

apt list --installed | grep cuda

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda/unknown,now 8.0.61-1 amd64 [installed]
cuda-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-command-line-tools-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-core-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cublas-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cublas-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cudart-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cudart-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cufft-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cufft-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-curand-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-curand-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusolver-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusolver-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusparse-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-cusparse-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-demo-suite-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-documentation-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-driver-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-drivers/unknown,now 375.26-1 amd64 [installed,automatic]
cuda-license-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-misc-headers-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-npp-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-npp-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvgraph-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvgraph-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvml-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvrtc-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-nvrtc-dev-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-repo-ubuntu1604-8-0-local-ga2/now 8.0.61-1 amd64 [installed,local]
cuda-runtime-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-samples-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-toolkit-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
cuda-visual-tools-8-0/unknown,now 8.0.61-1 amd64 [installed,automatic]
libcuda1-375/xenial-updates,xenial-security,now 384.111-0ubuntu0.16.04.1 amd64 [installed,automatic]
libcuda1-384/xenial-updates,xenial-security,now 384.111-0ubuntu0.16.04.1 amd64 [installed,automatic]

whitman.joseph.r · January 26, 2018, 3:14am

Second update, and first apology for the giant wall of text (I was hoping that the code tag would allow them to collapse):

Thought I’d check, and nvcc is there. Looks like $PATH didn’t update like I thought it had, must have typo’d.

How can I make the PATH and LD_LIBRARY_PATH changes permanent?
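
The usual approach, as far as I can tell, is to append the export lines to a shell startup file such as ~/.bashrc so that every new terminal picks them up. A minimal sketch, assuming bash and the default /usr/local/cuda-8.0 install location; persist_cuda_env is a made-up helper name:

```shell
# Sketch: append the CUDA export lines to a startup file, once.
# persist_cuda_env is a hypothetical helper; both arguments are optional.
persist_cuda_env() {
    rc="${1:-$HOME/.bashrc}"
    cuda="${2:-/usr/local/cuda-8.0}"
    # Do nothing if the file already mentions this CUDA bin directory.
    grep -qsF "$cuda/bin" "$rc" && return 0
    cat >> "$rc" <<EOF
export PATH=$cuda/bin\${PATH:+:\${PATH}}
export LD_LIBRARY_PATH=$cuda/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}
EOF
}

# Usage:
#   persist_cuda_env            # appends to ~/.bashrc
#   . ~/.bashrc && nvcc -V      # reload in the current shell and verify
```

The `${PATH:+:${PATH}}` form only adds the colon and the old value when PATH is already set, which avoids a stray empty entry.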

whitman.joseph.r · January 26, 2018, 3:32am

Attempting to make the samples to test everything, I got this:

make[1]: Entering directory '/home/joe/NVIDIA_CUDA-8.0_Samples/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL'
/usr/local/cuda-8.0/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=compute_20 -o cudaDecodeGL FrameQueue.o ImageGL.o VideoDecoder.o VideoParser.o VideoSource.o cudaModuleMgr.o cudaProcessFrame.o videoDecodeGL.o  -L/usr/lib/"nvidia-367" -lGL -lGLU -lX11 -lglut -lcuda -lcudart -lnvcuvid
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile:381: recipe for target 'cudaDecodeGL' failed
make[1]: *** [cudaDecodeGL] Error 1
make[1]: Leaving directory '/home/joe/NVIDIA_CUDA-8.0_Samples/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL'
Makefile:52: recipe for target '3_Imaging/cudaDecodeGL/Makefile.ph_build' failed
make: *** [3_Imaging/cudaDecodeGL/Makefile.ph_build] Error 2

I don’t understand. Can anyone shed some light on this?
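
One thing that stands out in the log: the link line searches /usr/lib/"nvidia-367", while the package list suggests a different driver version is installed, so libnvcuvid is probably sitting in another directory. The sketch below is a guess at a workaround, not a confirmed fix: it rewrites that directory name in a copy of the samples. retarget_samples is a made-up helper, and the right replacement (perhaps nvidia-384) should be confirmed with find first.

```shell
# Sketch: point the samples' Makefiles at the driver directory that exists.
# retarget_samples is a hypothetical helper. It edits files in place, so run
# it on your own copy of the samples, not the one under /usr/local.
retarget_samples() {
    samples_dir="$1"    # e.g. ~/NVIDIA_CUDA-8.0_Samples
    new_pkg="$2"        # e.g. nvidia-384 (assumption, verify with find)
    grep -rl --include=Makefile 'nvidia-367' "$samples_dir" |
    while IFS= read -r mk; do
        sed "s/nvidia-367/$new_pkg/g" "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
    done
}

# Usage:
#   find /usr/lib -name 'libnvcuvid.so*'    # see which directory has the library
#   retarget_samples ~/NVIDIA_CUDA-8.0_Samples nvidia-384
```

Alternatively, make -k keeps going past the failing sample so the rest (including deviceQuery) still get built.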
