How to verify version match of toolkit and driver

I successfully installed the CUDA driver on a 1080 Ti based Linux system, but then realized that I also needed to install the CUDA Toolkit. I tried the latest .run file without much luck (I could not get past an initial error screen complaining that a driver was already installed).

So I went back through the “NVIDIA CUDA Installation Guide for Linux” and the instructions at https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal

It appeared that the .deb file might be a better installation method, so I completed the recommended steps for ‘deb local’:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
(etc)…
sudo apt-get -y install cuda
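
(For reference, the full sequence on that page looked roughly like the following; the repo .deb filename and key path here are placeholders rather than the exact names from the page:)

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo dpkg -i cuda-repo-ubuntu1804-<version>_amd64.deb     # local repo package downloaded from the page
sudo apt-key add /var/cuda-repo-<version>/<keyfile>.pub   # exact key path is shown on the page
sudo apt-get update
sudo apt-get -y install cuda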

This did run. No evident problems. However, after reboot, nvcc was not found:

Command ‘nvcc’ not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

Running the latter command did install nvcc, and ‘which nvcc’ shows /usr/bin/nvcc. But the toolkit/CUDA version reported by nvcc -V is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85


nvidia-smi reports the driver as 418.87.00, and CUDA Version 10.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+

“cat /usr/local/cuda-10.1/version.txt” shows

CUDA Version 10.1.243

Two major questions:

1: Why is nvcc reporting v9.1.85?

2: Why was the toolkit not installed by the recommended procedure for the local .deb file?

That’s because you didn’t follow step 7 in the Linux install guide you already referenced. That would have been the right thing to do: everything you needed was already there, you just needed to set some environment variables.

Running sudo apt install nvidia-cuda-toolkit, on the other hand, was the wrong thing to do. You have now intermixed an NVIDIA installation method with an Ubuntu installation method. Your system is now mixed up, and the remainder of your questions/confusion reflects this. You have two different versions of nvcc installed, in two different places. The Ubuntu install selected CUDA 9.x, for whatever reason.

(If you did follow the instructions in step 7, then you did it incorrectly. You need to set the environment variables in such a way that they will persist through a reboot.)

I’m a Windows programmer, so I’m not accustomed to the relatively scattered Linux methodology. I had assumed that, since the NVIDIA .deb file had been installed, it was the source of the recommendation for the additional toolkit installation. I do wish there were some cohesive foundation that would prevent this confusing mix of driver and software sources.

Is it necessary to completely uninstall the toolkit and reinstall? Obviously this is an easily encountered problem, so I doubt that this is the first instance. I haven’t seen any recommendations for fixing this type of conflict.

The good news, though, is that the .deb route succeeded where the .run file had me hitting a wall: the .deb install seemed to recognize that the existing NVIDIA driver was compatible. That seems like a better method overall.

It may not be necessary to completely start over. Since you still have driver 418.87.00 loaded, that is the main thing. Set up your environment variables as listed in step 7, and put those settings in your .bashrc file so they persist through reboots.
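
For example, something along these lines (using the cuda-10.1 paths implied by your version.txt output; check step 7 of the guide for the exact lines), added to ~/.bashrc so they survive a reboot:

export PATH=/usr/local/cuda-10.1/bin:$PATH                            # put the NVIDIA-installed nvcc ahead of /usr/bin
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH    # may not be strictly needed for a deb install, but harmless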

That may be all you need to do. As long as you get your PATH ordering right, the fact that nvcc 9.1.85 is in /usr/bin may not be an issue.
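
A quick sanity check after opening a new shell (assuming the 10.1 paths above):

which nvcc    # should now point at /usr/local/cuda-10.1/bin/nvcc rather than /usr/bin/nvcc
nvcc -V       # should report release 10.1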

But there should be a command something like:

sudo apt remove nvidia-cuda-toolkit

that should unwind that errant step, and I don’t think it will disrupt anything else.

That seemed to work, Robert. I had probably missed something in setting the path. It’s unfortunate that such a simple omission results in a misleading prompt from Linux, but I guess I’ll have to get used to that.

I did do the ‘remove’ command, even though it may not have been necessary.

The samples built correctly, except for 2_Graphics and 5_Simulations, which evidently require OpenGL libraries that are not found. Compiler error:

">>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<

WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<"

locate libGL.so reports:

/usr/lib/x86_64-linux-gnu/libGL.so
/usr/lib/x86_64-linux-gnu/libGL.so.1
/usr/lib/x86_64-linux-gnu/libGL.so.1.0.0

I haven’t found an up-to-date Getting Started Guide. The older 6.5 Getting Started Guide (2014) has a couple of recommendations, but I don’t want to risk another incompatibility.

The CUDA Quick Start Guide indicates that the nbody sample in 5_Simulations should be directly compilable, so maybe I missed something else.

PS: I found that this works for compiling some (not all) of the OpenGL-based samples:

GLPATH=/usr/lib make

I’m not exactly sure how that particular command works, since setting GLPATH manually beforehand does not help.

PS: Thanks for your advice on this, Robert. Linux seems rather Wild West at times. In retrospect, some of this makes a bit more sense, but it’s difficult to determine the correct path sometimes. Especially when Linux itself is prompting for actions that are not necessarily productive.

I’d still like to find out about GLPATH, but that seems to be a separate subject, so I’ll post a query to another thread.

In Linux, when you specify

MYENVIRONMENTVARIABLE=something my_bash_command

it means that that command (my_bash_command) will run with MYENVIRONMENTVARIABLE set to something in its environment (but the variable does not remain set that way for subsequent commands).
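
A quick illustration (hypothetical variable name; any command behaves the same way):

MYVAR=hello bash -c 'echo $MYVAR'   # the child process sees MYVAR and prints "hello"
echo $MYVAR                         # prints nothing: MYVAR was set only for that one command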

So when you do:

GLPATH=/usr/lib make

then the GLPATH environment variable is set to /usr/lib for the duration of that make processing.

Why does this matter? Because the Makefile used by that make command contains make-specific syntax that checks for that particular environment variable and, if it is set, uses it. Specifically, it uses it to find where certain OpenGL libraries are resident on your system.

If you don’t set that variable, the make/Makefile doesn’t know where the needed OpenGL libraries are, and so compilation of certain OpenGL sample codes may fail.

Thanks for the clarification, Robert. I knew that GLPATH was needed to locate the libraries. I had set “GLPATH=/usr/lib” but called make on the next line; “echo $GLPATH” indicated that it was still set afterward. I had thought that the environment was carried through to subsequent commands and subshells, but apparently that’s not the whole story.
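
If I’m reading the bash documentation correctly, the missing piece was export: a plain assignment creates a shell variable that echo can see, but it isn’t passed to child processes such as make unless it is exported (or prefixed to the command, as above). Something like:

GLPATH=/usr/lib          # shell variable only; make (a child process) never sees it
make                     # OpenGL samples still fail

export GLPATH=/usr/lib   # exported into the environment; child processes inherit it
make                     # now the Makefile can find the libraries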

Re the compiler error messages:
">>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. "

The CUDA Getting Started Guide appears to be an early version of what is now called the “CUDA Installation Guide for Linux.” That’s why I was not able to find a current version. Perhaps that error message should be updated to the newer title.