EDIT: I wasn’t able to post the issue as-is because I kept getting ‘new users are only allowed to post 3 links per post’, so I replaced all the periods with (dot) and deleted all instances of ‘https’, ‘com’, and ‘html’ to get past the spam filter. A lot of the output was being interpreted as links, since much of it does contain links.
I am trying to install CUDA 11(dot)1, both the runtime API (toolkit) and the driver on my GPUs(dot)
I am running Ubuntu 18(dot)04 on x86_64(dot) I have tried upgrading my CUDA runtime to 11(dot)1 but have not been able to(dot) The driver has been updated, but the runtime API has not(dot)
nvidia-smi
shows that the driver has been upgraded and supports CUDA 11(dot)0, but
nvcc -V
Shows version 10(dot)0(dot)130 installed for the runtime API(dot)
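The two numbers above measure different things: nvidia-smi reports the newest CUDA version the installed driver supports, while nvcc -V reports the toolkit that is first on PATH, so they can legitimately differ. A minimal bash sketch that extracts both from the pasted output and checks that the toolkit is not newer than the driver; the function names are hypothetical and it relies on GNU `sort -V`.

```bash
# Sketch: extract the two versions this post compares.
#   nvidia-smi header "CUDA Version: 11.0"    -> newest CUDA the driver supports
#   nvcc -V line "release 10.0, V10.0.130"    -> toolkit version found on PATH

# Print the "CUDA Version" field from nvidia-smi output passed as $1.
driver_cuda_version() {
  printf '%s\n' "$1" | sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the toolkit release (e.g. 10.0) from `nvcc -V` output passed as $1.
nvcc_release() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Return success if version $1 <= version $2 (GNU `sort -V` ordering).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, `version_le "$(nvcc_release "$(nvcc -V)")" "$(driver_cuda_version "$(nvidia-smi)")"` succeeds for the 10.0 toolkit and 11.0 driver shown here.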
I followed the instructions from
docs(dot)nvidia (dot) /cuda/cuda-installation-guide-linux/index (dot)
Below, I go through the commands in the order they are listed in the guide(dot)
Section 2(dot) Pre-installation Actions
lspci | grep -i nvidia
resulted in
19:00(dot)0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
19:00(dot)1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
19:00(dot)2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
19:00(dot)3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
1a:00(dot)0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1a:00(dot)1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
1a:00(dot)2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
1a:00(dot)3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
67:00(dot)0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
67:00(dot)1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
67:00(dot)2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
67:00(dot)3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
68:00(dot)0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
68:00(dot)1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
68:00(dot)2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
68:00(dot)3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
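Each card above shows up as four PCI functions (VGA, audio, USB and serial bus), so counting only the display-controller lines gives the number of GPUs. A small sketch with a hypothetical helper name; also matching "3D controller" is an assumption for cards that do not expose a VGA function.

```bash
# Count NVIDIA display controllers in `lspci` output passed as $1.
# Audio/USB/serial functions of the same card are ignored; only
# "VGA compatible controller" and "3D controller" lines are counted.
count_nvidia_gpus() {
  printf '%s\n' "$1" | grep -i nvidia | grep -cE 'VGA compatible controller|3D controller'
}
```

`count_nvidia_gpus "$(lspci)"` should print 4 for the listing above, which matches the four GPUs nvidia-smi reports further down.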
uname -m && cat /etc/*release
resulted in
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18(dot)04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18(dot)04(dot)3 LTS"
NAME="Ubuntu"
VERSION="18(dot)04(dot)3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18(dot)04(dot)3 LTS"
VERSION_ID="18(dot)04"
HOME_URL="://(dot)ubuntu (dot)/"
SUPPORT_URL="://help(dot)ubuntu (dot)/"
BUG_REPORT_URL="://bugs(dot)launchpad (dot)net/ubuntu/"
PRIVACY_POLICY_URL="://(dot)ubuntu (dot)/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
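The download pages used below are selected with target_distro=Ubuntu and target_version=1804, and the repo path contains ubuntu1804. A sketch that derives that tag from /etc/os-release so it can be compared with the installer being downloaded; the helper name and the ID-plus-VERSION_ID mapping are assumptions that fit the Ubuntu entries here.

```bash
# Print "<ID><VERSION_ID without dots>" (e.g. "ubuntu1804") from
# /etc/os-release-style text passed as $1. "^ID=" does not match ID_LIKE.
distro_tag() {
  local id ver
  id=$(printf '%s\n' "$1" | sed -n 's/^ID=//p' | tr -d '"')
  ver=$(printf '%s\n' "$1" | sed -n 's/^VERSION_ID=//p' | tr -d '".')
  printf '%s%s\n' "$id" "$ver"
}
```

Usage: `distro_tag "$(cat /etc/os-release)"`, which for the output above gives ubuntu1804.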
gcc --version
results in
gcc (Ubuntu 7(dot)5(dot)0-3ubuntu1~18(dot)04) 7(dot)5(dot)0
Copyright (C) 2017 Free Software Foundation, Inc(dot)
This is free software; see the source for copying conditions(dot) There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE(dot)
uname -r
results in
5(dot)4(dot)0-51-generic
sudo apt-get install linux-headers-$(uname -r)
results in
Reading package lists(dot)(dot)(dot) Done
Building dependency tree
Reading state information(dot)(dot)(dot) Done
linux-headers-5(dot)4(dot)0-51-generic is already the newest version (5(dot)4(dot)0-51(dot)56~18(dot)04(dot)1)(dot)
linux-headers-5(dot)4(dot)0-51-generic set to manually installed(dot)
The following packages were automatically installed and are no longer required:
dkms libaccinj64-10(dot)0 libatomic1:i386 libboost-python1(dot)65(dot)1 libbsd0:i386 libc-ares2 libcublas10(dot)0 libcudnn7 libcufft10(dot)0 libcufftw10(dot)0 libcuinj64-10(dot)0 libcupti-dev libcupti-doc libcupti10(dot)0 libcurand10(dot)0
libcusolver10(dot)0 libcusparse10(dot)0 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386 libexpat1:i386 libffi6:i386 libgflags2(dot)2 libgl1:i386
libgl1-mesa-dri:i386 libglapi-mesa:i386 libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libgoogle-glog0v5 libgrpc7 libjs-sphinxdoc libleveldb1v5 libllvm10:i386 liblmdb0 libnppc10(dot)0 libnppial10(dot)0 libnppicc10(dot)0
libnppicom10(dot)0 libnppidei10(dot)0 libnppif10(dot)0 libnppig10(dot)0 libnppim10(dot)0 libnppist10(dot)0 libnppisu10(dot)0 libnppitc10(dot)0 libnpps10(dot)0 libnvblas10(dot)0 libnvgraph10(dot)0 libnvidia-cfg1-450 libnvidia-common-450
libnvidia-compute-450:i386 libnvidia-decode-450 libnvidia-decode-450:i386 libnvidia-encode-450 libnvidia-encode-450:i386 libnvidia-extra-450 libnvidia-extra-450:i386 libnvidia-fbc1-450 libnvidia-fbc1-450:i386
libnvidia-gl-450 libnvidia-gl-450:i386 libnvidia-ifr1-450 libnvidia-ifr1-450:i386 libnvrtc10(dot)0 libnvtoolsext1 libnvvm3 libpciaccess0:i386 libprotobuf18 libprotoc18 libsensors4:i386 libsleef3 libstdc++6:i386
libthrust-dev libvdpau-dev libx11-6:i386 libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386 libxcb-glx0:i386 libxcb-present0:i386 libxcb-sync1:i386 libxcb1:i386 libxdamage1:i386 libxdmcp6:i386
libxext6:i386 libxfixes3:i386 libxnvctrl0 libxshmfence1:i386 libxxf86vm1:i386 pkg-config protobuf-compiler python-absl python-astor python-cffi python-configparser python-future python-gast python-grpcio
python-leveldb python-networkx python-pasta python-ply python-protobuf python-pycparser python-pywt python-skimage python-skimage-lib python-termcolor python-typing python-wrapt python3-absl python3-astor
python3-cffi python3-future python3-gast python3-grpcio python3-leveldb python3-markdown python3-networkx python3-pasta python3-ply python3-pycparser python3-pyinotify python3-pywt python3-skimage python3-skimage-lib
python3-tensorflow-serving python3-termcolor python3-werkzeug python3-wrapt screen-resolution-extra xserver-xorg-video-nvidia-450
Use 'sudo apt autoremove' to remove them(dot)
0 upgraded, 0 newly installed, 0 to remove and 179 not upgraded(dot)
Section 2(dot)7(dot) Handle Conflicting Installation Methods
I ran the following commands
sudo /usr/bin/nvidia-uninstall
sudo apt-get --purge remove cuda*
sudo apt-get --purge remove nvidia*
sudo apt-get --purge remove libcuda*
I tried looking for
sudo /usr/local/cuda-X(dot)Y/bin/uninstall_cuda_X(dot)Y(dot)pl
but there was no file with that name in bin, so I don’t think the previous CUDA was installed with the runfile(dot)
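A sketch of that check covering both uninstaller names: older runfiles shipped bin/uninstall_cuda_X.Y.pl (the path in the guide), while the 11.0 runfile used later in this post installs bin/cuda-uninstaller. The function name is hypothetical; no output suggests no runfile toolkit is present under the prefix.

```bash
# List runfile uninstallers under a prefix ($1, default /usr/local).
# A /usr/local/cuda symlink may cause the same file to be listed twice.
find_cuda_uninstallers() {
  local prefix=${1:-/usr/local}
  local f
  for f in "$prefix"/cuda*/bin/uninstall_cuda_*.pl "$prefix"/cuda*/bin/cuda-uninstaller; do
    # Unmatched globs stay literal, so keep only paths that exist.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```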
I checked both nvidia-smi
and nvcc -V
and neither command was found(dot) However, when I ran the installer, I kept getting a warning that there is a previous installation:
Existing package manager installation of the driver found(dot) It is strongly recommended that you remove this before continuing(dot)
So I tried some other methods to remove the CUDA installations:
sudo apt-get --purge remove cuda-11(dot)0
sudo apt-get --purge remove cuda-11(dot)1
sudo apt-get --purge remove cuda-10(dot)0
sudo apt-get purge nvidia*
sudo apt-get remove --purge cuda-* libcuda* nvidia*
sudo rm /etc/apt/sources(dot)list(dot)d/cuda*
sudo apt remove --autoremove nvidia-cuda-toolkit
sudo dpkg -l | grep nvidia
sudo apt purge cuda
sudo apt purge -y nvidia
sudo apt remove -y nvidia-*
sudo rm /etc/apt/sources(dot)list(dot)d/cuda*
sudo apt autoremove -y && sudo apt autoclean -y
sudo rm -rf /usr/local/cuda*
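After a long series of purge commands, it can help to list only what is still installed, since `dpkg -l | grep nvidia` also shows removed packages left in the "rc" (config files only) state. A sketch that keeps installed ("ii") packages whose names start with cuda, libcuda, nvidia or libnvidia; the prefix list is an assumption based on the commands above and the autoremove list earlier, and the helper name is hypothetical.

```bash
# Print installed ("ii") package names from `dpkg -l` output ($1)
# that start with cuda, libcuda, nvidia or libnvidia.
leftover_nvidia_pkgs() {
  printf '%s\n' "$1" | awk '$1 == "ii" && $2 ~ /^(cuda|libcuda|nvidia|libnvidia)/ { print $2 }'
}
```

Usage: `leftover_nvidia_pkgs "$(dpkg -l)"`; an empty result means none of those packages remain installed.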
Section 6(dot) Runfile Installation
6(dot)3(dot) Disabling Nouveau
I ran the following commands
touch /etc/modprobe(dot)d/blacklist-nouveau(dot)conf
And added
blacklist nouveau
options nouveau modeset=0
To that file(dot) Then I executed
sudo update-initramfs -u
Which resulted in
update-initramfs: Generating /boot/initrd(dot)img-5(dot)4(dot)0-52-generic
I then tested lsmod | grep nouveau
to see if it prints anything, and it didn’t(dot)
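The two steps in this section can be scripted as below. This is a sketch with hypothetical helper names; writing to /etc/modprobe.d needs root, and since a blacklist does not unload a module that is already loaded, the lsmod check is only meaningful after regenerating the initramfs and rebooting.

```bash
# Write the two-line nouveau blacklist from the guide to the file in $1
# (normally /etc/modprobe.d/blacklist-nouveau.conf, which requires root).
write_nouveau_blacklist() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Return success if `lsmod` output passed as $1 still lists nouveau.
nouveau_loaded() {
  printf '%s\n' "$1" | grep -q '^nouveau[[:space:]]'
}
```

After a reboot, `nouveau_loaded "$(lsmod)" && echo "nouveau is still loaded"` prints nothing when the blacklist worked.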
I then tried the runfile installation from
://developer(dot)nvidia (dot)/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal
Which gave these commands
wget ://developer(dot)download(dot)nvidia (dot)/compute/cuda/11(dot)1(dot)0/local_installers/cuda_11(dot)1(dot)0_455(dot)23(dot)05_linux(dot)run
sudo sh cuda_11(dot)1(dot)0_455(dot)23(dot)05_linux(dot)run
I downloaded the installer and ran sudo sh cuda_11(dot)1(dot)0_455(dot)23(dot)05_linux(dot)run
Which resulted in this message
Installation failed(dot) See log at /var/log/cuda-installer(dot)log for details(dot)
I opened that file, and these were its contents:
[INFO]: Driver not installed(dot)
[INFO]: Checking compiler version(dot)(dot)(dot)
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 7(dot)5(dot)0 (Ubuntu 7(dot)5(dot)0-3ubuntu1~18(dot)04)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 455(dot)23(dot)05
[INFO]: Executing NVIDIA-Linux-x86_64-455(dot)23(dot)05(dot)run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed(dot)
[ERROR]: Install of 455(dot)23(dot)05 failed, quitting
So it looks like the installation is failing at the driver step(dot) I’m not sure what is causing this error, since the 11(dot)0 driver had previously been installed on the GPUs(dot)
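The CUDA log only records that the bundled driver installer (NVIDIA-Linux-x86_64-455.23.05.run) failed. That installer normally writes its own, more detailed log to /var/log/nvidia-installer.log, which is the likelier place to find the actual reason. A sketch that pulls the errors out of cuda-installer.log and decodes the exit code; reading "code: 256" as a raw wait status (exit code 1) is an assumption, and the helper name is hypothetical.

```bash
# Summarize cuda-installer.log text passed as $1: print its [ERROR] lines,
# then decode the last "Finished with code: N", ASSUMING N is a raw wait
# status whose exit code is N / 256.
summarize_cuda_log() {
  local code
  printf '%s\n' "$1" | grep '^\[ERROR\]' || true
  code=$(printf '%s\n' "$1" | sed -n 's/.*Finished with code: \([0-9][0-9]*\).*/\1/p' | tail -n 1)
  if [ -n "$code" ]; then
    printf 'driver installer exit code (assumed): %d\n' $((code / 256))
  fi
}
```

Usage: `summarize_cuda_log "$(cat /var/log/cuda-installer.log)"`.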
I then tried to install via the local deb package:
://developer(dot)nvidia (dot)/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal
Which gave these commands
wget ://developer(dot)download(dot)nvidia(dot)/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804(dot)pin
sudo mv cuda-ubuntu1804(dot)pin /etc/apt/preferences(dot)d/cuda-repository-pin-600
wget https://developer (dot)download(dot)nvidia (dot)com/compute/cuda/11(dot)1(dot)0/local_installers/cuda-repo-ubuntu1804-11-1-local_11(dot)1(dot)0-455(dot)23(dot)05-1_amd64(dot)deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11(dot)1(dot)0-455(dot)23(dot)05-1_amd64(dot)deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80(dot)pub
sudo apt-get update
sudo apt-get -y install cuda
All of the commands ran without issue except the last one, sudo apt-get -y install cuda(dot) It gave this output:
Reading package lists(dot)(dot)(dot) Done
Building dependency tree
Reading state information(dot)(dot)(dot) Done
Some packages could not be installed(dot) This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming(dot)
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda : Depends: cuda-11-1 (>= 11(dot)1(dot)0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages(dot)
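apt only reports the first level of the problem here: the cuda metapackage needs cuda-11-1, which cannot be installed for some deeper reason. One way to dig further is to ask apt to simulate installing the blocked package directly, e.g. `apt-get install -s cuda-11-1`, and repeat for whatever it reports next. A sketch of the extraction step; the helper name is hypothetical.

```bash
# From `apt-get install` output ($1), print the packages apt says are
# "not going to be installed"; each is a candidate for `apt-get install -s`.
blocked_deps() {
  printf '%s\n' "$1" | sed -n 's/.*Depends: \([^ ]*\).*but it is not going to be installed.*/\1/p'
}
```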
While troubleshooting the driver install, I found that sudo apt install nvidia-450-dev
might work instead, so I tried it, and it worked(dot)
nvidia-smi
Showed the following
Mon Oct 26 18:27:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450(dot)66 Driver Version: 450(dot)66 CUDA Version: 11(dot)0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp(dot)A | Volatile Uncorr(dot) ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M(dot) |
| | | MIG M(dot) |
|===============================+======================+======================|
| 0 GeForce RTX 208(dot)(dot)(dot) Off | 00000000:19:00(dot)0 Off | N/A |
| 22% 31C P8 1W / 250W | 6MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208(dot)(dot)(dot) Off | 00000000:1A:00(dot)0 Off | N/A |
| 22% 35C P8 4W / 250W | 6MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208(dot)(dot)(dot) Off | 00000000:67:00(dot)0 Off | N/A |
| 22% 37C P8 6W / 250W | 6MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208(dot)(dot)(dot) Off | 00000000:68:00(dot)0 Off | N/A |
| 22% 39C P8 1W / 250W | 26MiB / 11016MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1314 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1314 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 1314 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 1314 G /usr/lib/xorg/Xorg 9MiB |
| 3 N/A N/A 1653 G /usr/bin/gnome-shell 14MiB |
+-----------------------------------------------------------------------------+
However, this driver only supports CUDA up to 11(dot)0, not 11(dot)1(dot)
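The runfile tried earlier, cuda_11.1.0_455.23.05_linux.run, bundles driver 455.23.05, while nvidia-smi reports 450.66 here. A sketch comparing the two; treating the bundled version as the requirement is an assumption (NVIDIA's CUDA release notes list the actual minimum driver versions), and the helper names are hypothetical.

```bash
# Print the "Driver Version" field from nvidia-smi output passed as $1.
driver_version() {
  printf '%s\n' "$1" | sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Return success if version $1 >= version $2 (GNU `sort -V` ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

Usage: `version_ge "$(driver_version "$(nvidia-smi)")" 455.23.05 || echo "driver older than the one bundled with 11.1"`.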
The nvidia-450-dev package also only installs the driver, not the runtime API(dot)
Running nvcc -V
gives “bash: /usr/bin/nvcc: No such file or directory”(dot)
So I then tried installing an older version of CUDA, 11(dot)0 instead of 11(dot)1, since the runtime API version should be lower than or equal to the CUDA version the driver supports(dot)
From
://developer(dot)nvidia (dot)/cuda-11(dot)0-download-archive
I selected this installer:
://developer(dot)nvidia (dot)/cuda-11(dot)0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal
Which gave the following commands:
wget ://developer (dot)download (dot)nvidia (dot)/compute/cuda/11(dot)0(dot)2/local_installers/cuda_11(dot)0(dot)2_450(dot)51(dot)05_linux(dot)run
sudo sh cuda_11(dot)0(dot)2_450(dot)51(dot)05_linux(dot)run
After downloading the installer, running sudo sh cuda_11(dot)0(dot)2_450(dot)51(dot)05_linux(dot)run
first gave me the warning about a previous installation again, probably left over from the driver installation(dot) I chose to continue, since I would only be installing the toolkit and not the driver, and selected everything except the Driver:
CUDA Installer
- [ ] Driver
    [ ] 450(dot)51(dot)05
+ [X] CUDA Toolkit 11(dot)0
  [X] CUDA Samples 11(dot)0
  [X] CUDA Demo Suite 11(dot)0
  [X] CUDA Documentation 11(dot)0
  Options
  Install
After the installation, I got this message
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11(dot)0/
Samples: Installed in /home/santosh/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11(dot)0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11(dot)0/lib64, or, add /usr/local/cuda-11(dot)0/lib64 to /etc/ld(dot)so(dot)conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11(dot)0/bin
Please see CUDA_Installation_Guide_Linux(dot)pdf in /usr/local/cuda-11(dot)0/doc/pdf for detailed information on setting up CUDA(dot)
***WARNING: Incomplete installation! This installation did not install the CUDA Driver(dot) A driver of version at least (dot)00 is required for CUDA 11(dot)0 functionality to work(dot)
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>(dot)run --silent --driver
Logfile is /var/log/cuda-installer(dot)log
I added /usr/local/cuda-11(dot)0/bin to PATH and set LD_LIBRARY_PATH to /usr/local/cuda-11(dot)0/lib64(dot)
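The PATH and LD_LIBRARY_PATH changes the installer asks for can be made idempotent, so re-sourcing ~/.bashrc does not keep adding duplicate entries. A sketch with a hypothetical helper; note that exports added to ~/.bashrc only apply to shells started afterwards, which is one possible reason a later `nvcc` lookup can still fail.

```bash
# Print the colon-separated list $1 with directory $2 prepended, unless $2
# is already one of its entries. An empty list yields just the directory.
path_prepend() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "$2${1:+:$1}" ;;
  esac
}
```

In ~/.bashrc, after defining the function: `export PATH="$(path_prepend "$PATH" /usr/local/cuda-11.0/bin)"` and `export LD_LIBRARY_PATH="$(path_prepend "${LD_LIBRARY_PATH:-}" /usr/local/cuda-11.0/lib64)"`.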
I then attempted the post-installation instructions here: ://docs (dot)nvidia (dot)com/cuda/cuda-installation-guide-linux/index (dot)#power9-setup
systemctl status nvidia-persistenced
resulted in “Unit nvidia-persistenced(dot)service could not be found(dot)”
sudo systemctl enable nvidia-persistenced
resulted in
The unit files have no installation config (WantedBy, RequiredBy, Also, Alias
settings in the [Install] section, and DefaultInstance for template units)(dot)
This means they are not meant to be enabled using systemctl(dot)
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
(dot)wants/ or (dot)requires/ directory(dot)
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it(dot)
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, (dot)(dot)(dot))(dot)
4) In case of template units, the unit is meant to be enabled with some
instance name specified(dot)
I was able to follow the udev rule instructions without issue; I ran the following commands:
sudo cp /lib/udev/rules(dot)d/40-vm-hotadd(dot)rules /etc/udev/rules(dot)d
sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules(dot)d/40-vm-hotadd(dot)rules
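The post-installation steps used here (the persistence daemon and the memory hot-add udev rule) come from the guide's POWER9 setup section, as the #power9-setup anchor in the link indicates, whereas this machine reports x86_64. A sketch of a guard that only prints those commands for review on ppc64le; the function name is hypothetical and it deliberately executes nothing.

```bash
# Print (without running) the POWER9-only post-install commands when the
# architecture in $1 is ppc64le; otherwise print a skip note.
# Intended use: power9_setup_cmds "$(uname -m)"
power9_setup_cmds() {
  if [ "$1" != "ppc64le" ]; then
    printf 'skip: POWER9 setup does not apply to %s\n' "$1"
    return 0
  fi
  printf '%s\n' \
    'sudo systemctl enable nvidia-persistenced' \
    'sudo cp /lib/udev/rules.d/40-vm-hotadd.rules /etc/udev/rules.d' \
    "sudo sed -i '/SUBSYSTEM==\"memory\", ACTION==\"add\"/d' /etc/udev/rules.d/40-vm-hotadd.rules"
}
```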
I tried nvcc -V
just to check whether the installation had worked anyway(dot) This time I got this message:
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
So I tried the command, and it seemed to install with no issues(dot) When I ran nvcc -V
again, I got this message
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10(dot)0, V10(dot)0(dot)130
Which is the version of CUDA that I started with(dot)
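The nvcc outputs above suggest a PATH issue: the 11.0 toolkit went into /usr/local/cuda-11.0/bin, but the nvcc that runs is the 10.0 one that apt installed. The earlier "bash: /usr/bin/nvcc: No such file or directory" is also what bash prints when it has cached (hashed) a path that no longer exists; `hash -r` clears that cache. A sketch listing every nvcc on a PATH in lookup order, roughly what `type -a nvcc` shows; the function name is hypothetical.

```bash
# Print every executable file named nvcc on a colon-separated search path
# ($1, default: the current $PATH), in lookup order. The first line is the
# nvcc that a plain `nvcc -V` would run.
nvcc_candidates() {
  local search=${1-$PATH}
  local dir
  local IFS=:
  for dir in $search; do
    if [ -f "$dir/nvcc" ] && [ -x "$dir/nvcc" ]; then
      printf '%s\n' "$dir/nvcc"
    fi
  done
}
```

Running `nvcc_candidates` with no argument checks the current shell's PATH.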
Looking at this reply in another thread:
://forums (dot)developer (dot)nvidia (dot)com/t/cuda-10-installation-problems-on-ubuntu-18-04/68615
follow the instructions in the linux install guide: ://docs (dot)nvidia (dot)/cuda/cuda-installation-guide-linux/index(dot)html
get your installers from ://(dot)nvidia (dot)com/getcuda
Now that you’ve already installed the wrong drivers, read the linux install guide carefully(dot) Failure to follow it carefully will result in more trouble(dot)
It seems that the alternative ways of installing the driver and toolkit (with sudo apt install nvidia-450-dev
and sudo apt install nvidia-cuda-toolkit)
are not recommended, and that the installation guide should be followed exactly(dot)
However, I did follow the instructions, and the installer was not able to install the driver(dot) Installing the driver doesn’t seem impossible, since the alternative command worked, but the error log didn’t give me any insight into how to install it the official way(dot)