Desperately trying to install GDS on a vanilla Ubuntu 22.04, and it just seems impossible. Every webpage from Nvidia is just wrong, endlessly.
Had a successful “installation” (but not a successful usage) by following various Nvidia source documentation.
gdscheck.py -f
gdscheck.py -p
These showed that nearly everything was good, but that NVMe was “unsupported”.
- Problem was that the drivers would not load, as they used GPL symbols, and this was the closed binary install.
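For anyone checking the same thing: one way to see which driver flavor is actually installed is the license string the kernel module reports. This is only a sketch. It assumes the module is named nvidia, and that the open kernel modules report a license containing MIT/GPL while the closed build reports NVIDIA. module_flavor is a helper name made up here for illustration.

```shell
# Hypothetical helper: classify a driver build from its module license string.
# Assumption: open modules report "Dual MIT/GPL", the closed build reports "NVIDIA".
module_flavor() {
  case "$1" in
    "")          echo "not-found" ;;
    *GPL*|*MIT*) echo "open" ;;
    NVIDIA)      echo "proprietary" ;;
    *)           echo "unknown" ;;
  esac
}

# Ask modinfo for the license field; stay quiet if the module or modinfo is missing.
license=$(modinfo -F license nvidia 2>/dev/null || true)
echo "nvidia module license: '${license}' -> $(module_flavor "$license")"
```

If this prints proprietary, the GPL-symbol load failure described above would be expected.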
Solution was to remove the existing drivers package, kernel source package, and CUDA drivers.
Then install the “open compute branch” of the solution:
- Had conflicting GPG keys thanks to Nvidia. (Eventually solved that by manually deleting entries from several apt control directories so that “sudo apt update” could once again function.)
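In case it helps someone hitting the same key conflict, this is roughly how the stale entries can be located before deleting anything. A sketch only: list_nvidia_sources is a made-up helper name, and the directories searched are the usual apt locations on Ubuntu, which may differ on other setups.

```shell
# Hypothetical helper: print the names of files under the given paths
# that mention NVIDIA or CUDA (case-insensitive).
list_nvidia_sources() {
  grep -rliE 'nvidia|cuda' "$@" 2>/dev/null || true
}

# Apt source entries that point at NVIDIA/CUDA repositories.
list_nvidia_sources /etc/apt/sources.list /etc/apt/sources.list.d

# Keyring files whose names suggest they belong to those repositories.
ls /usr/share/keyrings /etc/apt/trusted.gpg.d 2>/dev/null | grep -iE 'nvidia|cuda' || true
```

Two source entries for the same repository that name different keyrings would be the first thing to look for in that output.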
Then followed Nvidia 22.04 cuda installation instructions:
From this website:
And follow the “open kernel module flavor” since that is required for Ubuntu 22.04 and GDS.
But there is no
cuda-drivers-550 package in existence.
GPG keys are set up.
Repository references are good.
The recommendation is this sequence:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.0-550.54.14-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.0-550.54.14-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
AND THEN
sudo apt-get install -y nvidia-driver-550-open
sudo apt-get install -y cuda-drivers-550 ### THIS STEP FAILS
E: Unable to locate package cuda-drivers-550
Adding:
sudo apt-get install --verbose-versions nvidia-kernel-source-550-open
And repeating:
sudo apt-get install -y cuda-drivers-550 ### THIS STEP FAILS
E: Unable to locate package cuda-drivers-550
Does NOT help. The 550 version of the open kernel just does not exist. But this is NVIDIA's own recommendation from the website above (i.e. current, hot off the press).
Cannot install using the "old" boilerplate "legacy" binary installation; those NO LONGER WORK in the modern kernel world, even on a kernel that is not that modern:
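To show what apt actually sees, the configured repositories can be queried directly. A sketch, assuming the standard apt-cache tool: candidate_of is a made-up helper that pulls the Candidate line out of apt-cache policy output, and the package names are just the ones from this post.

```shell
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin
# and print the candidate version, if any.
candidate_of() {
  awk '/^ *Candidate:/ {print $2}'
}

# Report what apt would install for each package name tried above.
for pkg in cuda-drivers-550 cuda-drivers nvidia-driver-550-open; do
  cand=$(apt-cache policy "$pkg" 2>/dev/null | candidate_of || true)
  echo "$pkg: ${cand:-not known to apt}"
done

# List every package whose name starts with cuda-drivers in the configured repos.
apt-cache search --names-only '^cuda-drivers' 2>/dev/null || true
```

A package that shows up as not known to apt here is simply absent from every repository apt is configured to read, whatever the install page says.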
$ uname -r
5.15.0-1046-nvidia
$ sudo hostnamectl
Static hostname: hawkfish
Icon name: computer-server
Chassis: server
Machine ID: 22484ed144c74b409c4ef8585f2d0130
Boot ID: f1d72aed470a41b280dddb3e5738555c
Operating System: Ubuntu 22.04.2 LTS
Kernel: Linux 5.15.0-1046-nvidia
Architecture: x86-64
Hardware Vendor: Supermicro
Hardware Model: SYS-1019GP-TT
The product is totally uninstallable. It's like Nvidia does not even attempt to really check/install these combinations. Never seen a set of web pages so completely wrong, with errors on every single page.
Been at this for over TWO WEEKS. (An installation, seriously.) Beyond frustrated.
There has got to be a proper set of kernel, kernel-source, CUDA drivers, and GDS driver stack that actually works on 22.04. (Your own website pretends this works.) But it's literally uninstallable.