Hello,
I am deploying CUDA for a client on their vast.ai GPU servers, which all run Ubuntu 16.04.
I am following the instructions on the download page to install CUDA:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda-repo-ubuntu1604-11-2-local_11.2.0-460.27.04-1_amd64.deb
dpkg -i cuda-repo-ubuntu1604-11-2-local_11.2.0-460.27.04-1_amd64.deb
apt-key add /var/cuda-repo-ubuntu1604-11-2-local/7fa2af80.pub
apt-get update
apt-get -y install cuda
However, when I run the last step, I get the following errors:
Errors were encountered while processing:
/tmp/apt-dpkg-install-MFToHc/189-nvidia-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-MFToHc/191-libcuda1-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-MFToHc/193-nvidia-opencl-icd-460_460.27.04-0ubuntu1_amd64.deb
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-libraries-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-libraries-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-nsight-systems-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-nsight-systems-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-tools-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-tools-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-11-2:amd64'.
W: Sources disagree on hashes for supposely identical version '11.2.0-1' of 'cuda-11-2:amd64'.
E: Sub-process /usr/bin/dpkg returned an error code (1)
To fix this, I ran apt-get --fix-broken install. However, this produces a different set of errors:
dpkg: error processing archive /var/cuda-repo-ubuntu1604-11-2-local/./nvidia-460_460.27.04-0ubuntu1_amd64.deb (--unpack):
trying to overwrite '/usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0', which is also in package libglx-mesa0:amd64 20.0.8-0ubuntu1~18.04.1
Preparing to unpack .../libcuda1-460_460.27.04-0ubuntu1_amd64.deb ...
Unpacking libcuda1-460 (460.27.04-0ubuntu1) ...
dpkg: error processing archive /var/cuda-repo-ubuntu1604-11-2-local/./libcuda1-460_460.27.04-0ubuntu1_amd64.deb (--unpack):
unable to make backup link of './usr/lib/x86_64-linux-gnu/libcuda.so.460.27.04' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Preparing to unpack .../nvidia-opencl-icd-460_460.27.04-0ubuntu1_amd64.deb ...
Unpacking nvidia-opencl-icd-460 (460.27.04-0ubuntu1) ...
dpkg: error processing archive /var/cuda-repo-ubuntu1604-11-2-local/./nvidia-opencl-icd-460_460.27.04-0ubuntu1_amd64.deb (--unpack):
unable to make backup link of './usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.27.04' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
/var/cuda-repo-ubuntu1604-11-2-local/./nvidia-460_460.27.04-0ubuntu1_amd64.deb
/var/cuda-repo-ubuntu1604-11-2-local/./libcuda1-460_460.27.04-0ubuntu1_amd64.deb
/var/cuda-repo-ubuntu1604-11-2-local/./nvidia-opencl-icd-460_460.27.04-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
I fixed the first error, about nvidia-460_460.27.04-0ubuntu1_amd64.deb,
by running apt-get -o Dpkg::Options::="--force-overwrite" install --fix-broken
as recommended here. However, this does not fix the other two errors, which still fail with "Invalid cross-device link".
I have isolated the problem to:
root@C.706583:~$ dpkg -i --force-overwrite /var/cuda-repo-ubuntu1604-11-2-local/./libcuda1-460_460.27.04-0ubuntu1_amd64.deb
dpkg: error processing archive /var/cuda-repo-ubuntu1604-11-2-local/./libcuda1-460_460.27.04-0ubuntu1_amd64.deb (--install):
unable to make backup link of './usr/lib/x86_64-linux-gnu/libcuda.so.460.27.04' before installing new version: Invalid cross-device link
Apparently, on all of the Vast.ai servers, a second partition (/dev/sda2) is mounted at /usr/bin/nvidia-smi, even though that path is a regular file rather than a directory.
root@C.706582:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         331G  136K  331G   1% /
tmpfs            64M     0   64M   0% /dev
tmpfs            63G     0   63G   0% /sys/fs/cgroup
shm             8.0G     0  8.0G   0% /dev/shm
/dev/sda4       3.7T  206G  3.5T   6% /etc/hosts
tmpfs            63G   12K   63G   1% /proc/driver/nvidia
/dev/sda2        21G   12G  8.0G  59% /usr/bin/nvidia-smi
udev             63G     0   63G   0% /dev/nvidia0
tmpfs            63G     0   63G   0% /proc/asound
tmpfs            63G     0   63G   0% /proc/acpi
tmpfs            63G     0   63G   0% /proc/scsi
tmpfs            63G     0   63G   0% /sys/firmware
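GNU df normally suppresses duplicate entries for a device it has already listed (df -a shows them all), so the single /dev/sda2 line may not be the only place that partition is mounted. A minimal sketch for seeing every mount under a given directory reads /proc/self/mountinfo directly. It assumes the Linux mountinfo layout (mount point in field 5; filesystem type and mount source in the two fields after the lone "-" separator). The function name mounts_under is hypothetical.

```shell
#!/bin/sh
# mounts_under PREFIX: print "mountpoint fstype source" for every mount whose
# mount point is PREFIX or lies below it. Reads /proc/self/mountinfo, so
# repeated mounts of the same device are NOT hidden the way plain df hides them.
mounts_under() {
    awk -v p="$1" '
    {
        # Optional fields start at field 7 and end at a lone "-"; the two
        # fields after it are the filesystem type and the mount source.
        for (i = 7; i <= NF; i++) if ($i == "-") break
        mp = $5
        if (p == "/" || mp == p || index(mp, p "/") == 1)
            print mp, $(i + 1), $(i + 2)
    }' /proc/self/mountinfo
}

# Example: list anything mounted over paths under /usr
mounts_under /usr
```

Running mounts_under /usr/lib/x86_64-linux-gnu on one of the servers would show whether libcuda.so.460.27.04 is also mounted from /dev/sda2.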
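As I understand it, an "Invalid cross-device link" (EXDEV) error happens when you try to create a hard link to a file that lives on a different filesystem. If libcuda.so.460.27.04 and libnvidia-opencl.so.460.27.04 are mounted in from the host the same way as nvidia-smi, dpkg's attempt to hard-link each of them to a backup name before unpacking the new version would fail for exactly this reason.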
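One way to test this theory is to check whether each file dpkg could not back up is itself a mount point. On Linux, link(2) returns EXDEV whenever the old and new paths are on different mounts, even if both mounts are of the same filesystem, so a bind-mounted file cannot be hard-linked to a new name in its own directory. The sketch below assumes the two library paths from the error messages above and that the paths are given exactly as they appear in /proc/self/mountinfo (no symlinks, no trailing slash); the function name is_mount_point is hypothetical.

```shell
#!/bin/sh
# is_mount_point PATH: exit status 0 if PATH is listed as a mount point
# (field 5) in /proc/self/mountinfo. Container runtimes can bind-mount
# single files this way, e.g. /usr/bin/nvidia-smi.
is_mount_point() {
    awk -v p="$1" '$5 == p { found = 1 } END { exit !found }' /proc/self/mountinfo
}

# Check the two files dpkg failed to back up (paths taken from the errors above)
for f in /usr/lib/x86_64-linux-gnu/libcuda.so.460.27.04 \
         /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.27.04; do
    if is_mount_point "$f"; then
        echo "$f: separate mount; a hard link in its directory fails with EXDEV"
    else
        echo "$f: not a mount point"
    fi
done
```

The check is read-only on purpose: it never calls ln inside /usr/lib, so it is safe to run on a live server.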
Is there any way around this so that I can install CUDA Toolkit 11.2 successfully?