We recently purchased several A100 GPUs and an Epyc-based server to run them in. My first choice of OS is the Debian 12-based Proxmox 8, mainly because of its superior ZFS support but also because it doesn’t use snap packages (most importantly here, for LXD).
I have successfully installed the packages from the Debian 11 CUDA repo under Proxmox 8, but unfortunately its CUDA drivers are too old to support the A100.
I have also tried installing CUDA from the Ubuntu 22.04 CUDA repo under Proxmox 8, but I get this error:
The following packages have unmet dependencies:
cuda-drivers-535 : Depends: nvidia-settings (>= 535.86.10) but it is not installable
E: Unable to correct problems, you have held broken packages.
I prefer to use repos to install third-party software, but I’ve also tried installing CUDA from the runfile, and that doesn’t work either:
root@thor:/home/sgs548# sh ./cuda_12.2.1_535.86.10_linux.run
Installation failed. See log at /var/log/cuda-installer.log for details.
root@thor:/home/sgs548# cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 12.2.0 (Debian 12.2.0-14)
[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.17.3)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 535.86.10
[INFO]: Executing NVIDIA-Linux-x86_64-535.86.10.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 535.86.10 failed, quitting
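The cuda-installer log above only says that the driver step failed; the actual cause should be in the driver log it points to. Below is a minimal sketch for pulling the relevant lines out of that log. It assumes the failure lines in /var/log/nvidia-installer.log contain the string "ERROR", and the helper name `show_driver_errors` is just something made up for illustration.

```shell
# Hypothetical helper: print lines containing ERROR, plus three lines of
# following context, from an installer log. Defaults to the driver log
# named in the cuda-installer output above.
show_driver_errors() {
    log="${1:-/var/log/nvidia-installer.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -n prefixes each line with its line number in the log,
    # -A 3 prints three lines of context after every match.
    grep -n -A 3 'ERROR' "$log"
}
```

Running `show_driver_errors` as root with no argument reads the default driver log; a different path can be passed as the first argument.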
When can we expect CUDA packages for Debian 12?
Has anyone been successful in getting the A100 running under Debian Bookworm or Proxmox 8? If so, how?
Thanks