My team and I have configured CUDA on our Ubuntu server to work with our 4 A100 GPUs.
We are running on Ubuntu 22.04 with:
Kernel version 5.15.0-86-generic
NVIDIA-SMI - Version 535.104.12
Driver - Version 535.104.12
CUDA - Version 12.2
We are trying to upgrade to a 6.x kernel, specifically version 6.2.0-26, since that is the version recommended in the CUDA documentation.
Every time we try the upgrade, the new kernel works fine until we install NVIDIA driver 535. After installing it and rebooting, we can't even log in to the machine. We have tried all sorts of things to get it working, with no success.
We also tried kernel version 6.5 with NVIDIA driver 535, but we could not get NVIDIA-SMI to recognize a driver.
We have spent 3 days trying to upgrade with no success. Does anyone know what might be going wrong?
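
A possible starting point for diagnosis, as a minimal sketch rather than a known fix: after booting the new kernel, check whether build files for that exact kernel release are present and whether the `nvidia` module is actually loaded. The function names below are hypothetical helpers. The sketch also assumes that a missing or dangling `/lib/modules/<release>/build` entry means the matching kernel headers are not installed, which is a rule of thumb, not a guarantee.

```python
import os
import platform


def kernel_release():
    """Return the running kernel release string, e.g. '5.15.0-86-generic'."""
    return platform.release()


def headers_present(release, modules_root="/lib/modules"):
    """Check for /lib/modules/<release>/build.

    Assumption: this entry is normally a symlink to the kernel headers.
    os.path.isdir follows symlinks, so a dangling link counts as absent.
    """
    return os.path.isdir(os.path.join(modules_root, release, "build"))


def nvidia_module_loaded(proc_modules="/proc/modules"):
    """Return True if a module named exactly 'nvidia' is listed as loaded."""
    try:
        with open(proc_modules) as f:
            # The first whitespace-separated field of each line is the module name.
            return any(line.split()[0] == "nvidia" for line in f if line.strip())
    except OSError:
        # /proc/modules is unavailable (e.g. not running on Linux).
        return False


if __name__ == "__main__":
    release = kernel_release()
    print("running kernel:", release)
    print("build files present:", headers_present(release))
    print("nvidia module loaded:", nvidia_module_loaded())
```

Running this once on the working 5.15 kernel and once on the new kernel (from a console or recovery shell if login fails) would show whether the two setups differ in either check.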