I am trying to install CUDA on a RHEL 8.7 machine with A40 GPUs using the following commands:
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf clean all
sudo dnf -y module install nvidia-driver:latest-dkms
sudo dnf -y install cuda
After rebooting, nvidia-smi generates the following error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
lsmod|grep -i nvidia returns nothing, as does lsmod|grep -i nouveau
cat /proc/version gives me this: Linux version 4.18.0-425.13.1.el8_7.x86_64 … (gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC)) #1 SMP Thu Feb 2 13:01:45 EST 2023
and gcc -v gives me a matching version of gcc: gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC)
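Since the kernel and gcc versions match, a few further checks might help narrow down why no nvidia module is loaded. The following is only a rough sketch and not part of the original steps: it assumes the usual causes are a missing kernel-devel package for the running kernel, a DKMS build that did not complete, or Secure Boot rejecting an unsigned module. The check helper is a hypothetical wrapper that skips any tool that is not installed.

```shell
#!/bin/sh
# Diagnostic sketch (assumed checks, not from the original install steps).

# check LABEL CMD [ARGS...]
# Hypothetical helper: prints a heading, runs CMD if it is installed,
# and reports a non-zero exit status instead of aborting the script.
check() {
    label=$1
    shift
    echo "== ${label}"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "(skipped: $1 not found)"
    fi
}

# Kernel currently running; DKMS has to build the module for this exact version.
check "running kernel" uname -r

# Headers for the running kernel; a DKMS build cannot succeed without them.
check "kernel-devel for running kernel" rpm -q "kernel-devel-$(uname -r)"

# Whether the nvidia module was added, built, and installed by DKMS.
check "DKMS status" dkms status

# With Secure Boot enabled, an unsigned module may be refused at load time.
check "Secure Boot state" mokutil --sb-state

# Whether a built nvidia module exists on disk for the running kernel.
check "nvidia module version on disk" modinfo -F version nvidia
```

If the DKMS status line shows the module as added but not installed, or the kernel-devel query reports the package as not installed, that would point to the module never having been built for kernel 4.18.0-425.13.1.el8_7.x86_64.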
Confirming I have Nvidia GPUs:
lspci|grep -i nvidia
17:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
65:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
ca:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
e3:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
I’m totally out of ideas here. I’ve looked through similar topics on this forum and I don’t think I’ve missed anything. I’ve also tried installing older drivers, using the run file, etc., but I keep running into the same issue. Any advice would be very much appreciated.