I am trying to do a clean uninstall and reinstall of CUDA on a Linux VM using the steps below, but nvidia-smi does not work afterwards:
# remove cuda toolkit; https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#removing-cuda-toolkit-and-driver
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
"*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
# remove drivers
sudo apt-get --purge remove "*nvidia*" "libxnvctrl*"
# cleanup
sudo apt-get autoremove
# headers
sudo apt-get install linux-headers-$(uname -r)
# check distro version
uname -m && cat /etc/*release
# set os / arch
OS=ubuntu2004
arch=x86_64
# network repo install https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-ubuntu
wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/${arch}/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install cuda -y
sudo apt-get install nvidia-gds -y
sudo reboot
# post install
export PATH=/usr/local/cuda-12.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# check install
cat /proc/driver/nvidia/version # no such file
nvidia-smi # NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver
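A few extra checks may help narrow down why the driver is not loading. This is a minimal sketch, not a confirmed diagnosis: it assumes the usual causes (kernel module not built for the running kernel, or the module blocked by Secure Boot), and the helper names `run_check` and `module_state` are hypothetical, introduced only for this example. `dkms` and `mokutil` may not be installed on every system, so each check reports a missing tool instead of aborting.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: each check reports its result instead of aborting,
# so the script runs to completion even when a tool is not installed.

# Run a command if its binary exists; otherwise say that it is missing.
# (run_check is a hypothetical helper name, not part of any NVIDIA tool.)
run_check() {
    local label=$1; shift
    echo "== ${label}"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "($1 not found)"
    fi
}

# Report whether the nvidia kernel module is currently loaded.
# /sys/module/nvidia only exists while the module is loaded.
module_state() {
    if [ -d /sys/module/nvidia ]; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

echo "nvidia kernel module: $(module_state)"
run_check "DKMS build status" dkms status
run_check "Secure Boot state" mokutil --sb-state
run_check "nvidia module version for kernel $(uname -r)" modinfo -F version nvidia
```

If `modinfo` cannot find the module at all, the driver was likely not built or installed for the running kernel. If Secure Boot is reported as enabled, an unsigned module may be refused at load time. Both are assumptions to verify against the attached log rather than conclusions.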
Update: attaching the output of nvidia-bug-report.sh:
nvidia-bug-report.log.gz (145.2 KB)