nvidia-smi fails on various Google Cloud VMs with Tesla K80 GPU

I have followed the guide on Google Cloud using Ubuntu 18 and 20 (I have also tried Ubuntu Lite, Debian, and CentOS 7):

Unfortunately, after completing the lengthy install I get this:

me@gpu:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I have tried installing both via the script and via the direct downloads from the NVIDIA site for CUDA 10.

I have also tried some of the things recommended here with no luck:

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

I was able to get it working. The mistake I was making was not doing the pre-installation steps before running the cuda_10.1.243_418.87.00_linux.run script. I was under the impression the *.run file would do everything for me. It would help if users were told they MUST do the pre-installation steps. Specifically I had to do this for Ubuntu 18:

sudo nano /etc/modprobe.d/blacklist-nouveau.conf

with these two lines as the file contents:

blacklist nouveau
options nouveau modeset=0

and then regenerate the initramfs:

sudo update-initramfs -u
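
For anyone who would rather script this than edit the file by hand in nano, here is a minimal sketch of the same blacklist step. The function names `write_nouveau_blacklist` and `nouveau_loaded` are my own hypothetical helpers, not part of any NVIDIA tooling. The optional path argument is only there so you can try it against a scratch file before touching /etc.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not NVIDIA-provided.

# Write the two blacklist lines from the post to a modprobe.d file.
# Defaults to the path used above; pass another path to test safely.
write_nouveau_blacklist() {
    conf_path="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf_path"
}

# Return 0 if the nouveau module appears in the list of loaded modules.
# /proc/modules has one module per line, with the module name first.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file"
}
```

On the VM you would run `write_nouveau_blacklist` as root, then `update-initramfs -u`, then reboot (I am assuming a reboot is needed for the blacklist to take effect). After that, `nouveau_loaded` should return non-zero before you start the *.run installer.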

This seems like a bit of a “hack”, so I’m not sure why NVIDIA can’t make the installation process more robust. They make a bazillion of these cards. It’s not like this is some homemade product with a niche user base…
