NVIDIA-SMI couldn't communicate with the NVIDIA driver

vamshi2 · March 7, 2022, 11:33am

I have tried almost every suggestion in this forum, but I still can't make it work.
I am using ESXi 7.0 Update 2 with the latest NVIDIA vGPU Manager (VIB). The guest OS is Ubuntu 20.04.4 and the GPU is a Tesla M60. I have tried nvidia-driver-510, -470 and -460, but none of them work. I also tried building with gcc-9 and gcc-7. Below is the current info; the bug report is attached at the end. I have wasted 3 days on this issue, so any help is much appreciated. Thanks!

$ dkms status
nvidia, 470.103.01, 5.13.0-30-generic, x86_64: installed

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-6ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-6ubuntu2)

$ prime-select query
nvidia

$ sudo nvidia-settings 
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

nvidia-bug-report.log.gz (581.3 KB)

vamshi2 · March 8, 2022, 4:32am

Can someone please help?

sschaber · March 8, 2022, 7:32am

Hi,
First of all, can you run nvidia-smi on the host? What are you trying to achieve: passthrough or vGPU? Are you using supported hardware for the M60? If you are using vGPU, which profile did you assign to the VM, and can you start the VM with that profile assigned?
The bug report is meant for NVIDIA Enterprise Support, so please don't expect anyone in the forum to analyze it.

vamshi2 · March 8, 2022, 7:57am

Hi, thanks for the response @sschaber.
We are using NVIDIA GRID. Yes, the hardware supports it.

I forgot to mention that it's working fine with Windows VMs.

Yes, nvidia-smi works on the host.

sschaber · March 8, 2022, 8:12am

As you are using vGPU 13.0 on the host, you should use the 470.63.01 driver for the Linux guest. How much system memory did you assign to the Linux VM? Sometimes you need to add the MMIO parameters to the VM config. See LaunchPad | NVIDIA Docs for an example of how to set up a Linux guest properly.
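A minimal sketch for checking whether a VM's .vmx file already contains the 64-bit MMIO settings mentioned above. The parameter names `pciPassthru.use64bitMMIO` and `pciPassthru.64bitMMIOSizeGB` are taken from VMware's GPU passthrough guidance, not from this thread; the function name `has_mmio_params` and the datastore path are hypothetical, and the MMIO size a given VM needs should come from the NVIDIA/VMware documentation rather than from this snippet.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given .vmx file already sets
# both 64-bit MMIO parameters. Values are not validated, only presence.
has_mmio_params() {
    vmx_file=$1
    grep -qi '^pciPassthru\.use64bitMMIO' "$vmx_file" &&
        grep -qi '^pciPassthru\.64bitMMIOSizeGB' "$vmx_file"
}

# Example usage on the ESXi host (placeholder path):
# if has_mmio_params /vmfs/volumes/datastore1/ubuntu-vm/ubuntu-vm.vmx; then
#     echo "MMIO parameters present"
# else
#     echo "MMIO parameters missing"
# fi
```

This only reads the file; any change to the VM configuration is better made through the vSphere client's advanced parameters than by editing the .vmx directly.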

vamshi2 · March 8, 2022, 1:01pm

Yes, I had already tried the 510, 470.103 and 460 drivers, but installing the 470.63.01 driver on the guest did the trick. So the guest driver must match the driver version of the vGPU release installed on the host.
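As a rough sketch of the fix described above, the guest's driver version can be compared against the guest driver that belongs to the host's vGPU release. The only pairing taken from this thread is vGPU 13.0 on the host with 470.63.01 in the guest; the function name `guest_driver_matches` is hypothetical, and reading the installed version with `modinfo -F version nvidia` assumes the nvidia module is installed for the running kernel.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the guest driver version exactly
# matches the Linux guest driver shipped with the host's vGPU release.
# Pairing from this thread: vGPU 13.0 on the host -> 470.63.01 in the guest.
guest_driver_matches() {
    installed=$1   # version currently installed in the guest
    expected=$2    # guest driver version for the host's vGPU release
    if [ "$installed" = "$expected" ]; then
        echo "OK: guest driver $installed matches the host vGPU release"
        return 0
    fi
    echo "MISMATCH: guest has $installed, host vGPU release expects $expected"
    return 1
}

# Example usage inside the guest:
# guest_driver_matches "$(modinfo -F version nvidia)" 470.63.01
```

The comparison is on the full version string on purpose: 470.103.01 and 470.63.01 are both from the 470 branch, yet only 470.63.01 worked with the vGPU 13.0 host in this thread.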
Thank you so much @sschaber

system · March 22, 2022, 1:01pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.