NVIDIA-SMI couldn't communicate with the NVIDIA driver

I have tried the suggestions from almost every post in this forum, but I still can't make it work.
I am using ESXi 7.0 Update 2 with the latest NVIDIA GPU Manager (VIB). The guest OS is Ubuntu 20.04.4. The GPU is a Tesla M60. I have tried nvidia-driver-510, 470, and 460, but none of them works. I also tried with gcc-9 and gcc-7. The current state is shown below, and the bug report is attached at the end. I have spent 3 days on this issue. Any help is much appreciated. Thanks!

$ dkms status
nvidia, 470.103.01, 5.13.0-30-generic, x86_64: installed
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-6ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-6ubuntu2) 
$ prime-select query
nvidia
$ sudo nvidia-settings 
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

nvidia-bug-report.log.gz (581.3 KB)

Can someone please help?

Hi,
First of all, can you run nvidia-smi from the host? What are you trying to achieve: passthrough or vGPU? Are you using supported hardware for the M60? If you are using vGPU, which profile did you assign to the VM, and can you start the VM with that profile assigned?
The bug report is meant for NV enterprise support so please don’t expect that someone in the forum will analyze the bug report.

Hi, thanks for the response @sschaber.
We are using NVIDIA GRID. Yes, the hardware supports it.

I forgot to mention that it's working fine with Windows VMs.

Yes, nvidia-smi works on the host.

As you are using vGPU 13.0 on the host, you should use the 470.63.01 driver for the Linux guest. How much system memory did you assign to the Linux VM? Sometimes you need to add the MMIO parameters to the VM config. See "Step #2: Create Your First NVIDIA AI Enterprise VM" in the LaunchPad documentation for an example of how to properly set up a Linux guest.

Yes. I had tried the 510, 470.103 and 460 drivers, but installing the 470.63 driver on the guest did the trick. So in this setup the guest driver had to match the vGPU release running on the host.
Thank you so much @sschaber
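
For anyone hitting the same mismatch, here is a minimal sketch of a check that compares the driver version reported by `dkms status` against the guest version recommended in this thread (470.63.01 for a vGPU 13.0 host). The function names and the `EXPECTED_GUEST_DRIVER` constant are hypothetical, and the parsing assumes the `dkms status` line layout shown in the first post (plus the slash-separated layout that newer dkms releases appear to print). Treat it as an illustration, not an official NVIDIA tool.

```python
import re

# Guest driver version recommended in this thread for a vGPU 13.0 host.
# Adjust this for the vGPU release actually running on your host.
EXPECTED_GUEST_DRIVER = "470.63.01"


def installed_nvidia_versions(dkms_output):
    """Return the NVIDIA driver versions found in `dkms status` output."""
    versions = []
    for line in dkms_output.splitlines():
        # Matches "nvidia, 470.103.01, ..." as shown in this thread.
        # Also accepts "nvidia/470.103.01, ..." (assumed newer dkms layout).
        match = re.match(r"\s*nvidia[,/]\s*([0-9]+(?:\.[0-9]+)+)", line)
        if match:
            versions.append(match.group(1))
    return versions


def driver_matches(dkms_output, expected=EXPECTED_GUEST_DRIVER):
    """True if the expected guest driver version is among the installed ones."""
    return expected in installed_nvidia_versions(dkms_output)


if __name__ == "__main__":
    # Sample line copied from the first post; on a real guest, pass in the
    # captured output of `dkms status` instead.
    sample = "nvidia, 470.103.01, 5.13.0-30-generic, x86_64: installed"
    print(installed_nvidia_versions(sample))
    print(driver_matches(sample))
```

With the sample line from the first post, the check reports a mismatch, which is consistent with the guest failing until the 470.63 driver was installed.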
