nvidia-smi fails to load drivers on Linux

Hi,

I have an NVIDIA Tesla M40 in a machine running Linux Mint 20.3.
I've tried installing every driver available from apt. My latest clean install was:

sudo apt-get install nvidia-driver-510

I rebooted afterwards, of course, but whenever I run nvidia-smi I get:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

So I ran nvidia-bug-report.sh and attached the result here:

nvidia-bug-report.log.gz (398.4 KB)


I have the same issue; I think it started after the recent live update. My graphics card is an NVIDIA 3070 Dual.

I recently changed my graphics card, so maybe there is some conflict I haven't managed to resolve 😅

For the Tesla to work, you need to enable “Above 4G decoding” in the BIOS, disable CSM, and boot via EFI. Also, Teslas are built for servers and have no fan of their own, so you need to add one if you're running this in a desktop case.

Thanks for the tip,

I'll try changing the BIOS settings.

As for the fan, I'm printing an adapter to mount on one end of the card; hopefully it will help carry the heat away!

Update: I've changed all the BIOS settings, but it still doesn't work…

Is there a way to tell whether the card itself has a problem? I can see that the card is inserted!

Please uninstall the driver and attach a dmesg output right after reboot.

Hello @generix, I tried to reply to your post in another topic, /dev/sdb1 : clean, 640729/122388848… and Keyboard is not working - #17 by generix

However, I couldn't, because as a new user I was limited to 3 replies per topic. Instead, I edited my previous post in that topic to attach the bug report: /dev/sdb1 : clean, 640729/122388848… and Keyboard is not working - #16 by abdulbaasitsanusi

Is it possible to continue the discussion elsewhere?

Generix, sorry for the late reply…

Here is the dmesg output:
dmesg.out (184.8 KB)

64-bit resources are still not enabled. Is this a plain old BIOS or a UEFI with CSM enabled?

Hmm, strange. I'm running a 64-bit Linux system, or am I confusing things?

The BIOS was just updated to the latest version; it supports UEFI and has CSM disabled.
The motherboard is pretty old (an Asus board from 2012).

If this is a UEFI board, then you still have CSM enabled, because the Linux install uses an MBR boot. So after really disabling CSM, you will also have to reinstall Linux.
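A quick way to check which boot path the installed system actually used: on Linux, the kernel creates /sys/firmware/efi only when it was started by UEFI firmware, so its absence points to a legacy BIOS/CSM boot. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
import os


def booted_via_uefi(efi_dir: str = "/sys/firmware/efi") -> bool:
    """Return True if the running kernel was started by UEFI firmware.

    The kernel populates /sys/firmware/efi only on a UEFI boot; a legacy
    BIOS/CSM (MBR) boot leaves the directory absent.
    """
    return os.path.isdir(efi_dir)


if __name__ == "__main__":
    print("UEFI boot" if booted_via_uefi() else "Legacy BIOS/CSM boot")
```

If this reports a legacy boot on a board where CSM is supposedly off, the installed system is still booting through the compatibility path.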

64-bit resources have nothing to do with the OS; they are provided by the BIOS (after enabling “Above 4G decoding” / “64-bit BARs”).
The CSM in modern UEFI firmware is very limited and not capable of much.
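To see where the firmware actually placed a card's memory windows, Linux exposes each PCI device's assigned regions in /sys/bus/pci/devices/<address>/resource: one line of three hex numbers (start, inclusive end, flags) per region, where the first six lines are BAR0 to BAR5 and unused slots are all zeros. The Python sketch below is a hypothetical helper, not an NVIDIA tool; the device address 0000:01:00.0 is a placeholder for whatever lspci reports for the card.

```python
import sys

FOUR_GIB = 1 << 32  # first address a 32-bit BAR cannot reach
MIB = 1 << 20


def resource_regions(text):
    """Parse a sysfs PCI 'resource' file into (index, start, size) tuples.

    Each line holds three hex numbers: start, end (inclusive) and flags.
    Lines 0-5 are BAR0-BAR5; all-zero lines are unused slots and are skipped.
    """
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue
        regions.append((index, start, end - start + 1))
    return regions


def report(text):
    for index, start, size in resource_regions(text):
        where = "above 4G" if start >= FOUR_GIB else "below 4G"
        print(f"region {index}: {size / MIB:g} MiB at {start:#x} ({where})")


if __name__ == "__main__":
    # Placeholder address: substitute the one lspci shows for the card.
    path = sys.argv[1] if len(sys.argv) > 1 else "/sys/bus/pci/devices/0000:01:00.0/resource"
    with open(path) as f:
        report(f.read())
```

With Above 4G decoding working, you would expect the Tesla's large memory region to be listed above 4G.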

So it's possible that the board isn't capable of Above 4G decoding. But that sounds strange to me, because I previously had an RTX 3060 installed in the same machine…

Teslas want to map their whole video memory into the system address range, and 24 GB obviously needs 64-bit address space.
Normal graphics cards like the 3060 map only 256 MB (unless the BIOS supports Resizable BAR).
Does the BIOS have a 4G option?
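The address-space arithmetic behind this can be spelled out in a few lines of Python. This is an illustrative sketch: it treats the thread's 24 GB and 256 MB as binary units, and it relies on the fact that PCI BARs are sized in powers of two, so a 24 GiB window rounds up to 32 GiB.

```python
GIB = 1 << 30
MIB = 1 << 20
ADDR_SPACE_32 = 1 << 32  # everything a 32-bit address can reach: 4 GiB


def bar_window(size):
    """Round a memory size up to the next power of two, since PCI BARs are power-of-two sized."""
    return 1 << (size - 1).bit_length()


def fits_in_32bit_space(window):
    # Strict upper bound only: RAM and other devices already occupy part of the
    # 4 GiB range, so real windows must be much smaller than this to fit.
    return window < ADDR_SPACE_32


tesla = bar_window(24 * GIB)      # whole frame buffer of the 24 GB Tesla M40
geforce = bar_window(256 * MIB)   # classic non-Resizable-BAR window

print(tesla // GIB, fits_in_32bit_space(tesla))       # 32 False
print(geforce // MIB, fits_in_32bit_space(geforce))   # 256 True
```

The Tesla's window is eight times the entire 32-bit address space, so without 64-bit BARs above 4G there is nowhere to put it, while the 256 MiB window fits easily.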

Unfortunately not, I couldn't find that option… I should test the M40 on a board that has it…

I checked the board's manual, and this looks like a very early UEFI/BIOS hybrid. It doesn't support any Tesla; no dice.

I finally managed to test the GPU on a newer motherboard and IT WORKS.

Thanks for guiding me towards the solution!