I have a working Ubuntu machine with NVIDIA drivers and libraries for running neural networks with TensorRT and Triton Inference Server. The machine had been running inference for several days, and today I found that some issue occurred during the night. Now nvidia-smi
gives me this:
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.171
This has never happened before; I've been using this setup for months. And obviously I didn't do anything myself, since I was asleep. What is strange is that the issue was fixed after a reboot… Please tell me what is happening: why are the NVIDIA drivers being unreliable for me, and why did this happen out of nowhere?
nvidia-bug-report.log.gz (170.9 KB)
Also, what is wrong with GPUtil.getGPUs()? Why does it fail even when it is wrapped in a try/except block in Python? How can I reliably read GPU info (utilization, temperature, VRAM)? And if the drivers are down for some reason, how do I catch the error from GPUtil.getGPUs()?
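For context, here is a minimal sketch of the kind of monitoring code I mean (the helper name read_gpu_stats and the exact fields are only illustrative, not my production script):

```python
import GPUtil


def read_gpu_stats():
    """Read utilization, temperature and VRAM for each visible GPU via GPUtil."""
    try:
        # GPUtil shells out to nvidia-smi under the hood
        gpus = GPUtil.getGPUs()
    except Exception as exc:
        # This broad except is what I expected to catch driver problems,
        # but it does not seem to fire when the driver is in the
        # "Driver/library version mismatch" state
        print(f"GPUtil failed: {exc}")
        return []

    stats = []
    for gpu in gpus:
        stats.append({
            "id": gpu.id,
            "util": gpu.load,              # fraction 0..1
            "temp_c": gpu.temperature,     # degrees Celsius
            "vram_used_mb": gpu.memoryUsed,
            "vram_total_mb": gpu.memoryTotal,
        })
    return stats


if __name__ == "__main__":
    print(read_gpu_stats())
```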
My machine:
Ubuntu 22.04
NVIDIA-SMI 535.171.04
Driver Version: 535.171.04
CUDA Version: 12.2
cuDNN version: 8902
TensorRT version: 8.6.1
And here is a screenshot after the reboot, with nvidia-smi working: