Why do NVIDIA drivers fail out of nowhere?

I have a working Ubuntu machine with NVIDIA drivers and libraries for running neural networks with TensorRT and Triton Inference Server. The machine had been running inference for several days, and today I found that an issue occurred during the night. nvidia-smi now gives me this:

Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.171

This had never happened before; I’ve been using this setup for months. I didn’t change anything, I was asleep. Strangely, the issue was fixed by a reboot. Please tell me what is happening: why are the NVIDIA drivers unreliable for me, and why did this happen out of nowhere?

nvidia-bug-report.log.gz (170.9 KB)

Also, what is wrong with GPUtil.getGPUs()? Why does it fail even when it is wrapped in a try/except in Python? How can I reliably parse GPU info (utilization, temperature, VRAM)? And if the drivers are down for some reason, how do I catch the error from GPUtil.getGPUs()?
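
To make the question concrete, here is a minimal sketch of the kind of monitoring I am after. It does not use GPUtil; it calls nvidia-smi directly and treats every failure as "no data". The function names (parse_gpu_csv, query_gpus) and the choice of fields are my own and only illustrative. It assumes nvidia-smi exits with a non-zero code when NVML cannot be initialized, which I have not verified for every driver version.

```python
import subprocess

# Name is queried last so that a comma inside the GPU name cannot shift the
# numeric columns (illustrative field choice, not a required order).
QUERY_FIELDS = "index,utilization.gpu,temperature.gpu,memory.used,memory.total,name"


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",", 5)]
        if len(parts) != 6:
            continue  # not a GPU row (e.g. an error message)
        index, util, temp, mem_used, mem_total, name = parts
        try:
            gpus.append({
                "index": int(index),
                "util_percent": float(util),
                "temp_c": float(temp),
                "mem_used_mib": float(mem_used),
                "mem_total_mib": float(mem_total),
                "name": name,
            })
        except ValueError:
            continue  # a field was not numeric, e.g. "[N/A]"
    return gpus


def query_gpus(binary="nvidia-smi", timeout=10):
    """Return a list of GPU dicts, or None if the query failed for any reason."""
    cmd = [binary, f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None  # binary missing, not executable, or hung
    if result.returncode != 0:
        return None  # assumed to cover the "Failed to initialize NVML" case
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("GPU query failed (driver down or nvidia-smi unavailable)")
    else:
        for gpu in gpus:
            print(gpu)
```

The point of returning None instead of raising is that the caller can keep running and retry later, rather than depending on which exception a wrapper library happens to raise.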

My machine:
Ubuntu 22.04
NVIDIA-SMI 535.171.04
Driver Version: 535.171.04
CUDA Version: 12.2
cuDNN version: 8902
TensorRT version: 8.6.1

And a screenshot after the reboot, with nvidia-smi working: