Hi all,
On a diskless cluster running CentOS 7 with K80 cards, after upgrading the NVIDIA driver to 375.66, I get this error when I run nvidia-smi:
Failed to initialize NVML: Driver/library version mismatch
In the dmesg output I found these errors:
NVRM: API mismatch: the client has the version 375.66, but
NVRM: this kernel module has the version 367.48. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
When I check the module version with this command, I get the expected one:
modinfo nvidia
version: 375.66
but I still see the old one in:
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
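For reference, the two checks above can be combined into a small script. This is only a sketch: the function names are my own, and it assumes the modinfo and /proc output formats shown above. As far as I understand, modinfo reads the module file on disk, while /proc/driver/nvidia/version reports the module that is actually loaded in the kernel, so the two can legitimately differ.

```shell
# Sketch only: function names are hypothetical, and the parsing assumes
# the output formats quoted in this post.

# Read `modinfo nvidia` output on stdin and print the on-disk module version.
ondisk_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Read /proc/driver/nvidia/version on stdin and print the loaded module version.
loaded_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Compare the two version strings and report the result.
compare_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: on-disk $1, loaded $2"
    fi
}

# Query the real system only when the nvidia module is loaded and modinfo exists.
if [ -r /proc/driver/nvidia/version ] && command -v modinfo >/dev/null 2>&1; then
    compare_versions \
        "$(modinfo nvidia | ondisk_version)" \
        "$(loaded_version < /proc/driver/nvidia/version)"
fi
```

Running this right after boot and again after the rmmod/modprobe cycle should show whether the mismatch is already present at boot time.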
If I unload all NVIDIA-related modules with rmmod and load them again with modprobe, everything works fine. But if I reboot a compute node, /proc/driver/nvidia/version again reports the old module version and the problem reappears.
I cannot find the old kernel module file anywhere on the machine, so how is it possible that the old module gets loaded? Does anyone have ideas about how I could debug this problem?
Thanks and Best Regards,
Enrico