I ran into this problem, but it had nothing to do with CUDA (which wasn't installed on some of the systems). On my system the NVIDIA kernel modules were being embedded inside the initramfs image and loaded early in the boot process. These embedded but outdated modules would then prevent the correct, newly installed/compiled standalone module files from being loaded. You can confirm this issue easily by checking the following:
cat /proc/driver/nvidia/version
cat /sys/module/nvidia/version
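If you want a single check that compares the loaded module against the one installed on disk, a rough sketch like this works (it assumes modinfo can find the standalone nvidia module):
loaded=$(cat /sys/module/nvidia/version)
ondisk=$(modinfo -F version nvidia)    # modinfo reads the standalone module file on disk
[ "$loaded" = "$ondisk" ] || echo "mismatch: loaded=${loaded}, on disk=${ondisk}"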
If the loaded module version doesn't match the driver version, you could be facing this problem too. Then make sure the correct kernel modules are actually available, which you can confirm by running (assuming your distro uses DKMS):
dkms status
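The module name DKMS uses varies by packaging, but if it is registered as nvidia you can narrow the output to just that entry:
dkms status nvidia    # the "nvidia" name is an assumption; adjust to whatever plain "dkms status" shows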
For me the fix simply involved regenerating my initramfs images. On Red Hat and its derivatives (Fedora, CentOS, Alma, Rocky, Oracle, etc.) you can run:
(rpm -q --qf="%{VERSION}-%{RELEASE}.%{ARCH}\n" --whatprovides kernel ; uname -r) | \
sort | uniq | while read KERNEL ; do
    dracut -f "/boot/initramfs-${KERNEL}.img" "${KERNEL}" || exit 1
done
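If you want to confirm that a freshly built image actually picked up the NVIDIA modules, dracut's lsinitrd can list an image's contents (the grep is just to cut the noise):
lsinitrd "/boot/initramfs-$(uname -r).img" | grep -i nvidia    # the nvidia*.ko* files should show up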
The loop above regenerates the initramfs image for every installed kernel. For the equivalent logic on Debian and its derivatives (including Ubuntu), you can run:
for kernel in /boot/config-*; do
    [ -f "$kernel" ] || continue
    KERNEL=${kernel#*-}    # strip the leading "/boot/config-", leaving just the kernel version
    mkinitramfs -o "/boot/initrd.img-${KERNEL}" "${KERNEL}" || exit 1
done
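The Debian equivalent of the verification step is lsinitramfs from initramfs-tools (again, the grep pattern is just illustrative):
lsinitramfs "/boot/initrd.img-$(uname -r)" | grep -i nvidia    # the nvidia*.ko* files should show up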
Then reboot. You can also fix the problem temporarily by manually removing (unloading) the NVIDIA modules with rmmod or modprobe -r, then reloading them. When you do, modprobe will use the standalone kernel module, which should match your installed driver version.
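For reference, a rough sketch of that temporary fix, run as root (the display manager name and the exact set of loaded modules vary from system to system, so check lsmod first):
systemctl stop gdm    # stop whatever is holding the GPU; gdm is only an example
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia    # unload in dependency order; adjust the list to what lsmod shows
modprobe nvidia    # reloading picks up the standalone module matching the installed driver
cat /proc/driver/nvidia/version    # confirm the versions now agree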
P.S. I hit this issue when I upgraded from the 470.x driver to the 510.x driver, which recently became the recommended stable release. I never ran into this problem while using the 460.x and 470.x driver releases.