Hi,
I am on Ubuntu 20.04, and after a recent update to kernel 5.15.0-69-generic the GPU driver was no longer recognized. I created a bug report: nvidia-bug-report.log (2.2 MB)
After some research, I managed to make it work using prime-select nvidia. However, this solution makes it impossible to use the two GPUs I have (Intel and NVIDIA). Before the update, my NVIDIA GPU was only used for deep learning and the Intel GPU for rendering. After switching to prime-select nvidia, the Intel GPU was never used.
I also tried prime-select on-demand, but in that case my NVIDIA GPU was not working…
I posted a second bug report, taken with prime-select nvidia active, in the first comment.
Any help would be appreciated :)
Did you install the NVIDIA driver before or after the kernel upgrade? Or did the kernel upgrade initiate an automatic driver update? The latter is quite often a bit problematic and you should do a manual re-install instead.
On first look at your logs I can only see two entries that might be suspicious. Check these parts in your logs:
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
You are using: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
While this should not matter, I would still check if cc is simply a symlink to gcc or not and, if necessary, re-install the driver after fixing this.
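For example, something along these lines should show what cc actually resolves to (a minimal check; exact paths can differ on your system):
# Show what cc resolves to and compare it with gcc
readlink -f $(which cc)
which gcc
cc --version
gcc --version
If cc does not resolve to gcc, fix that first and then re-install the driver.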
Secondly:
avril 06 16:05:50 robin-Precision-7560 nvidia-persistenced[22122]: device 0000:01:00.0 - persistence mode disabled.
avril 06 16:05:50 robin-Precision-7560 nvidia-persistenced[22122]: device 0000:01:00.0 - NUMA memory offlined.
avril 06 16:05:50 robin-Precision-7560 nvidia-persistenced[22122]: PID file unlocked.
avril 06 16:05:50 robin-Precision-7560 nvidia-persistenced[22122]: PID file closed.
avril 06 16:05:50 robin-Precision-7560 nvidia-persistenced[22122]: Shutdown (22122)
persistenced is not running. That also indicates an issue with the driver installation.
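You can check the service state directly, e.g.:
# Check whether the persistence daemon is active
systemctl status nvidia-persistenced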
Could you clarify one thing for me please?
How did you verify this? `nvidia-smi` did not show an issue in the second log. Could you do prime-select on-demand, reboot and then add the output of a plain call to nvidia-smi here?
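Roughly:
sudo prime-select on-demand   # switch back to on-demand mode
sudo reboot
# after the reboot:
nvidia-smi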
Thanks for your answer, and sorry for the delay; I was offline last week.
In my case, the kernel update did not initiate an automatic update of the driver, but after the kernel update the GPU driver was not working anymore. I then tried to manually re-install the recommended version (nvidia-driver-525-open), but it did not improve my situation, so I re-installed nvidia-driver-510, which enabled me to use my GPU again. Still, I could not split the usage (Intel for everything except deep learning, NVIDIA for deep learning).
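Roughly what I ran, from memory, using the Ubuntu packages:
ubuntu-drivers devices                         # lists nvidia-driver-525-open as the recommended driver
sudo apt install --reinstall nvidia-driver-525-open
# and, after that did not help:
sudo apt install nvidia-driver-510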
I would still check if cc is simply a symlink to gcc
How would you do that?
How did you verify this? `nvidia-smi` did not show an issue in the second log
For the second log, I used prime-select nvidia because prime-select on-demand did not give me any results.
I followed your advice and it worked: after the reboot on prime-select on-demand, the two GPUs are correctly used.
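Roughly how I checked which GPU is doing what (glxinfo comes from the mesa-utils package):
glxinfo | grep "OpenGL renderer"               # default rendering -> Intel
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"   # offloaded rendering -> NVIDIA
nvidia-smi                                     # NVIDIA GPU still available for compute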
Here is a last bug report: nvidia-bug-report.log (2.7 MB)
However, after that, all my Python environments are broken: even though Torch and TensorFlow detect the GPU, they cannot use it.
with torch:
ValueError: GPU is not accessible. Was the library installed correctly?
with tensorflow:
2023-04-24 11:29:21.442756: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
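The quick checks I used look roughly like this; both frameworks list the device, but actual computation fails:
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"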
For the CUDA issue, the reason might be that you changed the driver from 525 to 510 in between. Possibly there is a CUDA toolkit mismatch at that point.
Cleaning this up might be a bit complicated. The best would be to remove CUDA completely and freshly install the toolkit following the installation instructions.
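A rough sketch of the cleanup, assuming the toolkit came from apt packages (adjust if you used the runfile installer):
# List the CUDA-related packages that are currently installed
dpkg -l | grep -i cuda
# Purge them; the package names depend on how CUDA was installed, e.g. the cuda meta-package or nvidia-cuda-toolkit
sudo apt-get --purge remove cuda nvidia-cuda-toolkit
sudo apt-get autoremove
Then install the toolkit again following the official installation instructions for your Ubuntu version.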
Thanks for the reply!
I did a full upgrade to Ubuntu 22.04 and made a fresh install with a new driver, CUDA, cuDNN, etc.
Now this is working fine :)
Thanks for the help!