CUDA Error when starting machine post suspension

Hello,
I get the error below after I suspend my Ubuntu machine and resume it. I am running Ubuntu 20.04 and my graphics card is a GeForce RTX 3080.

File "rmm/_cuda/gpu.pyx", line 134, in rmm._cuda.gpu.getDeviceCount
rmm._cuda.gpu.CUDARuntimeError: cudaErrorUnknown: unknown error

Any help would be appreciated. Thank you!

I guess you either have to set up video object persistence (https://download.nvidia.com/XFree86/Linux-x86_64/440.64/README/powermanagement.html) or unload and reload the nvidia-uvm module on resume.
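For the unload/reload option, a minimal sketch of the sequence could look like the following. It assumes the module is named nvidia_uvm, that rmmod and modprobe are on the PATH, and that the script runs as root; the helper names reload_commands and reload_module are made up for illustration. It defaults to a dry run so nothing is touched until you opt in.

```python
import subprocess

MODULE = "nvidia_uvm"  # assumption: the UVM kernel module name on this system


def reload_commands(module=MODULE):
    """Return the two commands that unload and then reload the module."""
    return [["rmmod", module], ["modprobe", module]]


def reload_module(module=MODULE, dry_run=True):
    """Return the command list; execute it only when dry_run is False.

    Executing needs root, and rmmod fails while any process still
    holds the module, so check=True stops at the first failure.
    """
    commands = reload_commands(module)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling reload_module() only returns the commands; reload_module(dry_run=False) actually runs them and raises subprocess.CalledProcessError if the unload is refused.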

Thank you @generix for the response. I tried the second option. However, running sudo rmmod nvidia_uvm gives:
rmmod: ERROR: Module nvidia_uvm is in use
Forcing it with sudo rmmod -f nvidia_uvm gives:
rmmod: ERROR: …/libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'nvidia_uvm': Resource temporarily unavailable
rmmod: ERROR: could not remove module nvidia_uvm: Resource temporarily unavailable

Any suggestions on this?

Sounds like you had a CUDA job or an application using CUDA running on suspend. You will have to kill it in order to unload the uvm module. Otherwise, you’ll have to try option 1.
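To see which processes to kill, one rough approach is to scan /proc for open file descriptors that point at the UVM device node. This is only a sketch: it assumes the node is /dev/nvidia-uvm, the function name find_holders is hypothetical, and without root it can only inspect your own processes.

```python
import os


def find_holders(device="/dev/nvidia-uvm", proc_root="/proc"):
    """Map PID -> process name for processes with `device` open.

    Walks <proc_root>/<pid>/fd and compares each symlink target
    with the device path. Unreadable entries are skipped.
    """
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == device:
                try:
                    with open(os.path.join(proc_root, entry, "comm")) as f:
                        name = f.read().strip()
                except OSError:
                    name = "?"
                holders[int(entry)] = name
                break  # one match per process is enough
    return holders
```

Running it as root and printing find_holders() should list things like a browser or a Python session if they still have the device open.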

Essentially I had to close everything (including my browser; I didn’t know it was using CUDA) and then was able to run the commands from option 1. I think CUDA came back up after that. Do you have any recommendations for automating this once I log in after a suspend, or for any alternative to suspending? Thanks @generix.
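One way the automation might be done is a systemd sleep hook: systemd-sleep runs executables placed in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument and the sleep type (for example "suspend") as the second. The sketch below is an untested assumption about how such a hook could reload the module on resume; the function names are made up, and the reload will still fail if a process such as a browser is holding the module.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumption: unloading and reloading nvidia_uvm is enough to recover CUDA.
RELOAD = (["rmmod", "nvidia_uvm"], ["modprobe", "nvidia_uvm"])


def commands_for(phase):
    """Only act after resume ("post"); do nothing before sleep ("pre")."""
    return RELOAD if phase == "post" else ()


def main(argv):
    """Run the reload commands for the given phase; return an exit code."""
    phase = argv[1] if len(argv) > 1 else ""
    for cmd in commands_for(phase):
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            return result.returncode  # e.g. module still in use
    return 0


if __name__ == "__main__":
    code = main(sys.argv)
    if code:
        sys.exit(code)
```

The file would need to be executable and owned by root. Hooks run as root, so no sudo is needed inside the script.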

Yeah, I didn’t think of hardware-accelerated video decode in Chrome/Electron apps. Those likely hold on to the uvm module as well.
The latest drivers enable part of the power management methods by default, so maybe try adding the graphics-drivers PPA to get the latest driver and check whether that works better with CUDA on suspend.


Is this a good reference for the install: How to Use Ubuntu Nvidia PPA?
Thanks!

Yes, explained so even beginners should get it right.