Nvidia modprobe not found after power outage

NeuroSurfer · October 15, 2015, 6:50pm

There was a recent thunderstorm in the city where I live, and it caused a city-wide power outage. Luckily my computer and many others were connected to stabilizers, but that didn’t stop them from losing power.

Today I turned all of the computers back on. My small cluster computer, which has an Nvidia Titan X, rebooted perfectly but can’t seem to run any CUDA-related operations (it was running some when the power went out). I can still add the CUDA path and nvcc is detected, but when I run a MATLAB script that calls the graphics card for CUDA-related processing, I get an error along the lines of “FATAL error: nvidia modprobe not found”. I’m not really sure what happened. When I look in the /dev/ folder, the nvidia device doesn’t show up.

However, when I type the following, the card is still listed:

$ lspci | grep VGA
VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e

So I’m not really sure whether the system detects the graphics card… I’m also not sure what the next step is. Should I open the computer and check whether the graphics card melted during the power outage? Is there any way to check the current status (hardware + software) of my graphics card from the terminal?

Other details:
The ./deviceQuery command from the CUDA Samples outputs:
modprobe: FATAL : Module nvidia not found.
cudaGetDeviceCount returned 38
→ No CUDA-capable device is detected.
Result = FAIL

My second option is to re-install the graphics card, but at this point I don’t even know if it’s working or not…

Robert_Crovella · October 15, 2015, 7:31pm

If you have root privilege, try running ./deviceQuery as root.

If that doesn’t work, then try reinstalling the GPU driver (or all of CUDA).

You may also want to take a look at (as root):

dmesg | grep NVRM

to see if there is anything interesting there.
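Two further checks can be made from the terminal alongside dmesg: whether an nvidia kernel module is currently loaded, and whether any /dev/nvidia* device nodes exist. The following is only a sketch. The function names nvidia_module_loaded and count_nvidia_nodes are made up for illustration, and it assumes that the driver's module names begin with "nvidia".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any NVIDIA tool.

# Reads `lsmod`-style text on stdin.
# Succeeds (exit 0) if any listed module name starts with "nvidia"
# (assumption: driver modules are named nvidia, nvidia_uvm, nvidia_352, ...).
nvidia_module_loaded() {
    awk '$1 ~ /^nvidia/ { found = 1 } END { exit !found }'
}

# Prints how many entries named nvidia* exist in a directory (default: /dev).
count_nvidia_nodes() {
    dir="${1:-/dev}"
    n=0
    for f in "$dir"/nvidia*; do
        # When nothing matches, the glob stays literal and -e is false.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Typical use would be `lsmod | nvidia_module_loaded && echo "module loaded"` followed by `count_nvidia_nodes`. With the symptoms described in this thread (module not found, no nvidia device under /dev), one would expect the first check to fail and the count to be 0.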

NeuroSurfer · October 15, 2015, 7:49pm

Running ./deviceQuery as root doesn’t work; I get the same error.

I ran the dmesg command and nothing shows up. Is this OK?

Robert_Crovella · October 15, 2015, 7:58pm

No, it’s not OK. Best case would be to see something like this:

[   15.227592] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015

If you don’t even see that, it means the driver is not even attempting to load.

You at least have a corrupted system config. I would try reinstalling the driver.
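The dmesg check described above can be scripted roughly as follows. This is a sketch under assumptions: the function name classify_nvrm and its three labels are invented here, and the "loading" case simply matches the text of the example line quoted above.

```shell
#!/bin/sh
# Sketch only: classify_nvrm and its labels are hypothetical.
# Reads kernel log text on stdin and prints one of:
#   absent   no NVRM lines at all (driver is not attempting to load)
#   loading  an "NVRM: loading NVIDIA" line is present
#   other    NVRM lines exist, but none is the loading message
classify_nvrm() {
    # Keep only NVRM lines; `|| true` avoids a failure status when none match.
    log=$(grep 'NVRM' || true)
    if [ -z "$log" ]; then
        echo "absent"
    elif printf '%s\n' "$log" | grep -q 'NVRM: loading NVIDIA'; then
        echo "loading"
    else
        echo "other"
    fi
}
```

Run it as root with `dmesg | classify_nvrm`. A result of "absent" corresponds to the situation in this thread, where grepping for NVRM printed nothing.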

NeuroSurfer · October 15, 2015, 8:01pm

OK, I will try reinstalling the driver now. Will keep you posted.

NeuroSurfer · October 15, 2015, 8:12pm

Driver re-installed from here:
https://devtalk.nvidia.com/default/topic/878117/cuda-setup-and-installation/-solved-titan-x-for-cuda-7-5-login-loop-error-ubuntu-14-04-/

Everything is working fine now. Thanks!
