There was a recent thunderstorm in the city I’m living in, and it caused a city-wide power outage. Luckily, my computer and many others were connected to stabilizers, but that didn’t stop them from losing power.
Today I turned all of the computers back on. The small cluster computer with an Nvidia Titan X rebooted without problems, but it doesn’t seem to be able to run any CUDA operations anymore (it was running some when the power went out). I can still add the CUDA path and nvcc is detected. However, when I run a MATLAB script that uses the graphics card for CUDA processing, I get “FATAL error: nvidia modprobe not found”. I’m not sure what happened. When I look in the /dev/ folder, no nvidia device shows up.
However, when I run
$ lspci | grep VGA
VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e
So I’m not really sure whether the system detects the graphics card… I don’t know what the next step is here. Should I open the computer and check whether the graphics card melted during the power outage? Is there any way to check the current status (hardware and software) of my graphics card from the terminal?
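For reference, here is a minimal sketch of the terminal checks I’m aware of, wrapped in a hypothetical `gpu_status` shell function. It assumes a Linux system where the proprietary NVIDIA driver was installed (which is what provides `nvidia-smi`); `10de` is NVIDIA’s PCI vendor ID. `lspci -k` adds a “Kernel driver in use” line when a driver is bound to the card, so a missing line for the NVIDIA device would point at the software side rather than the hardware.

```shell
#!/bin/sh
# Minimal diagnostic sketch. Assumes Linux with the proprietary NVIDIA
# driver; gpu_status is a hypothetical helper name. Every check is guarded
# so a missing tool is reported instead of aborting the script.

gpu_status() {
    echo "== running kernel: $(uname -r)"

    # Hardware side: does the PCI bus still see the card, and which
    # kernel driver (if any) is bound to it? (-d 10de: = NVIDIA only)
    echo "== PCI devices from vendor 10de (NVIDIA):"
    if command -v lspci >/dev/null 2>&1; then
        lspci -nnk -d 10de: || true
    else
        echo "lspci not installed"
    fi

    # Software side: is any nvidia kernel module loaded right now?
    echo "== loaded nvidia modules:"
    if command -v lsmod >/dev/null 2>&1; then
        lsmod | grep -i nvidia || echo "none loaded"
    else
        echo "lsmod not available"
    fi

    # /dev/nvidia* nodes normally only appear after the module is loaded.
    echo "== /dev/nvidia* nodes:"
    ls -l /dev/nvidia* 2>/dev/null || echo "none present"

    # nvidia-smi talks to the driver and fails if the module is missing.
    echo "== nvidia-smi:"
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi || echo "nvidia-smi failed"
    else
        echo "nvidia-smi not installed"
    fi
}

gpu_status
```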
Running ./deviceQuery from the CUDA Samples outputs:
modprobe: FATAL : Module nvidia not found.
cudaGetDeviceCount returned 38
-> No CUDA-capable device is detected.
Result = FAIL
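Since modprobe reports that the module itself is not found (rather than failing to load it), one possibility I’m considering, purely an assumption on my part, is that the machine booted into a different kernel than the one the NVIDIA driver was built for. This sketch (the function name `module_for_running_kernel` is hypothetical) checks whether any nvidia module file exists under `/lib/modules` for the running kernel and lists which installed kernels do have one; the `dkms status` part only applies if the driver was installed through DKMS.

```shell
#!/bin/sh
# Sketch: is there an nvidia kernel module for the kernel that is running?
# Assumption: the driver may have been built for a different kernel than
# the one booted after the outage. module_for_running_kernel is hypothetical.

module_for_running_kernel() {
    kver="$(uname -r)"
    moddir="/lib/modules/$kver"
    echo "running kernel: $kver"

    # Look for nvidia*.ko, nvidia*.ko.xz, nvidia*.ko.zst, etc.
    if [ -d "$moddir" ]; then
        found="$(find "$moddir" -name 'nvidia*.ko*' 2>/dev/null)"
        if [ -n "$found" ]; then
            echo "nvidia module files for this kernel:"
            echo "$found"
        else
            echo "no nvidia module files under $moddir"
        fi
    else
        echo "$moddir does not exist"
    fi

    # Which installed kernels do have an nvidia module?
    # Paths look like /lib/modules/<version>/..., so field 4 is the version.
    echo "kernels with an nvidia module installed:"
    list="$(find /lib/modules -name 'nvidia*.ko*' 2>/dev/null | cut -d/ -f4 | sort -u)"
    if [ -n "$list" ]; then
        echo "$list"
    else
        echo "none found"
    fi

    # Only meaningful if the driver was installed via DKMS.
    if command -v dkms >/dev/null 2>&1; then
        echo "dkms status:"
        dkms status || true
    fi
}

module_for_running_kernel
```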
My other option is to reinstall the graphics card, but at this point I don’t even know whether it’s working or not…