nVidia card has fallen off the bus

Iluvatar1 · April 15, 2013, 3:14pm

Dear all,
I am just getting started with CUDA programming, but I am facing a problem that is very hard to fix: after some time the system gives the error:
NVRM: GPU at 0000:03:00.0 has fallen off the bus
and the computer has to be powered off before the nVidia card is detected again.
At first I thought it was a fault in my code: when I ran the same executable 1000 times, the first 200 iterations were fine and gave the same output, but then the system gave the aforementioned error and all the remaining iterations failed. I then took the matrixMul example from CUDA, compiled it, and ran it 1000 times. The same error happened around iteration 200! That pointed me to a driver problem. I therefore tested the same procedure, unfortunately without any success, with:

Several drivers: some old ones (which Google results suggested could fix the problem), the latest long-lived, the latest experimental, beta, etc.
CUDA 5 and CUDA 4.2 with the aforementioned drivers.
I booted on text only without
I removed the Xorg server completely.
I enabled persistence mode.
None of the above worked.
Please remember how simple the test is: I compile the matrixMul example and run the executable 1000 times. I also tested this on my MacBook Pro and everything went fine (although of course with a different OS, card, etc.). I am clueless right now. What I haven't tested yet:
Another kernel version.
Another Linux distribution.
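
The repeated-run test described above can be sketched in a few lines of Python. This is only an illustration, not the script that was actually used: the function name `first_failure` is hypothetical, and it assumes that the executable returns a non-zero exit status once the GPU is no longer usable.

```python
import subprocess
import sys


def first_failure(cmd, iterations=1000):
    """Run `cmd` up to `iterations` times.

    Returns the 1-based index of the first run that exits with a
    non-zero status, or None if every run succeeds.
    """
    for i in range(1, iterations + 1):
        # Output is discarded; only the exit status is inspected.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return i
    return None
```

Calling `first_failure(["./matrixMul"])` from the directory that holds the compiled sample (the path is an assumption) would report the iteration at which the runs start failing, which makes it easier to compare drivers, kernels, or hardware configurations.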
This is my system info:
Distribution : Ubuntu 12.04.2
CUDA version : 5
Current driver version : 313.30
Ubuntu kernel : 3.2
g++ version : 4.6
nVidia card : Quadro 4000 (GF100)
I am compiling with the simple make command, following exactly the examples without modifying them.
Please, if you have any suggestion, let me know.
Thanks in advance.

vacaloca · April 16, 2013, 7:11pm

This sounds like it could be bad hardware: either a bad motherboard, a bad card, or a bad power supply / not enough power supplied to the card. You'd have to find out which exactly is the case by going through these possibilities one at a time until the problem goes away.

i.e.:

1. try the card with a different power supply - works? done, the problem is the power supply, otherwise
2. try moving the card from one slot of the mobo to another - works? done, the problem is the mobo slot, otherwise
3. try moving the card to another system - works? done, the problem is the mobo, otherwise the card is bad

my guess is that (1) is the issue, but please report back if you solve the issue.
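
When working through the steps above, it helps to confirm from the kernel log whether the card dropped off again after each change. A minimal sketch follows, assuming the log text has already been captured (for example from `dmesg`) and that the message has exactly the format quoted in the first post. The function name `fallen_off_devices` is hypothetical.

```python
import re

# Matches the message format quoted in this thread, e.g.
# "NVRM: GPU at 0000:03:00.0 has fallen off the bus".
FALLEN_OFF = re.compile(
    r"NVRM: GPU at "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F])"
    r" has fallen off the bus"
)


def fallen_off_devices(log_text):
    """Return the PCI addresses reported as fallen off the bus, in log order."""
    return [match.group(1) for match in FALLEN_OFF.finditer(log_text)]
```

An empty list after a long run of the test suggests that the last hardware change removed the problem. Other driver versions may word the message differently, so the pattern may need adjusting.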
