NVRM: RmInitAdapter failed! (GTX 980 Ti & Ubuntu 14.04)

Hi,

I am trying to install the CUDA toolkit and the NVIDIA driver on my machine. I have already done this on a similar machine and it worked fine, but this time I cannot find the solution.

I installed CUDA toolkit 7.5 with the script downloaded from the NVIDIA website. Everything went fine, except that the installer reported “unsupported distribution”, which I had never seen before and cannot explain.

When I launch the program that is supposed to use the GPU, I get:

  • this message from gromacs program :
    NOTE: Error occurred during GPU detection, no CUDA-capable device is detected

  • in kern.log :
    May 24 17:17:48 saul kernel: [ 147.008474] NVRM: RmInitAdapter failed! (0x53:0xffff:1952)
    May 24 17:17:48 saul kernel: [ 147.008509] NVRM: rm_init_adapter failed for device bearing minor number 0
    May 24 17:18:05 saul kernel: [ 164.805534] NVRM: RmInitAdapter failed! (0x30:0xffff:653)
    May 24 17:18:05 saul kernel: [ 164.805551] NVRM: rm_init_adapter failed for device bearing minor number 0

(I tried moving the card to another PCI slot, with the same result.)

NOTE:
The hardware is brand new.
My screen is not plugged into the GTX card but into the motherboard’s integrated graphics.

thanks very much for your help

Maybe you need to remove the nouveau driver. Maybe you haven’t plugged the necessary power connectors into the GTX 980. Did you follow the instructions in the Linux install guide carefully?
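
For anyone landing here with the same symptom, a minimal sketch of the nouveau check and blacklist step might look like the following. The two blacklist lines are the ones given in the CUDA Linux install guide for Ubuntu; the function names and the temporary file path are only illustrative and not part of any NVIDIA tool.

```shell
#!/bin/sh
# Succeed if a module list in `lsmod` format (read from stdin) contains nouveau.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Write the two blacklist lines to the given file.
# The default path is the one commonly used on Ubuntu (writing there needs root).
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
}

# Possible use on the affected machine (hypothetical temp path):
#   lsmod | nouveau_loaded && echo "nouveau is loaded"
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/
#   sudo update-initramfs -u && sudo reboot
```

After the reboot, `lsmod | nouveau_loaded` should fail, which indicates the module is no longer loaded.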

Thank you for your answer and for your help !

Yes, I followed the install guide carefully. I have already done this on 3 computers and it works fine.
I connected the power correctly, and I also tried other PCI slots.

I will try to remove the nouveau driver.

I removed the nouveau driver and the error message (RmInitAdapter failed!) no longer appears.
My program still doesn’t work, and I do not know yet whether it is related to the CUDA toolkit… I’m on it.

Gromacs (the program using the GPU) said:

Running on 1 node with total 4 cores, 8 logical cores, 0 compatible GPUs
Hardware detected:
CPU info:
Vendor: GenuineIntel
Brand: Intel® Core™ i7-6700K CPU @ 4.00GHz
Family: 6 model: 94 stepping: 3
GPU info:
Number of GPUs detected: 1
#0: N/A, stat: insane

$ nvidia-smi
Wed May 25 09:18:12 2016
+------------------------------------------------------+
| NVIDIA-SMI 364.19     Driver Version: 364.19         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 0000:02:00.0     Off |                  N/A |
|ERR!   60C    P2   ERR! / 260W |    125MiB /  6143MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1366    C   gmx                                            103MiB |
+-----------------------------------------------------------------------------+
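
The ERR! entries in the fan and power columns above mean nvidia-smi could not read those values from the card. A small sketch for spotting this in a saved or piped nvidia-smi report could be the following; the function name is hypothetical and it only scans the plain text output.

```shell
#!/bin/sh
# Count how many fields in nvidia-smi text output (read from stdin) show ERR!.
count_smi_errors() {
    grep -o 'ERR!' | wc -l | tr -d ' '
}

# Possible use:
#   nvidia-smi | count_smi_errors
```

A non-zero count on an otherwise idle card is a hint to look at kern.log for NVRM messages from the same time.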

Tail of kern.log :

May 25 09:12:33 saul kernel: [ 279.970623] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
May 25 09:12:33 saul kernel: [ 281.966826] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

I did a fresh install of ubuntu server 14.04

sudo apt-get install g++-4.8
sudo apt-get install gcc-4.8
sudo apt-get install make
sudo ln -s /usr/bin/gcc-4.8 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.8 /usr/bin/g++

  1. Install driver using NVIDIA-Linux-x86_64-361.42.run
  2. Install CUDA toolkit using cuda_7.5.18_linux.run (answering NO when asked whether to install the included driver)

I ran nvidia-smi after the reboot and got normal output… and
a few minutes later, the same command gave an error:

No devices were found. Please make sure /dev/nvidia* files are readable by current user.

The permissions seem OK:

cyril@saul:~$ ls -lrt /dev/nvidia*
crw-rw-rw- 1 root root 195, 255 mai 25 14:13 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 0 mai 25 14:13 /dev/nvidia0
crw-rw-rw- 1 root root 242, 1 mai 25 14:23 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 242, 0 mai 25 14:23 /dev/nvidia-uvm

deviceQuery :

./deviceQuery Starting…
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

tail of kern.log, same message as before :

May 25 14:30:30 saul kernel: [ 1080.444903] NVRM: RmInitAdapter failed! (0x30:0xffff:680)
May 25 14:30:30 saul kernel: [ 1080.444912] NVRM: rm_init_adapter failed for device bearing minor number 0
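
Since the status tuple in these messages changes between runs (0x53:0xffff:1952, 0x30:0xffff:653, 0x30:0xffff:680), it can help to list them all. A rough sketch, assuming the log format shown above; the function name is made up for illustration:

```shell
#!/bin/sh
# Print the status tuple of every "RmInitAdapter failed!" line read from stdin.
nvrm_init_failures() {
    sed -n 's/.*NVRM: RmInitAdapter failed! (\([^)]*\)).*/\1/p'
}

# Possible use on Ubuntu, where the kernel log is /var/log/kern.log:
#   nvrm_init_failures < /var/log/kern.log | sort | uniq -c
```

The sketch only collects the tuples; it does not interpret them.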

I also have ACPI errors in kern.log; I don’t know whether they are related.

tail -2000 kern.log | grep -i ACPI :

May 25 14:12:33 saul kernel: [ 0.691553] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 0.712355] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS01.PLD] (Node ffff880473116c30), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 0.734436] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 0.756302] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS02.PLD] (Node ffff880473116c80), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 0.779773] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 0.803074] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS03.PLD] (Node ffff880473116cd0), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 0.827903] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 0.852714] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS04.PLD] (Node ffff880473116d20), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 0.878799] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 0.904644] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS05.PLD] (Node ffff880473116d70), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 0.931927] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 0.959212] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS06.PLD] (Node ffff880473116dc0), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 0.987836] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 1.016243] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS07.PLD] (Node ffff880473116e10), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 1.045832] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 1.075342] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS08.PLD] (Node ffff880473116e60), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 1.106253] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 1.137158] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS09.PLD] (Node ffff880473116eb0), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 1.169348] ACPI Error: No object attached to node [UHSD] ffff880473116be0 (20150619/exresnte-128)
May 25 14:12:33 saul kernel: [ 1.201350] ACPI Error: Method parse/execution failed [_SB_.PCI0.XHC_.RHUB.HS10.PLD] (Node ffff880473116f00), AE_AML_NO_OPERAND (20150619/psparse-536)
May 25 14:12:33 saul kernel: [ 5.616208] ACPI Warning: _SB_.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
May 25 14:12:33 saul kernel: [ 5.691433] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)

Any ideas? :(

Follow the instructions in the linux install guide carefully.

Earlier you stated that you did this, but then you discovered that removing the nouveau driver, as covered in the Linux install guide, made things better.

Then, when you reinstalled Ubuntu, you got the nouveau driver back, and sure enough the NVRM messages reappeared.

I don’t think you’re following the Linux install guide carefully. For Ubuntu 14.04, it works.

In fact, I usually use the .deb package to install the CUDA toolkit, so I had never disabled the nouveau driver before…

This was my first time using the .run script, and I forgot to disable the nouveau driver.

Today I did a fresh Ubuntu install, installed the CUDA toolkit from the .deb package, and swapped the GTX 980 Ti for a “simple” GTX 980, and it works fine.