NVRM: RmInitAdapter failed! and one GPU missing (Ubuntu 16.04 with 2 x 1080ti)

pablo.riera · August 3, 2018, 6:08pm

Hi, I have a system with two GPUs. Both were working fine, but recently this error appeared in the system log, and now only one GPU is visible.

NVRM: RmInitAdapter failed! (0x26:0xffff:1123)
NVRM: rm_init_adapter failed for device bearing minor number 1

Here are some diagnostics. Are there others that I could run? Any information is appreciated. Thanks.

17:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. Device 1470
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 33
	Region 0: Memory at c4000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 387fe0000000 (64-bit, prefetchable) 
	Region 3: Memory at 387ff0000000 (64-bit, prefetchable) 
	Region 5: I/O ports at 7000 
	[virtual] Expansion ROM at c5000000 [disabled] 
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_396, nvidia_396_drm
--
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. Device 1470
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 34
	Region 0: Memory at df000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 38bfe0000000 (64-bit, prefetchable) 
	Region 3: Memory at 38bff0000000 (64-bit, prefetchable) 
	Region 5: I/O ports at b000 
	[virtual] Expansion ROM at e0000000 [disabled] 
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_396, nvidia_396_drm

~ nvidia-smi
Fri Aug  3 15:00:16 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.45                 Driver Version: 396.45                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:17:00.0 Off |                  N/A |
| 23%   29C    P0    56W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvidia-bug-report.log.gz (2.24 MB)
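
The comparison made by eye above (two NVIDIA devices in the lspci listing, one row in nvidia-smi) can be sketched as a small Python script. This is only an illustration: the function names are hypothetical, and it assumes lspci prints addresses without a PCI domain prefix (as in the listing above) while nvidia-smi prints an 8-digit domain before the bus address in its Bus-Id column.

```python
import re

# Sample text copied from the output posted above (trimmed to the relevant lines).
LSPCI_SAMPLE = (
    "17:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)\n"
    "\tKernel driver in use: nvidia\n"
    "--\n"
    "65:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)\n"
    "\tKernel driver in use: nvidia\n"
)
SMI_SAMPLE = "|   0  GeForce GTX 108...  Off  | 00000000:17:00.0 Off |                  N/A |\n"


def lspci_nvidia_addresses(lspci_text):
    """Return bus:device.function addresses of NVIDIA VGA controllers.

    Assumes the address starts the line and has no domain prefix.
    """
    pattern = re.compile(
        r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+VGA compatible controller: NVIDIA",
        re.IGNORECASE | re.MULTILINE,
    )
    return {addr.lower() for addr in pattern.findall(lspci_text)}


def smi_addresses(smi_text):
    """Return bus:device.function addresses found in nvidia-smi's Bus-Id column.

    Assumes the Bus-Id is printed as an 8-hex-digit domain followed by the address.
    """
    pattern = re.compile(
        r"[0-9a-f]{8}:([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])", re.IGNORECASE
    )
    return {addr.lower() for addr in pattern.findall(smi_text)}


def missing_gpus(lspci_text, smi_text):
    """Addresses that the PCI bus reports but nvidia-smi does not list."""
    return sorted(lspci_nvidia_addresses(lspci_text) - smi_addresses(smi_text))


if __name__ == "__main__":
    print(missing_gpus(LSPCI_SAMPLE, SMI_SAMPLE))
```

For the output posted here, the device at 65:00.0 is the one present on the bus but absent from nvidia-smi.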

pablo.riera · August 3, 2018, 6:41pm

I found elsewhere that these boot parameters were helpful, and indeed they were, but the effect only lasted one reboot. It would be nice to know what they do, to see whether they could compromise the system.

pcie_aspm=off rcutree.rcu_idle_gp_delay=1

generix · August 3, 2018, 8:10pm

pcie_aspm=off does nothing in your case, as your system's BIOS doesn't support ASPM anyway.
rcutree.rcu_idle_gp_delay=1 lets the RCU mechanism wake up the CPU earlier. It has minimal impact on energy efficiency and shouldn't be necessary on a modern kernel.
You should check whether your slot is working properly: reseat the card or move it to another slot.
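
Since the parameters only seemed to help for one boot, it may be worth confirming that they were actually on the running kernel's command line. The sketch below reads /proc/cmdline and reports the two parameters discussed above. The function names are hypothetical, and the parser is deliberately simple: it splits on whitespace and does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "quiet" map to None. Quoted values with embedded
    spaces are not handled (simplifying assumption).
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


def report(cmdline, names=("pcie_aspm", "rcutree.rcu_idle_gp_delay")):
    """Return {name: value or 'not set'} for the parameters of interest."""
    params = parse_cmdline(cmdline)
    return {n: params[n] if n in params else "not set" for n in names}


if __name__ == "__main__":
    for name, value in report(read_cmdline()).items():
        print(f"{name}: {value}")
```

If either parameter shows as "not set" after a reboot, it was passed for a single boot only and not stored in the bootloader configuration. Whether that explains the behaviour in this thread is not known.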
