Dear Ivan,
From the logs, it appears that you are having issues with Asus RTX 3060 LHR cards.
Please check with Asus whether this is a known issue on their end and whether a newer VBIOS is available for these cards.
Meanwhile, I am also checking within and across teams for the same GPU so that I can try to recreate the issue locally.
Also, could you please share a fresh bug report captured immediately after triggering the issue? (I suggest rebooting the system once before triggering it so that the report contains only relevant logs.)
Yes, we have issues with Asus RTX 3060 LHR cards on multiple devices. Do you have a recommendation on how to reach out to Asus?
Regarding the request for a fresh bug report: if you take a closer look, you’ll see that the issue occurs immediately after booting the device, so we can’t get a better log than the one we attached.
This is the relevant part of the log, where you can see that the GPU is not being registered at boot:
Jan 19 08:29:08 forsight-desktop kernel: [ 5.008866] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
Jan 19 08:29:08 forsight-desktop kernel: [ 5.058634] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 525.60.11 Wed Nov 23 23:04:03 UTC 2022
Jan 19 08:29:08 forsight-desktop kernel: [ 5.100330] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 525.60.11 Wed Nov 23 22:49:17 UTC 2022
Jan 19 08:29:08 forsight-desktop kernel: [ 5.105242] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 19 08:29:08 forsight-desktop kernel: [ 5.376815] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:08 forsight-desktop kernel: [ 5.377047] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 19 08:29:08 forsight-desktop kernel: [ 5.377092] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Jan 19 08:29:08 forsight-desktop kernel: [ 5.377163] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Jan 19 08:29:08 forsight-desktop kernel: [ 5.441738] nvidia-uvm: Loaded the UVM driver, major device number 507.
Jan 19 08:29:15 forsight-desktop kernel: [ 12.234685] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:15 forsight-desktop kernel: [ 12.234707] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 19 08:29:15 forsight-desktop kernel: [ 12.302359] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:15 forsight-desktop kernel: [ 12.302378] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 19 08:29:15 forsight-desktop kernel: [ 12.370612] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:15 forsight-desktop kernel: [ 12.370629] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 19 08:29:15 forsight-desktop kernel: [ 12.437986] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:15 forsight-desktop kernel: [ 12.438007] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 19 08:29:15 forsight-desktop kernel: [ 12.506141] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:15 forsight-desktop kernel: [ 12.506162] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 19 08:29:15 forsight-desktop kernel: [ 12.576142] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
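Since the same failure repeats many times in the excerpt above, it can help to condense a kernel log into one line per GPU and error code. The following is only a minimal sketch: the regular expression is written against the NVRM message format visible in this thread, and the function name `summarize_rm_init_failures` and the `SAMPLE` text are illustrative, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
# The pattern is an assumption based on the messages quoted in this thread.
RM_INIT_FAILED = re.compile(
    r"NVRM: GPU (?P<pci>[0-9a-fA-F:.]+): RmInitAdapter failed! \((?P<code>[^)]+)\)"
)


def summarize_rm_init_failures(log_text):
    """Count RmInitAdapter failures per (PCI address, error code) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RM_INIT_FAILED.search(line)
        if match:
            counts[(match.group("pci"), match.group("code"))] += 1
    return counts


# Two lines copied from the log excerpt above, used as sample input.
SAMPLE = """\
Jan 19 08:29:08 forsight-desktop kernel: [ 5.376815] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1427)
Jan 19 08:29:08 forsight-desktop kernel: [ 5.377047] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
"""

if __name__ == "__main__":
    for (pci, code), count in summarize_rm_init_failures(SAMPLE).items():
        print(f"GPU {pci}: error {code} seen {count} time(s)")
```

In practice the input would be the kernel log of the affected boot (for example the output of `journalctl -k -b` saved to a file) instead of `SAMPLE`. A change in the error code between boots is then easy to spot.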
The underlying issue is that no initial framebuffer is handed over from the BIOS. If you connect a monitor to the NVIDIA GPU, are the POST messages displayed? Does the driver work in that condition?
This should be fixable by a VBIOS update from Asus. Contacting them might be in vain, as they deny any support when running Linux. Rather, check their support site, e.g. https://www.asus.com/us/motherboards-components/graphics-cards/dual/dual-rtx3060-o12g-v2/helpdesk_bios/?model2Name=DUAL-RTX3060-O12G-V2
(I don’t know your specific model, so you’ll have to check.)
If that doesn’t help, you might try some system BIOS tweaks, e.g. setting the primary VGA adapter explicitly, disabling fast boot, or enabling CSM.
Hi @generix, thank you for your reply.
Unfortunately, we don’t have physical access to the device since it’s deployed at our customer location.
We would like to avoid a site visit in order to fix this issue.
I checked the ASUS link, but it’s for Windows.
I tried using nvflash to install a newer VBIOS, and the GPU did show up just for a moment, but then it stopped working again with the same error.
This is the exact device that we have installed:
NVIDIA Firmware Update Utility (Version 5.792.0)
Copyright (C) 1993-2022, NVIDIA Corporation. All rights reserved.
NVIDIA display adapters present in the system:
<0> Graphics Device (10DE,2504,1043,8810) S:00,B:01,D:00,F:00
Do you have any other ideas which we could try without physical access or a site visit?
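To make sure the adapter listed by nvflash is the same device that fails in the kernel log, the adapter line can be decoded into PCI IDs and a PCI address. This is a rough sketch under a few assumptions: the four hex values are read as vendor, device, subsystem vendor and subsystem device, the `S:`/`B:`/`D:`/`F:` fields are read as hex segment, bus, device and function, and `parse_nvflash_adapter` and `VENDOR_NAMES` are names made up for this example.

```python
import re

# Assumed layout of an nvflash adapter line, based on the output quoted above:
#   <0> Graphics Device (10DE,2504,1043,8810) S:00,B:01,D:00,F:00
NVFLASH_LINE = re.compile(
    r"\((?P<vendor>[0-9A-Fa-f]{4}),(?P<device>[0-9A-Fa-f]{4}),"
    r"(?P<sub_vendor>[0-9A-Fa-f]{4}),(?P<sub_device>[0-9A-Fa-f]{4})\)\s+"
    r"S:(?P<seg>[0-9A-Fa-f]+),B:(?P<bus>[0-9A-Fa-f]+),"
    r"D:(?P<dev>[0-9A-Fa-f]+),F:(?P<fn>[0-9A-Fa-f]+)"
)

# Only the two PCI vendor IDs that appear in this thread.
VENDOR_NAMES = {"10DE": "NVIDIA", "1043": "ASUSTeK"}


def parse_nvflash_adapter(line):
    """Return a dict with PCI IDs and a kernel-style PCI address, or None."""
    match = NVFLASH_LINE.search(line)
    if match is None:
        return None
    # Format the location the way the kernel log prints it: dddd:bb:dd.f
    pci_address = "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(match.group("seg"), 16),
        int(match.group("bus"), 16),
        int(match.group("dev"), 16),
        int(match.group("fn"), 16),
    )
    vendor_id = match.group("vendor").upper()
    sub_vendor_id = match.group("sub_vendor").upper()
    return {
        "vendor": VENDOR_NAMES.get(vendor_id, "unknown"),
        "device_id": match.group("device").upper(),
        "subsystem_vendor": VENDOR_NAMES.get(sub_vendor_id, "unknown"),
        "subsystem_device_id": match.group("sub_device").upper(),
        "pci_address": pci_address,
    }


if __name__ == "__main__":
    line = "<0> Graphics Device (10DE,2504,1043,8810) S:00,B:01,D:00,F:00"
    print(parse_nvflash_adapter(line))
```

For the line posted above, the resulting address can be compared directly with the `GPU 0000:01:00.0` prefix in the NVRM messages.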
@generix Thank you. I unpacked the Windows exe file and found the VBIOS files and tried flashing them.
The process succeeds without any issues, but the issue is still there. I tried rebooting, reinstalling different versions of drivers, but nothing did the trick.
I assume that we’ll have to perform the BIOS changes :S
What’s weird, though, is that this device had been working without any issues, and after an update the GPU wasn’t visible anymore. I’m puzzled how this can have something to do with BIOS settings.
That’s an important piece of information that was missing before.
Taking this into account, it might be an issue with grub, defective hardware, or something else. I guess without attaching a monitor you won’t find out what’s really going on.
I’d rather say “not at all”, since everything that happens before the kernel loads is unknown. You can of course do some indirect checks: check the installed version, check the apt history to see whether it got updated when the issues started, and check the grub config for odd entries.
If you can reliably reproduce the issue (start with monitor connected: nvidia works; start with monitor disconnected: nvidia doesn’t work), this might be a valid workaround.
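The apt history check mentioned above can be done remotely over SSH. Here is a minimal sketch, assuming a Debian/Ubuntu system where apt writes its history to `/var/log/apt/history.log` with blank-line-separated entries; the keyword list and the function name `relevant_apt_entries` are assumptions chosen for this example, not something prescribed in the thread.

```python
import re
from pathlib import Path

# Assumed package name fragments worth looking at for this problem.
KEYWORDS = ("nvidia", "grub", "linux-image")


def relevant_apt_entries(history_text, keywords=KEYWORDS):
    """Return (start_date, action, package) tuples for matching packages."""
    results = []
    # Entries in apt's history.log are separated by blank lines.
    for block in re.split(r"\n\s*\n", history_text.strip()):
        start_date = None
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if not sep:
                continue
            if key == "Start-Date":
                start_date = value.strip()
            elif key in ("Install", "Upgrade", "Remove", "Purge"):
                # Packages look like "name:arch (old, new)", comma-separated.
                for item in re.split(r"\),\s*", value):
                    name = item.split(":", 1)[0].split(" ", 1)[0].strip()
                    if any(keyword in name for keyword in keywords):
                        results.append((start_date, key, name))
    return results


if __name__ == "__main__":
    history = Path("/var/log/apt/history.log")
    if history.exists():
        for date, action, package in relevant_apt_entries(history.read_text()):
            print(f"{date}  {action:8s} {package}")
    else:
        print("No apt history log found at", history)
```

Comparing the printed dates with the first boot on which `RmInitAdapter failed` appears gives a hint whether a driver, kernel, or grub update coincided with the start of the problem. Older, rotated history files (compressed by logrotate) are not covered by this sketch.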
Hi @generix, we’re observing this issue again. We recalled a unit from our customer and sent a replacement unit to the same site. At our HQ, everything worked as expected during testing: the GPU showed up without any issues and our software worked as well.
Now we have the same issue after installing the device at the customer’s location:
[ 889.927494] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[ 889.927509] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Could this issue be related to power issues with the outlet?
This is now a completely different issue; it would point to the GPU either being installed in an unsupported VM or being broken.
Both errors you ran into are extremely unlikely on a bare-metal install, so unstable/unclean power might be a valid reason.