Hi all,
I recently installed Linux Mint 21 MATE on a new Razer Blade 15 (2022) laptop.
I installed the package linux-oem-22.04 in order to easily upgrade from kernel 5.15 to 5.17 and solve my wireless issue.
I then installed the latest proprietary NVIDIA driver on it, nvidia-driver-525, precisely version 525.60.11.
Right after installing the NVIDIA driver, I had to wrestle with this nasty “out of memory” error at boot time:
and was able to solve it by following comment 25.
I was then able to successfully boot a fresh 5.17 kernel with the NVIDIA driver.
The problem is that a few minutes after booting up the laptop, the GPU dies.
This happens both when the PRIME profile is on-demand (the default), in which case I can thankfully still use the desktop although any nvidia-related command errors out, and when the PRIME profile is set to nvidia, in which case the system freezes completely and needs a hard reboot.
When I try to connect an external monitor, the problem happens as soon as I plug in the HDMI cable.
The system logs report the nasty “GPU has fallen off the bus” error, which is often attributed to power supply issues or thermals.
Power supply should not be the problem since this is an embedded laptop from a reputable brand, not a self-assembled hack job of a desktop with a poor PSU.
Thermals are not to blame either, as this consistently happens a few minutes (say, five) after booting, without any usage whatsoever (temperature around 40 C), definitely not after a heavy computational or gaming session.
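For anyone who wants to pull the relevant lines out of the kernel log, here is a minimal sketch. It assumes the driver logs with the usual NVRM prefix and that the kernel journal is readable via journalctl; the function name nvrm_bus_errors is made up for illustration.

```shell
#!/usr/bin/env bash
# Print NVIDIA driver (NVRM) lines that mention an Xid event or the
# "fallen off the bus" message. Reads log text on stdin.
# Assumption: the driver prefixes its kernel messages with "NVRM: ".
nvrm_bus_errors() {
    grep -E 'NVRM: .*(Xid|fallen off the bus)'
}

# Typical use on a systemd machine (may need sudo to read the kernel journal):
#   journalctl -k -b --no-pager | nvrm_bus_errors
```

The timestamps on the matching lines, compared with the boot time, give a more precise time-to-failure than "a few minutes".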
I read that one could try and set the persistence mode on the GPU to avoid an automatic switch-off by typing:
sudo nvidia-smi -pm 1
and that this command is deprecated, so one should instead enable the systemd service named nvidia-persistenced.
In my case, the service was already enabled and running even as I was having these issues.
I noticed that the service itself was running with the parameter --no-persistence-mode, so I figured that might be the problem and modified the service file to run with --persistence-mode instead.
That had no effect on the error, and the GPU still “falls off the bus” after a few minutes.
Finally, since I am running with the PRIME profile on-demand, I can see that X is successfully loaded on the GPU by running nvidia-smi right as I get the desktop, but before the GPU dies out.
In other words, it’s not like the GPU gets switched off because nothing is using it, say, after having completed some CUDA computations – X is using it!
I also tried, without success:
- installing linux-oem-22.04b to load kernel 6.0.0;
- boot option nvidia-drm.modeset=0;
- boot option pcie_aspm=off;
- boot options pci=check_enable_amd_mmconf and idle=nomwait;
- adjusting the clocks with nvidia-smi -lgc 300,1750.
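When testing boot options like these, it helps to confirm that each one actually reached the kernel on the current boot. The following is a minimal sketch that checks a kernel command line for the options listed above; the helper names has_boot_param and report_boot_params are hypothetical, and the command line is read from /proc/cmdline unless one is passed in.

```shell
#!/usr/bin/env bash
# Succeed if the exact parameter $1 appears as a whole word in the kernel
# command line given as $2 (defaults to the contents of /proc/cmdline).
has_boot_param() {
    local param=$1
    local cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which of the options tried in this thread are present.
# Optional $1: a command line string to inspect instead of /proc/cmdline.
report_boot_params() {
    local cmdline=${1:-$(cat /proc/cmdline)}
    local p
    for p in nvidia-drm.modeset=0 pcie_aspm=off \
             pci=check_enable_amd_mmconf idle=nomwait; do
        if has_boot_param "$p" "$cmdline"; then
            echo "active:  $p"
        else
            echo "missing: $p"
        fi
    done
}

# Usage after rebooting with the new options:
#   report_boot_params
```

An option reported as missing usually means the bootloader configuration was not regenerated or the wrong menu entry was booted, so the test of that option was not valid.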
I have read all the “GPU has fallen off the bus” threads I could find, but found no solution.
Any help is appreciated, and I am happy to share any logs with you knowledgeable gurus. Cheers!
nvidia logs at boot, before crash:
$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3070 Ti Laptop GPU (UUID: GPU-d7e3314f-0671-9225-6b48-39bfc97fc3c7)
$ nvidia-smi
Thu Dec 8 10:13:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8    10W /  N/A |      5MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1839      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
after crash:
$ nvidia-smi
Unable to determine the device handle for GPU0000:01:00.0: Unknown Error
further info, after crash:
System:
Kernel: 5.17.0-1021-oem x86_64 bits: 64 compiler: gcc v: 11.3.0
Desktop: MATE 1.26.0 info: mate-panel wm: marco 1.26.0 vt: 7
dm: LightDM 1.30.0 Distro: Linux Mint 21 Vanessa base: Ubuntu 22.04 jammy
Machine:
Type: Laptop System: Razer product: Blade 15 (2022) - RZ09-0421 v: 8.04
serial: <superuser required> Chassis: type: 10 serial: <superuser required>
Mobo: Razer model: CH580 v: 4 serial: <superuser required> UEFI: Razer
v: 1.08 date: 02/16/2022
CPU:
Info: 14-core (6-mt/8-st) model: 12th Gen Intel Core i7-12800H bits: 64
type: MST AMCP smt: enabled arch: Alder Lake rev: 3 cache: L1: 1.2 MiB
L2: 11.5 MiB L3: 24 MiB
Speed (MHz): avg: 534 high: 699 min/max: 400/4800:3700 cores: 1: 510
2: 441 3: 499 4: 548 5: 552 6: 681 7: 490 8: 467 9: 469 10: 447 11: 435
12: 445 13: 615 14: 633 15: 608 16: 530 17: 552 18: 496 19: 580 20: 699
bogomips: 112127
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
Device-1: Intel Alder Lake-P Integrated Graphics vendor: Razer USA
driver: i915 v: kernel ports: active: eDP-1 empty: none bus-ID: 00:02.0
chip-ID: 8086:46a6 class-ID: 0300
Device-2: NVIDIA GA104 [Geforce RTX 3070 Ti Laptop GPU] driver: nvidia
v: 525.60.11 pcie: speed: Unknown lanes: 63 ports: active: none
empty: DP-1, DP-2, DP-3, HDMI-A-1 bus-ID: 01:00.0 chip-ID: 10de:24a0
class-ID: 0300
Device-3: IMC Networks Integrated RGB Camera type: USB driver: uvcvideo
bus-ID: 1-2:2 chip-ID: 13d3:5279 class-ID: 0e02 serial: <filter>
Display: x11 server: X.Org v: 1.21.1.3 compositor: marco v: 1.26.0
driver: X: loaded: modesetting,nvidia unloaded: fbdev,nouveau,vesa
gpu: i915 display-ID: :0.0 screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 98 s-size: 499x280mm (19.6x11.0")
s-diag: 572mm (22.5")
Monitor-1: eDP-1 model: TL156VDXP02-0 res: 1920x1080 hz: 60 dpi: 142
size: 344x194mm (13.5x7.6") diag: 395mm (15.5") modes: 1920x1080
OpenGL: renderer: Mesa Intel Graphics (ADL GT2) v: 4.6 Mesa 22.0.5
direct render: Yes
nvidia debug log @ transfer.sh/JphnSL/nvidia-bug-report.log.gz
(internal upload feature was yielding an error)