I’m trying to isolate a boot freeze on a hybrid Intel + NVIDIA laptop running Pop!_OS.
System
- Laptop: HP Pavilion cx 144 gaming
- OS: Pop!_OS 24.04
- Kernel: 6.17.9-76061709-generic
- GPUs: Intel UHD 630 + NVIDIA GP107M / GTX 1050 Ti Mobile
- NVIDIA driver: 470.256.02 (nvidia-dkms-470 / nvidia-driver-470)
- Boot chain: rEFInd → systemd-boot → Pop!_OS
At first I was using the newer nvidia-driver-580, but that was also failing. I asked an LLM and it suggested that since my GTX 1050 Ti Mobile is an older Pascal mobile GPU, the newer NVIDIA path was probably not the best fit on this setup, so it told me to downgrade to 470.
Problem
Pop!_OS boots reliably only when I use a safe Intel-only boot entry that blacklists the NVIDIA modules.
When I boot normally with the proprietary NVIDIA stack enabled, the machine freezes during boot. As in the image below, the screen stays frozen and I have to force a restart.
The visible splash/progress line is not consistent, so I do not think the last on-screen service name is the real cause.
Also, before installing Pop!_OS, I was on Windows 11, where the NVIDIA GPU was not showing up correctly. After updating the NVIDIA driver, the system crashed and became unstable, and I had to disable the driver from Windows Recovery. Rather than continuing to debug it on Windows 11, I installed Pop!_OS to check whether this is a software issue or a hardware problem, since I expected debugging drivers on Linux to be easier than on Windows.
I created a verbose boot entry with NVIDIA enabled. On that boot, the journal shows:
journalctl -b -1 -k --no-pager \
| grep -Ei 'NVRM: GPU at PCI:0000:01:00|NVRM: Xid \(PCI:0000:01:00\)|BUG: kernel NULL pointer dereference|#PF:|RIP: .*_nv|rm_init_adapter|nvkms_open_gpu|nv_drm_load|nv_drm_probe_devices|nv_linux_drm_init' \
| grep -Evi 'pcieport|aer|alcor|sdcard'
Apr 05 10:05:32 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 62, pid=496, ...
Apr 05 10:05:41 pop-os kernel: #PF: error_code(0x0002) - not-present page
Apr 05 10:05:41 pop-os kernel: RIP: 0010:_nv035204rm+0xac/0x130 [nvidia]
Apr 05 10:05:41 pop-os kernel: ? rm_init_adapter+0xc5/0xe0 [nvidia]
Apr 05 10:05:41 pop-os kernel: ? nvkms_open_gpu+0x4e/0x90 [nvidia_modeset]
Apr 05 10:05:41 pop-os kernel: ? nv_drm_load+0x10d/0x480 [nvidia_drm]
Apr 05 10:05:41 pop-os kernel: ? nv_drm_probe_devices+0x1eb/0x2c0 [nvidia_drm]
Apr 05 10:05:41 pop-os kernel: ? nv_linux_drm_init+0xe/0xff0 [nvidia_drm]
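When comparing several crashed boots, it can help to reduce each kernel log to the few lines that matter. The shell function below is a minimal sketch, assuming a persistent journal so that journalctl -b -1 still holds the previous boot; the name summarize_nv_fault is made up for illustration. It pulls out the Xid code, the page-fault error code, the faulting RIP and the "?" frames that belong to the nvidia modules.

```shell
# summarize_nv_fault is a hypothetical helper (not part of any package).
# It reads kernel log text on stdin and prints, in order of appearance:
# Xid codes, page-fault error codes, the faulting RIP inside an nvidia* module,
# and the "? function+off/len [nvidia...]" frames. Frames marked "?" are ones the
# kernel unwinder could not confirm, so treat them as hints, not a proven call chain.
summarize_nv_fault() {
    awk '
        /NVRM: Xid/ && match($0, /[)]: [0-9]+,/) {
            # "NVRM: Xid (PCI:0000:01:00): 62, pid=..." -> "Xid 62"
            print "Xid " substr($0, RSTART + 3, RLENGTH - 4)
        }
        /#PF: error_code/ { sub(/.*#PF: /, ""); print "PF " $0 }
        / RIP: / && /[[]nvidia/ { sub(/.*RIP: [0-9a-f]+:/, ""); print "RIP " $0 }
        /[?] [A-Za-z0-9_]+[+]0x.* [[]nvidia/ { sub(/.*[?] /, ""); print "  at " $0 }
    '
}

# Usage, on the boot after a freeze:
#   journalctl -b -1 -k --no-pager | summarize_nv_fault
```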
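In the journal I saw NVIDIA-related frames such as rm_init_adapter and nvkms_open_gpu. Read from the bottom up, the trace suggests that nvidia_drm's initialization (nv_linux_drm_init → nv_drm_probe_devices → nv_drm_load) opens the GPU through nvidia_modeset (nvkms_open_gpu), which calls rm_init_adapter in the core nvidia module, where the page fault happens (_nv035204rm). Based on that and some online searching, I thought the freeze might be triggered when nvidia_drm gets involved, so I tried a second boot entry that disables only that part during boot: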
nvidia-drm.modeset=0 module_blacklist=nvidia_drm modprobe.blacklist=nvidia_drm
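To confirm that a debug entry like this actually took effect, the running kernel's command line and the loaded modules can be checked after boot. This is a hedged sketch: check_cmdline_params and module_loaded are hypothetical helper names, and the CMDLINE override exists only to make the function easy to test.

```shell
# check_cmdline_params is a hypothetical helper, not part of any package.
# It reads the running kernel's command line (or $CMDLINE, handy for testing)
# and reports each requested parameter as present or missing.
# Note: it matches text literally, while the kernel treats "-" and "_" in
# parameter names as equivalent, so write names as they appear in the entry.
check_cmdline_params() {
    cmdline=${CMDLINE:-$(cat /proc/cmdline)}
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) echo "present: $p" ;;
            *)        echo "MISSING: $p" ;;
        esac
    done
}

# Succeeds if a module is currently loaded (its /sys/module directory exists).
module_loaded() {
    [ -d "/sys/module/$1" ]
}

# Usage, after booting the debug entry:
#   check_cmdline_params nvidia-drm.modeset=0 module_blacklist=nvidia_drm modprobe.blacklist=nvidia_drm
#   module_loaded nvidia_drm && echo "nvidia_drm is loaded despite the blacklist"
```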
With that entry the machine did not freeze at the same early boot stage, but the desktop was still not healthy: apps like Chrome and VS Code did not open properly, and the only way I could get VS Code to run was by forcing software rendering with:
ELECTRON_OZONE_PLATFORM_HINT=x11 LIBGL_ALWAYS_SOFTWARE=1 code --disable-gpu --disable-software-rasterizer --ozone-platform=x11 .
So that seemed to disable GPU rendering enough to make VS Code open, but it did not mean the system was actually fixed. A later freeze from that debug boot still showed another NVIDIA-side kernel fault.
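Since the VS Code workaround forces software rendering, it may be worth confirming which OpenGL renderer the session actually gets with and without such flags. The sketch below assumes glxinfo from the mesa-utils package is installed; renderer_kind is a hypothetical name, and on a Wayland session glxinfo reports what X11 (XWayland) clients see.

```shell
# renderer_kind is a hypothetical helper: it reads `glxinfo -B` output on stdin
# and classifies the OpenGL renderer string.
# llvmpipe and softpipe are Mesa's CPU (software) rasterizers.
renderer_kind() {
    r=$(sed -n 's/^ *OpenGL renderer string: //p' | head -n 1)
    case "$r" in
        "")                    echo "unknown: no renderer line" ;;
        *llvmpipe*|*softpipe*) echo "software: $r" ;;
        *)                     echo "hardware: $r" ;;
    esac
}

# Usage, inside the desktop session:
#   glxinfo -B | renderer_kind
```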
For that later crash on the NoDRM entry (the one with nvidia_drm blocked), this is the relevant kernel output:
Apr 05 13:02:21 pop-os kernel: #PF: error_code(0x0000) - not-present page
Apr 05 13:02:21 pop-os kernel: RIP: 0010:_nv010161rm+0x3c/0x340 [nvidia]
Apr 05 13:02:21 pop-os kernel: ? rm_get_gpu_uuid+0x28/0x150 [nvidia]
Apr 05 13:02:21 pop-os kernel: ? nv_procfs_read_gpu_info+0x14f/0x330 [nvidia]
Apr 05 13:02:21 pop-os kernel: RIP: 0010:_nv035204rm+0xac/0x130 [nvidia]
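So even with nvidia_drm blocked, the core nvidia module still hits a kernel page fault. This time the fault is reached through rm_get_gpu_uuid from nv_procfs_read_gpu_info (the driver's /proc GPU information handler), and the output again contains RIP: _nv035204rm+0xac, the same faulting location as in the first crash.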
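When more boots are collected, a small comparison like the following can show whether different boot entries die at the same spot. It is a sketch: rip_sites and common_rips are hypothetical helpers, and the log file names in the usage comment are placeholders.

```shell
# rip_sites and common_rips are hypothetical helpers for comparing saved kernel logs.
# rip_sites prints the unique faulting RIP locations inside nvidia* modules.
rip_sites() {
    sed -n 's/.*RIP: [0-9a-f]*:\([^ ]*\) \[nvidia.*/\1/p' "$1" | sort -u
}

# common_rips prints the RIP locations that appear in both logs.
common_rips() {
    a=$(mktemp) && b=$(mktemp) || return 1
    rip_sites "$1" > "$a"
    rip_sites "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (file names are placeholders; -b -1 is the previous boot):
#   journalctl -b -1 -k --no-pager > crash-a.log
#   common_rips crash-a.log crash-b.log
```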
I also tested whether the dGPU powers on, whether it enumerates on PCI, and whether the PCIe link trains correctly:
system76-power graphics power on
lspci -nn | grep -iE 'vga|3d|nvidia'
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
Output:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] [10de:1c8c] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
LnkCap: Port #0, Speed 8GT/s, Width x16
LnkSta: Speed 8GT/s, Width x16
This suggests that:
- The NVIDIA GPU still powers on and enumerates on PCI
- The PCIe link trains correctly at 8.0 GT/s x16
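The same LnkCap/LnkSta comparison can be scripted so it is easy to repeat after each boot variant. This is a minimal sketch with link_check as a hypothetical helper. Note that lspci only shows these capability lines when run as root, and that a GPU's link can drop to a lower speed while idle to save power, so a status below capability is not by itself proof of a fault. The runtime_status file under /sys shows whether runtime power management currently has the device active or suspended.

```shell
# link_check is a hypothetical helper: it reads `lspci -vv` output for one device
# on stdin and compares the link capability (LnkCap) with the trained status (LnkSta).
link_check() {
    awk '
        # Return the first match of re in s, minus its first "skip" characters.
        function field(s, re, skip) {
            if (match(s, re)) return substr(s, RSTART + skip, RLENGTH - skip)
            return ""
        }
        /LnkCap:/ { cs = field($0, "Speed [0-9.]+GT", 6); cw = field($0, "Width x[0-9]+", 7) }
        /LnkSta:/ { ss = field($0, "Speed [0-9.]+GT", 6); sw = field($0, "Width x[0-9]+", 7) }
        END {
            if (cs == "" || ss == "") { print "no link info (lspci needs root)"; exit 1 }
            verdict = (cs == ss && cw == sw) ? "matches" : "below capability"
            printf "cap %s/s x%s, sta %s/s x%s: %s\n", cs, cw, ss, sw, verdict
        }
    '
}

# Usage:
#   sudo lspci -vv -s 01:00.0 | link_check
#   cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
```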
Question
At this point I feel like I’m running out of debugging steps I can think of. I also don’t have a strong mental model of how the NVIDIA driver stack (kernel module, DRM, firmware, etc.) actually initializes during boot, so I’m not sure where to go deeper.
From what I’ve tested so far, does this at least suggest that the GPU hardware is fine and that this is more likely a driver/kernel issue? And what could I try next to debug this further in a meaningful way?