Hi @wassou93
It is still under investigation.
I wanted to check if you know the last passing driver.
Still happening on 555.52.04-1. I am also using CachyOS with the 6.9.3-3-cachyos-lto kernel. Discord takes forever to load and spams dmesg with this message. I'm unsure whether it's related, but occasionally, while playing video with Discord on the other screen, the screen freezes while the audio keeps playing. The only way out is to lock the desktop with a keyboard shortcut and wait. While trying to debug that I found these dmesg logs.
I also noticed this happening with Discord. Even when following @orxcyd's solution of adding i915 to the initramfs configuration, the same issue still persists on my system. I have the following specs:
Laptop: Dell G3 3500
CPU: Intel(R) Core™ i7-10750H
GPU: NVIDIA GeForce GTX 1650 Ti Mobile
OS: Arch Linux
Kernel Version: 6.9.3
NVIDIA Driver: 550.78
Window Manager: Hyprland
I have only recently switched to Wayland so I do not know when the last non-issue version is.
Hi all, I first reported this issue on September 18, 2023 (with Kernel 6.5.1) to linux-bugs@nvidia.com
I was told at the time that this was an incorrect error message and that it would be downgraded to a warning, and the ticket was closed. I'm not sure which driver version I was running back then, however.
I do find that if you do not early-load the nvidia modules, the error goes away. I'm seeing the error line get spammed hundreds of times a second, so it's really not practical for me to have them loaded. Unfortunately I require a few Electron-based apps for work, and run a Chromium-based browser as well.
Given the timespans, I'm not holding out a whole lot of hope.
Hi, an update on this: I forgot to regenerate the initramfs (which is done with mkinitcpio on Arch Linux). After doing so the problem goes away, so it really seems to have something to do with early loading.
I think it's an early-loading issue, and I found a fix: I removed kms from the mkinitcpio HOOKS
and added the nvidia and i915 modules to MODULES,
then re-ran mkinitcpio -P, rebooted, and made sure DRM is enabled in GRUB. After that everything worked.
In my /etc/mkinitcpio.conf:
MODULES=(i915 nvidia nvidia_modeset nvidia_uvm nvidia_drm)
HOOKS=(base udev autodetect microcode modconf block keyboard keymap consolefont plymouth filesystems fsck)
then ran sudo mkinitcpio -P
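Before regenerating the initramfs, it may be worth confirming that the MODULES line really contains everything you meant to add. The following is only a sketch: missing_modules is a hypothetical helper name (it is not part of mkinitcpio), and it assumes that MODULES=(...) is written on a single line, as in the config above.

```shell
# Print each expected module that is NOT listed in MODULES=(...) of a
# mkinitcpio-style config file. Hypothetical helper, not part of mkinitcpio.
# Usage: missing_modules <config-file> <module>...
missing_modules() {
    conf=$1; shift
    # Take the last uncommented single-line MODULES=(...) entry, strip the
    # wrapper, and squeeze whitespace so names are separated by one space.
    have=$(sed -n 's/^[[:space:]]*MODULES=(\(.*\)).*$/\1/p' "$conf" \
        | tail -n 1 | tr -s '[:space:]' ' ')
    for want in "$@"; do
        case " $have " in
            *" $want "*) ;;                    # present: print nothing
            *) printf '%s\n' "$want" ;;        # absent: report it
        esac
    done
}
```

For example, missing_modules /etc/mkinitcpio.conf i915 nvidia nvidia_modeset nvidia_uvm nvidia_drm should print nothing if all five are listed. It only inspects the config file; it does not tell you whether the initramfs has actually been rebuilt.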
These are the packages I have installed on my CachyOS system:
❯ paru -Qs nvidia
local/egl-wayland 2:1.1.13-3
EGLStream-based Wayland external platform
local/lib32-libvdpau 1.5-2
Nvidia VDPAU library
local/lib32-nvidia-utils 555.52.04-1
NVIDIA drivers utilities (32-bit)
local/lib32-opencl-nvidia 555.52.04-1
OpenCL implemention for NVIDIA (32-bit)
local/libva-nvidia-driver 0.0.12-1.1
VA-API implementation that uses NVDEC as a backend
local/libvdpau 1.5-2.1
Nvidia VDPAU library
local/libxnvctrl 555.42.02-2
NVIDIA NV-CONTROL X extension
local/linux-cachyos-nvidia 6.9.6-2
nvidia module of 555.52.04 driver for the linux-cachyos kernel
local/nvidia-prime 1.0-4
NVIDIA Prime Render Offload configuration and utilities
local/nvidia-settings 555.42.02-2
Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 555.52.04-3
NVIDIA drivers utilities
local/opencl-nvidia 555.52.04-3
OpenCL implemention for NVIDIA
I hope this helps.
Adding amdgpu (amdgpu is for the AMD iGPU and i915 is for Intel, for those who don't know) just makes the iGPU do the rendering rather than the NVIDIA GPU or the CPU (software rendering), so it doesn't really fix the problem.
Before adding amdgpu to MODULES, Brave wouldn't load 3D websites (e.g. bruno-simon.com), but Firefox would (using the iGPU). Adding amdgpu enabled WebGL with the iGPU as the hardware accelerator, so now Brave can run 3D websites, but it won't run on the NVIDIA GPU.
By the way, I was using the NVIDIA 535 drivers. I'm on 535 because the newer drivers cause a crash while upgrading packages on Arch Linux.
At first the browser would take a long time to launch. Now it launches quickly but doesn't use the NVIDIA GPU. I tested that using nvtop, which shows both GPUs: NVIDIA GeForce RTX 3050 Laptop GPU and AMD Radeon Graphics.
prime-run used to work previously (I don't know when).
Hello! I have exactly the same problems with driver 560.35.03. It looks like the problem is in the kernel configuration.
For example, a stock image works great: cfg_default.txt (270.1 KB)
It also works well on the new kernel version: cfg_custom_desktop.txt (253.6 KB)
But there is a problem with the server implementation: cfg_custom_server_100hz.txt (240.2 KB)
On hybrid systems (nvidia+intel), setting the BIOS parameters such as Aperture Size and DVMT Pre-allocated can help.
I am getting the error kernel: [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Cannot create sg_table for NvKmsKapiMemory 0x000000006527f86e
on an AMD laptop with switchable graphics when running gamescope which causes it to crash. This is on driver version 560.35.03 and it happens with both the proprietary and open source kernel modules.
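To put a number on how often this message is logged, a small filter over the kernel log can count occurrences per second. This is a sketch under assumptions: sg_errors_per_second is a made-up name, and it relies on the default dmesg timestamp prefix ("[ seconds.micros]"). Lines without that prefix, such as journalctl output, are skipped.

```shell
# Count "Cannot create sg_table" errors per whole second of kernel uptime.
# Reads a kernel log on stdin; prints "<second> <count>" sorted by second.
sg_errors_per_second() {
    awk '/Cannot create sg_table for NvKmsKapiMemory/ {
        # Default dmesg lines start with "[ 6767.216373]"; grab the part
        # between "[" and "." and convert it to an integer second.
        if (match($0, /\[ *[0-9]+\./)) {
            sec = substr($0, RSTART + 1, RLENGTH - 2) + 0
            count[sec]++
        }
    }
    END { for (s in count) print s, count[s] }' | sort -n
}
```

Typical use would be sudo dmesg | sg_errors_per_second. A steady stream of large counts suggests the message is being emitted continuously and not only at application start.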
I encountered this issue after upgrading to KDE Neon 24.04.1 and switching to Wayland. Google Chrome wouldn't display any windows with --ozone-platform=wayland, and the kernel log would be filled with messages like [ 6767.216373] [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Cannot create sg_table for NvKmsKapiMemory 0x00000000aa338886. I fixed this by adding i915 to /etc/initramfs-tools/modules, running update-initramfs -c -k all, and rebooting. Thanks to everyone who discovered this workaround!
I'm having the same issue with apps that try to use the NVIDIA card: all Electron-based ones, plus some others like Darktable or Wine.
Operating System: EndeavourOS
KDE Plasma Version: 6.2.0
Kernel Version: 6.11.3-zen1-1-zen (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800H with Radeon Graphics
Graphics Processor: AMD Radeon Graphics
Manufacturer: ASUSTeK COMPUTER INC.
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
Errors in dmesg are [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Cannot create sg_table for NvKmsKapiMemory
I'm using NVIDIA drivers 560.35.03-6.
Since I'm using EndeavourOS with systemd and dracut, I had to create the file /etc/dracut.conf.d/myflags.conf, put in:
force_drivers+=" amdgpu nvidia nvidia_modeset nvidia_uvm nvidia_drm "
and then run sudo reinstall-kernels.
Now Darktable finds the NVIDIA card, Upscayl runs OK, and Electron apps start without delay. Thanks for the tip, guys; I was getting quite annoyed by that bug.
EDIT: Looks like I spoke too soon. I still have a crash in dmesg:
[ 1023.198191] NVRM: GPU at PCI:0000:01:00: GPU-7312c96f-4f21-eb4c-15a0-dab57f44a76d
[ 1023.198198] NVRM: Xid (PCI:0000:01:00): 119, pid=76164, name=Typora, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205b 0x4).
[ 1023.198205] NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x000000002080205b 0x0000000000000004.
[ 1023.198210] NVRM: GPU0 RPC history (CPU -> GSP):
[ 1023.198212] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
[ 1023.198215] NVRM: 0 76 GSP_RM_CONTROL 0x000000002080205b 0x0000000000000004 0x000624459ebf6219 0x0000000000000000 y
[ 1023.198222] NVRM: -1 47 UNLOADING_GUEST_DRIVE 0x0000000000000000 0x0000000000000000 0x000624459cb59a97 0x000624459cb8b269 202706us
[ 1023.198229] NVRM: -2 10 FREE 0x00000000c1e00055 0x0000000000000000 0x000624459cb597eb 0x000624459cb59a3f 596us
[ 1023.198234] NVRM: -3 10 FREE 0x000000000000000a 0x0000000000000000 0x000624459cb5948e 0x000624459cb597e8 858us
[ 1023.198239] NVRM: -4 10 FREE 0x000000000000000b 0x0000000000000000 0x000624459cb59128 0x000624459cb5928a 354us
[ 1023.198244] NVRM: -5 10 FREE 0x0000000000000006 0x0000000000000000 0x000624459cb58cc1 0x000624459cb59115 1108us
[ 1023.198248] NVRM: -6 10 FREE 0x0000000000000002 0x0000000000000000 0x000624459cb57f41 0x000624459cb58be2 3233us
[ 1023.198253] NVRM: -7 10 FREE 0x0000000000000005 0x0000000000000000 0x000624459cb57801 0x000624459cb57f35 1844us
[ 1023.198257] NVRM: GPU0 RPC event history (CPU <- GSP):
[ 1023.198260] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
[ 1023.198263] NVRM: 0 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000624459cb65537 0x000624459cb65539 2us
[ 1023.198268] NVRM: -1 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x000624459cb5f562 0x000624459cb5f566 4us
[ 1023.198274] NVRM: -2 4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x000624459cb5f3a0 0x000624459cb5f3a1 1us
[ 1023.198279] NVRM: -3 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000624459b6f9728 0x000624459b6f9728
[ 1023.198283] NVRM: -4 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000624459b6f95a7 0x000624459b6f95a9 2us
[ 1023.198288] NVRM: -5 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000624459b6f7cb6 0x000624459b6f7cba 4us
[ 1023.198293] NVRM: -6 4098 GSP_RUN_CPU_SEQUENCER 0x000000000000061c 0x0000000000003fe2 0x000624459b6eadf5 0x000624459b6ec004 4623us
[ 1023.198298] NVRM: -7 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x00062445886ae1ef 0x00062445886ae1f0 1us
[ 1023.198304] CPU: 1 UID: 1000 PID: 76164 Comm: Typora Tainted: P OE 6.11.3-zen1-1-zen #1 1400000003000000474e5500d4154c511b9cdca1
[ 1023.198312] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 1023.198314] Hardware name: ASUSTeK COMPUTER INC. ProArt StudioBook H5600QM_H5600QM/H5600QM, BIOS H5600QM.321 05/10/2023
[ 1023.198316] Call Trace:
[ 1023.198320] <TASK>
[ 1023.198323] dump_stack_lvl+0x5d/0x80
[ 1023.198334] _nv012948rm+0x4ee/0x590 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.198970] _nv012865rm+0x77/0x330 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.199440] _nv048628rm+0x49f/0x7f0 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.199885] _nv051992rm+0xa4/0x150 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.200446] _nv047909rm+0x1a1/0x1b0 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.200880] _nv049933rm+0x3ff/0x500 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.201312] _nv014741rm+0x42e/0x690 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.201649] _nv048046rm+0x29/0x30 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.201978] ? _nv049936rm+0x60/0x60 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.202312] _nv000762rm+0x58/0x70 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.202677] _nv000761rm+0x21b/0x220 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.203024] _nv000713rm+0x1a3/0x300 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.203384] rm_transition_dynamic_power+0xd7/0x13f [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.203732] nv_pmops_runtime_resume+0xb9/0xf0 [nvidia 1400000003000000474e550096469435de778d30]
[ 1023.204010] ? __pfx_pci_pm_runtime_resume+0x10/0x10
[ 1023.204014] __rpm_callback+0x44/0x170
[ 1023.204019] ? __pfx_pci_pm_runtime_resume+0x10/0x10
[ 1023.204023] rpm_resume+0x5bb/0x850
[ 1023.204029] pm_runtime_barrier+0x86/0x90
[ 1023.204033] pci_config_pm_runtime_get+0x3a/0x60
[ 1023.204038] pci_read_config+0x99/0x2f0
[ 1023.204045] kernfs_fop_read_iter+0xab/0x1b0
[ 1023.204051] vfs_read+0x347/0x470
[ 1023.204058] __x64_sys_pread64+0x98/0xd0
[ 1023.204063] do_syscall_64+0x82/0x190
[ 1023.204069] ? srso_alias_return_thunk+0x5/0xfbef5
[ 1023.204073] ? syscall_exit_to_user_mode+0x10/0x1e0
[ 1023.204077] ? srso_alias_return_thunk+0x5/0xfbef5
[ 1023.204080] ? do_syscall_64+0x8e/0x190
[ 1023.204083] ? srso_alias_return_thunk+0x5/0xfbef5
[ 1023.204087] ? do_syscall_64+0x8e/0x190
[ 1023.204090] ? srso_alias_return_thunk+0x5/0xfbef5
[ 1023.204093] ? do_syscall_64+0x8e/0x190
[ 1023.204097] ? exc_page_fault+0x81/0x190
[ 1023.204101] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1023.204105] RIP: 0033:0x70a99ec051f7
[ 1023.204126] Code: 00 00 00 0f 05 f7 d8 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 75 0e 10 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 55 48 89 e5 48 83 ec 20 48 89 55 e8 48
[ 1023.204129] RSP: 002b:00007ffd23d5b9c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000011
[ 1023.204134] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 000070a99ec051f7
[ 1023.204136] RDX: 0000000000000001 RSI: 00007ffd23d5ba07 RDI: 000000000000000f
[ 1023.204138] RBP: 00007ffd23d5b9f0 R08: 0000000000000073 R09: 0000000000000000
[ 1023.204141] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000001
[ 1023.204143] R13: 0000313c00080fc0 R14: 00007ffd23d5ba07 R15: 0000313c00020000
[ 1023.204150] </TASK>
[ 1029.205013] NVRM: Xid (PCI:0000:01:00): 119, pid=76164, name=Typora, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a81 0x4).
[ 1035.205851] NVRM: Xid (PCI:0000:01:00): 119, pid=72494, name=kworker/2:0, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20802092 0x4).
[ 1041.206690] NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30). The GPU likely needs to be reset.
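When a log like the one above runs long, it can help to boil it down to one line per Xid event. The following is a rough sketch that assumes the line layout seen in this log ("NVRM: Xid (PCI:...): code, pid=..., name=...,"). xid_summary is a hypothetical helper name, not an NVIDIA tool.

```shell
# Reduce a kernel log on stdin to "<xid> <pid> <process-name>" per Xid event.
# Lines that do not match the NVRM Xid layout are dropped.
xid_summary() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\), pid=\([0-9][0-9]*\), name=\([^,]*\),.*/\1 \2 \3/p'
}
```

Running sudo dmesg | xid_summary on the log above would list the Typora and kworker/2:0 entries, which makes it easier to see that more than one process is hitting the same timeout.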
Hi All,
So far, it does not look like the hang happens in our drivers. There is a Chromium bug filed for the same issue, which can be tracked further as below. Chromium
Thank you for trying to fix this, but for me it's not related only to Electron applications. And the weird part is that it works for a short while after a restart and then starts having problems. For example, when I ran darktable-cltest 5 seconds after booting into KDE, I got
0.0220 [opencl_init] opencl disabled via darktable preferences
0.0221 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
0.8577 [opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'NVIDIA GeForce RTX 3060 Laptop GPU'
CONF KEY: cldevice_v5_nvidiacudanvidiageforcertx3060laptopgpu
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudanvidiageforcertx3060laptopgpu
DRIVER VERSION: 560.35.03
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 5834 MB
MAX MEM ALLOC: 1459 MB
MAX IMAGE SIZE: 32768 x 32768
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
USE HEADROOM: 400Mb
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/elman/.cache/darktable/cached_v3_kernels_for_NVIDIACUDANVIDIAGeForceRTX3060LaptopGPU_5603503
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
KERNEL LOADING TIME: 0.0753 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 'NVIDIA CUDA NVIDIA GeForce RTX 3060 Laptop GPU'
1.0607 [opencl_init] FINALLY: opencl PREFERENCE=OFF is AVAILABLE and NOT ENABLED.
But when I tried again 20 seconds later, I got an error after 108 seconds:
0.0202 [opencl_init] opencl disabled via darktable preferences
0.0203 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
108.2123 [opencl_init] 0 platforms detected, error: Unknown OpenCL error
108.2123 [opencl_init] FINALLY: opencl PREFERENCE=OFF is NOT AVAILABLE and NOT ENABLED.
Another issue I found is with System Monitor, where I get the error "This page is missing some sensors and will not display correctly." when trying to view NVIDIA memory usage, GPU usage and GPU frequency.
Unfortunately, at this point I have so many issues that I was forced to switch from hybrid mode to integrated so that I can at least work.
Hi @amrits, sorry, I didn't notice your question. I know this might be late, but the last passing driver was 535.171.04. With anything beyond that, Chrome can't be used under Wayland and it hits the DRM error in the journalctl logs.
Hi. I just found out that if I start my laptop with an external screen connected via HDMI, I don't have this issue. Everything works as expected: CL is detected in Darktable and I have no delay when starting Electron apps. After boot I can disconnect my display and things keep working. Curious…
I was experiencing this issue on Ubuntu 22.04 with KDE/Wayland but thanks to the comments here I was able to work out something that fixed it for me:
Edit the file /etc/initramfs-tools/modules and add the following lines at the end:
i915
nvidia
nvidia_modeset
nvidia_uvm
nvidia_drm
Now run sudo update-initramfs -c -k all, and then reboot.
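If you end up repeating this edit (for example after a reinstall), appending the names by hand risks duplicate entries. The following is a minimal sketch of an idempotent append. add_modules is a hypothetical helper name, and it assumes the file is a plain list with one module per line that ends with a newline.

```shell
# Append each module name to a modules file unless that exact line is
# already present. Usage: add_modules <file> <module>...
add_modules() {
    file=$1; shift
    for mod in "$@"; do
        # -q quiet, -x whole-line match, -F fixed string (no regex)
        grep -qxF "$mod" "$file" 2>/dev/null || printf '%s\n' "$mod" >> "$file"
    done
}
```

On a real system this would need root, e.g. run from a root shell as add_modules /etc/initramfs-tools/modules i915 nvidia nvidia_modeset nvidia_uvm nvidia_drm, followed by the update-initramfs command mentioned above.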
Note, one thing I noticed: a bit later I updated my Linux kernel command line with update-grub, which broke it for me again (even when I changed the command line back to what it was before). But then, when I ran update-initramfs again, it was fixed again. I'm not 100% sure why, but I guess update-grub overwrites something written by update-initramfs. In any case, make sure update-initramfs is the last command you run that updates anything related to booting.
Edit: I should also mention that I have nvidia_drm.modeset=1 on the kernel command line (but I'm not sure if that has any effect on this bug).
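For anyone comparing setups, here is a quick way to check whether that option is present on a given kernel command line. This is a sketch only: has_modeset is a made-up function name. It accepts both nvidia_drm and nvidia-drm spellings, because the kernel treats dashes and underscores in parameter names as equivalent.

```shell
# Succeed (exit 0) if the given kernel command line enables NVIDIA DRM
# modesetting. Usage: has_modeset "<command line string>"
has_modeset() {
    # Pad with spaces so only whole, space-delimited options match.
    case " $1 " in
        *" nvidia_drm.modeset=1 "*|*" nvidia-drm.modeset=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For the running system: has_modeset "$(cat /proc/cmdline)" && echo "modeset enabled on cmdline". Note that this only inspects the command line. The option can also be set through a modprobe configuration file, which this check would not see.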