I am writing to report a critical issue that I have been encountering with my NVIDIA RTX 2000 Ada Generation Laptop GPU, installed in a Lenovo ThinkPad P1 Gen 6. The problem produces a “scheduling while atomic” kernel BUG, followed by a system freeze, on my Linux system running kernel versions 6.8.1, 6.7.x, 6.6.x, and 6.1.x.
Description of the Problem
At seemingly random intervals and under varying system loads, the kernel logs the error message “scheduling while atomic”. Some time later, the system becomes unresponsive and requires a hard reboot to regain functionality.
Steps to Reproduce
Operate the system under normal conditions.
Observe a kernel BUG reporting “scheduling while atomic” at a random time.
The system later freezes and becomes unresponsive, again at an unpredictable time.
System Information
GPU: NVIDIA Corporation AD107GLM [RTX 2000 Ada Generation Laptop GPU] (rev a1) on Lenovo ThinkPad P1 Gen 6
Linux Distribution: Gentoo Linux
Kernel Version: 6.8.1 (also affects kernel versions 6.7.x, 6.6.x, and 6.1.x)
NVIDIA driver: 535.161.07 (the issue occurs with every driver version I have tried)
Additional Information
The issue persists across multiple kernel versions, indicating it is not specific to a particular kernel release.
I have examined the system logs and identified the “scheduling while atomic” error as the primary issue leading to the kernel panic and subsequent system freeze.
No specific system activity or workload triggers the error; it happens seemingly at random.
I have kept the GPU drivers up to date and have tried reinstalling them, but the bug persists no matter which NVIDIA driver version I install.
This issue severely affects the usability and stability of my system, and I kindly request your assistance in resolving it promptly. If there are any additional diagnostic steps or information required from my end, please let me know, and I will gladly provide it.
@generix Thanks for the suggestion. I’ve upgraded to the latest 550 driver and set the kernel parameter/module option nvidia-drm.modeset=1 as you recommended. So far, the “scheduling while atomic” error has not occurred. However, I got the following error in the meantime:
[Fri Mar 22 00:05:57 2024] [drm:nv_drm_semsurf_fence_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to create sync file from fence on ctx 0x00000001
I’ll let you know once the “scheduling while atomic” bug occurs again, as it happens at random intervals.
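Since the bug shows up at random intervals, filtering the kernel log can make it easier to spot. The following is only a minimal sketch: the function names `find_kernel_errors` and `read_kernel_log` are made up for illustration, and the two patterns are simply taken from the messages quoted in this thread. It assumes `dmesg -T` is available and readable by the current user (on some systems this needs root).

```python
import re
import subprocess

# Patterns taken from the messages quoted in this thread (assumed to be stable strings).
PATTERNS = (
    re.compile(r"scheduling while atomic", re.IGNORECASE),
    re.compile(r"\[nvidia-drm\].*Failed to create sync file"),
)


def find_kernel_errors(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in PATTERNS)
    ]


def read_kernel_log():
    """Return the output of `dmesg -T`, or an empty string if it cannot be run."""
    try:
        result = subprocess.run(
            ["dmesg", "-T"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    return result.stdout
```

Calling `find_kernel_errors(read_kernel_log())` returns the matching lines, which can then be printed or attached to a report alongside nvidia-bug-report.log.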
Please create a new nvidia-bug-report.log in the current state. The 535 driver had some broken thermal settings on your GPU, and I want to check whether at least that has changed.
@generix Thank you for your reply. I see. I don’t use hybrid graphics; I run on the NVIDIA GPU directly and have disabled the Intel card. Yes, I am using a non-compositing window manager (dwm). Also, /sys/module/nvidia_drm/parameters/modeset is set to ‘Y’. Here is my Xorg configuration:
Hello
I have exactly the same laptop, running Kubuntu 23.10 (with KDE Plasma).
I had no problems running the NVIDIA drivers up until 550.x, where I experienced a system freeze on startup after installing the driver. This was fixed by switching to hybrid mode in the BIOS and to PRIME mode using the nvidia-settings app.
Since then I have had no problems.
Also, the compositor in KDE is switched off.
As @generix suggested, I have modeset set to 1:
options nvidia-drm modeset=1
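To confirm that the option actually took effect after a reboot, the sysfs file mentioned earlier in the thread can be read back. This is a small sketch under the assumption that the file contains ‘Y’ or ‘N’, as reported above; the helper name `modeset_enabled` is hypothetical.

```python
from pathlib import Path

# sysfs path quoted earlier in this thread
MODESET_PARAM = Path("/sys/module/nvidia_drm/parameters/modeset")


def modeset_enabled(param_path=MODESET_PARAM):
    """Return True for 'Y', False for any other value, None if the file cannot be read."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        # Missing file usually means the nvidia_drm module is not loaded.
        return None
    return value == "Y"
```

A result of `None` suggests the nvidia_drm module is not loaded at all, which is a different situation from modeset being disabled.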