575 release feedback & discussion

Debian12, package

 nvidia-alternative: allows the selection of NVIDIA as GLX provider

the nvidia-alternative package has not been updated in a while;
attempting to install the latest 560.35 version of nvidia-alternative triggers almost complete removal of nvidia-driver/nvidia-open-575.57.08 via the required install of the nvidia-installer-cleanup dependency.

consequently, the following directories are missing required files, symlinks and/or sub-directories:

/etc/alternatives/
/etc/alternatives/glx
/etc/nvidia/
/usr/lib/nvidia/
/usr/lib/nvidia/current

as a result, a number of applications (games only) that depend on Mesa/OpenGL can’t fully utilize the gpus via the nvidia driver, and all of them fall back onto the llvmpipe driver.
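
A minimal sketch for confirming the llvmpipe fallback, assuming the text output of `glxinfo -B` (from mesa-utils) has been captured into a string. The function names are illustrative and not part of any NVIDIA or Mesa tool; the only thing relied on is the `OpenGL renderer string:` line that glxinfo prints.

```python
import re

# glxinfo prints one line of the form "OpenGL renderer string: <name>".
RENDERER = re.compile(r"^OpenGL renderer string:\s*(.+)$", re.MULTILINE)

# Mesa's software rasterizers; llvmpipe is the one reported in this post.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe")


def opengl_renderer(glxinfo_output):
    """Return the renderer name from glxinfo output, or None if absent."""
    match = RENDERER.search(glxinfo_output)
    return match.group(1).strip() if match else None


def uses_software_rendering(glxinfo_output):
    """True if the reported renderer is a known Mesa software rasterizer."""
    renderer = opengl_renderer(glxinfo_output)
    if renderer is None:
        return False
    return any(name in renderer.lower() for name in SOFTWARE_RENDERERS)
```

Feeding it the output of `glxinfo -B` from inside an affected session should show whether that session ended up on llvmpipe rather than the NVIDIA GLX provider.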

system in current state:

update-glx --config nvidia
update-alternatives: error: no alternatives for nvidia

more info about related installed packages:

update-glx (installed)
glx-diversions (installed)
glx-alternative-nvidia (installed)
libglx-nvidia0 (installed)
libnvidia-glcore (installed)
libnvidia-cfg1 (installed)
libnvidia-ngx1 (installed)
libnvidia-allocator1 (installed)

from what I can see in /etc/alternatives, the following symlinks exist:

glx -> /usr/lib/nvidia  (directory)
glx--libEGL.so.1-i386-linux-gnu -> /usr/lib/mesa-diverted/i386-linux-gnu/libEGL.so.1
glx--libEGL.so.1-x86_64-linux-gnu -> /usr/lib/mesa-diverted/x86_64-linux-gnu/libEGL.so.1

and so on.

what is missing in /etc/alternatives are the following (presumably created via nvidia-alternative package?):

nvidia -> /usr/lib/nvidia/current  (directory)
nvidia--libcudadebugger.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libcudadebugger.so.1
nvidia--libcuda.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1
nvidia--libcuda.so-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so
nvidia--libEGL_nvidia.so.0-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.0
nvidia--libGLX_nvidia.so.0-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0

and so on.
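
A rough sketch for checking which of these links are present, assuming Python 3 is available on the affected machine. `check_links` and `EXPECTED_LINKS` are hypothetical names; the link names are copied from the list above and the list is not complete.

```python
import os

# Link names copied from the list above; extend as needed.
EXPECTED_LINKS = [
    "nvidia",
    "nvidia--libcudadebugger.so.1-x86_64-linux-gnu",
    "nvidia--libcuda.so.1-x86_64-linux-gnu",
    "nvidia--libcuda.so-x86_64-linux-gnu",
    "nvidia--libEGL_nvidia.so.0-x86_64-linux-gnu",
    "nvidia--libGLX_nvidia.so.0-x86_64-linux-gnu",
]


def check_links(names, base="/etc/alternatives"):
    """Return {name: status}; status is 'ok', 'dangling' or 'missing'."""
    result = {}
    for name in names:
        path = os.path.join(base, name)
        if not os.path.lexists(path):
            # Nothing with that name exists in the directory.
            result[name] = "missing"
        elif not os.path.exists(path):
            # The symlink itself exists, but its target does not.
            result[name] = "dangling"
        else:
            result[name] = "ok"
    return result


if __name__ == "__main__":
    for name, status in check_links(EXPECTED_LINKS).items():
        print(f"{status:9} {name}")
```

Distinguishing "missing" from "dangling" matters here, since a removed package can leave either no link at all or a link pointing into an emptied /usr/lib/nvidia tree.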

Thanks for the update. Please let me know if this issue reoccurs.

This bug started happening to me much more frequently on 575 than on 570. KDE folks seem to have confirmed that this bug is not on the end of Plasma but on the driver side of things.

I have to resort to shortcuts for manually restarting plasmashell now, happens just too often. This is ridiculous.

Hey! I’m using nvidia 575.57.08(-3 on arch) and 6.15.2 (vanilla) kernel.

I’ve been getting these

INFO: task nv_queue:678 blocked for more than 491 seconds.
       Tainted: P           OE       6.15.2-arch1-1 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

errors in journalctl, and when that happens, running nvidia-smi just hangs (and can’t be interrupted with Ctrl-C), and you can’t open apps that rely on the nvidia gpu, even vkcube, btop, etc.
And AFAIK there’s no way to make the gpu work again other than restarting. Even when restarting, nvidia-powerd takes a long time to get killed.
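
For anyone trying to see how often this happens, here is a small sketch that pulls hung-task reports out of kernel log lines. It assumes the message format quoted above and that the lines come from something like `journalctl -k`; the function name `find_hung_tasks` is made up for illustration.

```python
import re

# Matches the kernel's hung-task report, e.g. the nv_queue line quoted above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a list of (task name, pid, seconds blocked) tuples."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match["comm"], int(match["pid"]), int(match["secs"]))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "INFO: task nv_queue:678 blocked for more than 491 seconds.",
    ]
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"{comm} (pid {pid}) blocked for over {secs}s")
```

The pattern uses `search`, so it should also match lines that carry a journal timestamp and host prefix in front of the message.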

Note: I’ve tried downgrading and it seems that version 550x is fine.

Looks like this issue from 570 causing black screen after powering off and on the display/monitor still persists.

I’m on EndeavourOS, Linux 6.15.2-arch1-1, driver 575.57.08, and I experienced the exact same issue. I can’t downgrade my driver too far since older versions are not compatible with this kernel. A simple workaround, as the post mentioned, is switching between TTYs. I have noticed this for a while too, roughly since 570 was rolled out.

Correct, it has an issue, so you will be greeted with a black screen

There is no nvidia-smi package for 575 in the Nvidia repo. The latest one is nvidia-smi_560.35.05-1_amd64.deb. What’s up with that?

@DanielCeregatti, nvidia-smi program has been moved to another package:

I’m using the same repo. I have 2 machines, one using the debs from that repo and another using the .run. I have nvidia-smi in the .run install, but not on the one using the packages. If nvidia-smi has moved to another package, it’d sure be nice to know which. apt-file search nvidia-smi shows no package other than the nvidia-smi package as containing it. At a loss.

# apt-file search nvidia-smi
...
nvidia-smi: /usr/lib/nvidia/current/nvidia-smi

# apt-cache policy nvidia-smi (showing only the latest)
560.35.05-1

# dpkg -l nvidia-driver
ii nvidia-driver 575.57.08-1 amd64 NVIDIA metapackage

Obviously I can’t install the nvidia-smi from 560 while having 575.

@DanielCeregatti, have you actually looked into the subject I linked? Truly all your questions are addressed there: you are just 1 click away from all the answers you desire ;-)

Just finding it odd that apt-file search nvidia-smi doesn’t list the nvidia-driver-cuda package as one that contains a file named nvidia-smi, yet, it surely does. Guessing the repo lacks the metadata used by apt-file.

Thanks

I have stuttering that happens on my RTX3070 mobile, that only occurs when running on the internal laptop display (165 Hz), while two external displays (one connected via DP, the other HDMI) are disabled (but, powered-on/standby).

About every 10-15 seconds or so, there is a log entry about the monitor being connected over and over and over…and at this instant I get a noticeable lag/FPS drop in anything I’m doing.

If I turn the power off on the external (disabled) monitors, the problem stops. If I unplug the DP/HDMI cables, the problem stops.

This has been going on for at least the last 3 major releases, and it’s very very very annoying.

I am using the open firmware, and X11/XFCE, on Arch Linux x86_64.

Here’s a snippet from the Xorg log:

[ 1349.066] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): connected
[ 1349.066] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): Internal TMDS
[ 1349.066] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): 600.0 MHz maximum pixel clock
[ 1349.066] (--) NVIDIA(GPU-0):
[ 1362.946] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): connected
[ 1362.946] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): Internal TMDS
[ 1362.946] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): 600.0 MHz maximum pixel clock
[ 1362.947] (--) NVIDIA(GPU-0):
[ 1362.994] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): connected
[ 1362.994] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): Internal TMDS
[ 1362.994] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): 600.0 MHz maximum pixel clock
[ 1362.994] (--) NVIDIA(GPU-0):
[ 1376.883] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): connected
[ 1376.883] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): Internal TMDS
[ 1376.883] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): 600.0 MHz maximum pixel clock
[ 1376.883] (--) NVIDIA(GPU-0):
[ 1376.929] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): connected
[ 1376.929] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): Internal TMDS
[ 1376.929] (--) NVIDIA(GPU-0): HP Inc. Pavilion 27q (DFP-4): 600.0 MHz maximum pixel clock

(ad nauseam)
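
To put numbers on the repetition, here is a sketch that extracts the "connected" events per connector from Xorg log lines and computes the gaps between them. It assumes the line layout shown in the snippet above; the function name `connect_gaps` is invented for this example, and the marker in parentheses is matched loosely so that copy-pasted logs still parse.

```python
import re
from collections import defaultdict

# "[ <seconds>] (<marker>) NVIDIA(GPU-n): <monitor name> (DFP-n): connected"
CONNECTED = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\(\S+\)\s+NVIDIA\(GPU-\d+\):\s+"
    r".*\((?P<conn>DFP-\d+)\): connected\s*$"
)


def connect_gaps(lines):
    """Return {connector: [seconds between successive 'connected' events]}."""
    stamps = defaultdict(list)
    for line in lines:
        match = CONNECTED.match(line.strip())
        if match:
            stamps[match["conn"]].append(float(match["ts"]))
    return {
        conn: [round(b - a, 3) for a, b in zip(ts, ts[1:])]
        for conn, ts in stamps.items()
    }
```

Run over the snippet above, this should show the events arriving in pairs a few tens of milliseconds apart, with roughly 14 seconds between pairs, which lines up with the stutter interval described.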

This is with the laptop (Lenovo Legion 5 Pro w/5800h and RTX3080 Max Q) powered up in dGPU (only) mode (iGPU completely disabled before boot).

This usually means some client program is polling the display outputs over and over. Please try disabling any performance monitor or overlay-type apps that might be doing repeated queries.

Don’t have any of that.

I have posted several times regarding stuttering and what I’ve found that triggers it. The stuttering is always accompanied by those log entries happening over and over:

Yes, that’s 3 posts in a row to the same topic.

Perhaps I shouldn’t have selected Rick Astley’s video for the example in the last post, but it’s a legit demonstration of the issue.

@aplattner I also have no performance or overlay apps running. The one constant has been JetBrains Toolbox and the IntelliJ IDE (the former is a launcher for the latter) triggering the issue. I’ve had to stop running them to avoid this issue.

Yes, I saw your other posts and that clued me in as to the behavior, as by themselves the log entries appear pretty benign (until you notice there are lots of them).

I agree, it started after 565. Also, I always run with my compositor disabled. I haven’t tried with it enabled.

I do have some JetBrains software as well (not the suite you are talking about, just the CLion IDE). However, the problem happens whether or not it is running.

I have noticed if I scroll through the editor tabs of the CLion tool, open Vulkan applications momentarily stall/nosedive in FPS as well (even the simplest of apps that only clear the surface and wait for the next vsync to present)…perhaps the IDE is using some sort of prioritized rendering queue, IDK. I don’t know if that’s a related issue or not.

$20 for whoever can find where the missing VRAM is going.

Hi all,

I’m trying to get my new 5060Ti GPU running. As a basic setup, I’m installing a fresh Linux Mint 22.1 system and using Kernel 6.11 with the onboard Intel GPU enabled.

First, I tested the nvidia-driver-570-open driver from the distribution, which is intended for limited use with a 5060Ti. The driver installed successfully, but after rebooting, I switched the BIOS setting to prioritize the 5060Ti GPU via PCI Express. The NV GPU showed POST initialization without errors, but during the boot process, I only saw a black screen while the GPU fan ran at full speed.

I then removed and purged the distribution driver and tried the NVIDIA-Linux-x86_64-575.57.08.run driver. However, the system froze during the final stages of compilation and installation. After a hard reset, I re-enabled the NV GPU in the BIOS, but again, the boot process resulted in a black screen with the GPU fan running at full speed.

What could be causing this issue?
nvidia-installer-575.57.08.log (64.2 KB)

I’m on Ubuntu 24.04. The included drivers also failed for me with a similar black screen during boot. I rebooted in failsafe mode and installed the driver manually with this command:

sh NVIDIA-Linux-x86_64-575.57.08.run --glvnd-egl-config-path=/etc/glvnd/egl_vendor.d --kernel-module-type=open --compat32-libdir=/usr/lib32/

I clicked yes on all prompts and rebooted.

Fedora 42, laptop not recovering from suspend:

Jun 14 12:44:10 fedora systemd[1]: Reached target sleep.target - Sleep.
Jun 14 12:44:10 fedora systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
Jun 14 12:44:10 fedora suspend[22439]: nvidia-suspend.service
Jun 14 12:44:10 fedora logger[22439]: <13>Jun 14 12:44:10 suspend: nvidia-suspend.service
Jun 14 12:44:10 fedora kernel: rfkill: input handler enabled
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x00000011
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:2:0:0x00000011
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:4:0:0x00000011
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:6:0:0x00000011
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67d:0:0:0x00000011
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67d:0:0:0x00000011
Jun 14 12:44:10 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67d:0:0:0x00000011
Jun 14 12:44:14 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to tear down display engine channel
Jun 14 12:44:14 fedora kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 10!
Jun 14 12:44:14 fedora kernel: NVRM: rpcRmApiFree_GSP: GspRmFree failed: hClient=0xc1d00002; hObject=0x00010011; paramsStatus=0x00000000; status=0x00000011
Jun 14 12:44:14 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_client.c:844
Jun 14 12:44:14 fedora kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:259
Jun 14 12:44:14 fedora kernel: nvidia-modeset: ERROR: GPU:0: Failed to tear down Disp
Jun 14 12:44:14 fedora kernel: ------------[ cut here ]------------
Jun 14 12:44:14 fedora kernel: WARNING: CPU: 6 PID: 22440 at nvidia/nv.c:4430 nv_suspend_devices+0x19c/0x300 [nvidia]
Jun 14 12:44:14 fedora kernel: Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables ip_set qrtr uhid bnep sunrpc binfmt_misc vfat fat snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_sof_board_helpers snd_sof_probes snd_soc_intel_hda_dsp_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_hda_codec_hdmi snd_compress ac97_bus snd_pcm_dmaengine
Jun 14 12:44:14 fedora kernel:  squashfs intel_uncore_frequency intel_uncore_frequency_common iwlmvm x86_pkg_temp_thermal intel_powerclamp coretemp mac80211 kvm_intel nvidia_drm(OE) nvidia_modeset(OE) libarc4 kvm nvidia_uvm(OE) snd_hda_intel iwlwifi snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec uvcvideo btusb btrtl snd_hda_core uvc btintel videobuf2_vmalloc spi_nor iTCO_wdt videobuf2_memops videobuf2_v4l2 btbcm irqbypass intel_pmc_bxt snd_hwdep videobuf2_common btmtk rapl mtd mei_hdcp mei_pxp spd5118 iTCO_vendor_support intel_rapl_msr cfg80211 snd_seq intel_cstate bluetooth videodev thunderbolt snd_seq_device nvidia(OE) processor_thermal_device_pci intel_uncore snd_pcm processor_thermal_device mc processor_thermal_wt_hint mei_me pcspkr processor_thermal_rfim snd_timer processor_thermal_rapl wmi_bmof snd mei intel_rapl_common spi_intel_pci i2c_i801 idma64 rfkill spi_intel processor_thermal_wt_req soundcore i2c_smbus igen6_edac processor_thermal_power_floor processor_thermal_mbox int3403_thermal intel_pmc_core joydev
Jun 14 12:44:14 fedora kernel:  int340x_thermal_zone pmt_telemetry int3400_thermal pmt_class intel_hid acpi_thermal_rel acpi_tad sparse_keymap acpi_pad loop nfnetlink zram lz4hc_compress lz4_compress dm_crypt xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec i915 i2c_algo_bit drm_buddy ttm nvme sdhci_pci drm_display_helper nvme_core sdhci_uhs2 polyval_clmulni sdhci polyval_generic ghash_clmulni_intel sha512_ssse3 cqhci sha256_ssse3 hid_multitouch nvidia_wmi_ec_backlight mmc_core sha1_ssse3 cec nvme_auth intel_vsec i2c_hid_acpi i2c_hid video wmi pinctrl_tigerlake i2c_dev fuse
Jun 14 12:44:14 fedora kernel: CPU: 6 UID: 0 PID: 22440 Comm: nvidia-sleep.sh Tainted: G           OE      6.14.9-300.fc42.x86_64 #1
Jun 14 12:44:14 fedora kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Jun 14 12:44:14 fedora kernel: Hardware name: Razer Blade 15 (2022) - RZ09-0421/CH580, BIOS 2.06 11/01/2023
Jun 14 12:44:14 fedora kernel: RIP: 0010:nv_suspend_devices+0x19c/0x300 [nvidia]
Jun 14 12:44:14 fedora kernel: Code: 3c d9 e9 f4 fe ff ff 48 c7 c7 f0 29 75 c1 e8 9b ae 16 da 31 ed 48 83 c4 08 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b 4c 89 e7 e8 7a ae 16 da 4d 85 ed 74 0d e8 20 d0 12 00 84 c0
Jun 14 12:44:14 fedora kernel: RSP: 0018:ffffd391a8f5bda8 EFLAGS: 00010206
Jun 14 12:44:14 fedora kernel: RAX: 0000000000000011 RBX: ffff8d10c3600000 RCX: ffffd391a8f5bd38
Jun 14 12:44:14 fedora kernel: RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffffd391a8f5bcf8
Jun 14 12:44:14 fedora kernel: RBP: 0000000000000011 R08: 0000000000000000 R09: ffffffffc1712b86
Jun 14 12:44:14 fedora kernel: R10: ffff8d10fbbc4de0 R11: fffff57704eef100 R12: ffff8d10c36006a8
Jun 14 12:44:14 fedora kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
Jun 14 12:44:14 fedora kernel: FS:  00007f23b947f740(0000) GS:ffff8d145d500000(0000) knlGS:0000000000000000
Jun 14 12:44:14 fedora kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 12:44:14 fedora kernel: CR2: 000055ad5638b888 CR3: 0000000104b7b001 CR4: 0000000000f72ef0
Jun 14 12:44:14 fedora kernel: PKRU: 55555554
Jun 14 12:44:14 fedora kernel: Call Trace:
Jun 14 12:44:14 fedora kernel:  <TASK>
Jun 14 12:44:14 fedora kernel:  nv_set_system_power_state+0x8a/0x1a0 [nvidia]
Jun 14 12:44:14 fedora kernel:  nv_procfs_write_suspend+0x102/0x1c0 [nvidia]
Jun 14 12:44:14 fedora kernel:  ? security_file_permission+0x50/0xf0
Jun 14 12:44:14 fedora kernel:  proc_reg_write+0x57/0xb0
Jun 14 12:44:14 fedora kernel:  vfs_write+0xf1/0x470
Jun 14 12:44:14 fedora kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Jun 14 12:44:14 fedora kernel:  ? handle_mm_fault+0x227/0x340
Jun 14 12:44:14 fedora kernel:  ksys_write+0x74/0xf0
Jun 14 12:44:14 fedora kernel:  do_syscall_64+0x7b/0x160
Jun 14 12:44:14 fedora kernel:  ? exc_page_fault+0x7e/0x1a0
Jun 14 12:44:14 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 14 12:44:14 fedora kernel: RIP: 0033:0x7f23b94efa06
Jun 14 12:44:14 fedora kernel: Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
Jun 14 12:44:14 fedora kernel: RSP: 002b:00007ffd08fa8470 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Jun 14 12:44:14 fedora kernel: RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f23b94efa06
Jun 14 12:44:14 fedora kernel: RDX: 0000000000000008 RSI: 00005621bebb9e10 RDI: 0000000000000001
Jun 14 12:44:14 fedora kernel: RBP: 00007ffd08fa8490 R08: 0000000000000000 R09: 0000000000000000
Jun 14 12:44:14 fedora kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000008
Jun 14 12:44:14 fedora kernel: R13: 00005621bebb9e10 R14: 00007f23b966b5c0 R15: 0000000000000000
Jun 14 12:44:14 fedora kernel:  </TASK>
Jun 14 12:44:14 fedora kernel: ---[ end trace 0000000000000000 ]---
Jun 14 12:44:14 fedora kernel: ------------[ cut here ]------------