Have you enabled FreeSync or G-Sync on your monitor?
There is a new regression in 575.57.08 that was not present in the previous 575 release. Clair Obscur: Expedition 33 now crashes on launch with a Vulkan loader error:
Bug report:
nvidia-bug-report.log.gz (671.0 KB)
Proton log:
steam-1903340.log (809.4 KB)
System information:
System:
Host: blackwell Kernel: 6.15.0-2-cachyos arch: x86_64 bits: 64
Desktop: KDE Plasma v: 6.3.5 Distro: CachyOS
CPU:
Info: 8-core model: AMD Ryzen 7 9800X3D bits: 64 type: MT MCP cache:
L2: 8 MiB
Speed (MHz): avg: 5228 min/max: 603/5272 cores: 1: 5228 2: 5228 3: 5228
4: 5228 5: 5228 6: 5228 7: 5228 8: 5228 9: 5228 10: 5228 11: 5228 12: 5228
13: 5228 14: 5228 15: 5228 16: 5228
Graphics:
Device-1: NVIDIA GB202 [GeForce RTX 5090] driver: nvidia v: 575.57.08
Display: wayland server: X.org v: 1.21.1.16 with: Xwayland v: 24.1.6
compositor: kwin_wayland driver: gpu: nvidia,nvidia-nvswitch
resolution: 5120x2160~165Hz
API: EGL v: 1.5 drivers: nouveau,nvidia,swrast
platforms: gbm,wayland,x11,surfaceless,device
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 575.57.08
renderer: NVIDIA GeForce RTX 5090/PCIe/SSE2
API: Vulkan v: 1.4.313 drivers: nvidia,llvmpipe surfaces: N/A
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
de: kscreen-console,kscreen-doctor gpu: nvidia-settings,nvidia-smi
wl: wayland-info x11: xdpyinfo, xprop, xrandr
Debian packages for 575.51.03
still don't create necessary links for nvidia-settings
just as described in the 1st post of 570 feedback:
Reporting BUG
Synopsis: 575.57.08 nvidia-uvm does not build for Linux 6.15 on my system.
Brief Description: 575.57.08 nvidia-uvm does not build for Linux 6.15 on my system.
Steps to Reproduce and frequency: (Always)
- Using Linux Mint 22.1 (Ubuntu noble) on Linux 6.15, boot single-user
- run ./NVIDIA-Linux-x86_64-575.57.08.run --skip-module-load
- select MIT/GPL driver branch (default)
- It will error out that it can't build nvidia-uvm
Workaround: run ./NVIDIA-Linux-x86_64-575.57.08.run --no-unified-memory --skip-module-load
System Configuration: System76 Thelio Mega v1.1 x86_64 NVIDIA RTX 3090Ti 24G
Linux 6.15.0 Mint 22.1 Xfce 4.18 Mem: 258G
Monitor: ASUS ROG Swift 41.5" 4K OLED Gaming Monitor (PG42UQ) - UHD (3840 x 2160) running at 120Hz.
Bug report generated after compilation failure:
nvidia-bug-report.log.gz (495.4 KB)
This same compilation error is present with the release driver, 570.153.02.
Thank you for all that you do, NVIDIA! :)
With the latest 575.51.02 driver, CUDA started failing to initialize after about a day of uptime:
cu->cuInit(0) failed -> CUDA_ERROR_NOT_INITIALIZED: initialization error
It had been working well for more than a day. Is there any workaround to bring CUDA back, or can only a reboot help at this point?
I am attaching bug report here:
nvidia-bug-report.log.gz (3.5 MB)
If more details about my system or errors are needed, I shared them here.
UPDATE: After dozens of attempts to run llama-server again, it started working, just from the attempts and nothing else; mpv and other CUDA applications started working as well. Just waiting did not seem to help, but actively trying to initialize CUDA does. I also noticed that the issue is more likely to occur if I forcefully stop llama-server with multiple Ctrl+C presses. Since the issue is system-wide, not tied to any particular app, and can happen without me stopping anything, it seems to be a driver bug. The attached bug report was generated while the issue was present, so hopefully it provides insight into what exactly causes it in the driver.
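For anyone who wants to reproduce the "keep trying until CUDA initializes" observation above without relaunching llama-server by hand, here is a minimal sketch. It calls the driver API's `cuInit(0)` through `ctypes`. The helper names `load_cu_init` and `retry_cu_init`, the attempt count, and the delay are illustrative assumptions. It is also an assumption that repeated calls inside one process behave the same way as relaunching an application, which is what was actually done in this report.

```python
import ctypes
import time

CUDA_SUCCESS = 0  # CUresult value returned by cuInit on success


def load_cu_init(library_name="libcuda.so.1"):
    """Return a zero-argument callable that invokes cuInit(0) from the driver library."""
    libcuda = ctypes.CDLL(library_name)
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return lambda: libcuda.cuInit(0)


def retry_cu_init(cu_init, attempts=50, delay_s=1.0, sleep=time.sleep):
    """Call cu_init until it returns CUDA_SUCCESS or the attempts run out.

    Returns (succeeded, codes), where codes lists every result code seen.
    The attempts and delay_s defaults are arbitrary, not tuned values.
    """
    codes = []
    for attempt in range(attempts):
        code = cu_init()
        codes.append(code)
        if code == CUDA_SUCCESS:
            return True, codes
        if attempt + 1 < attempts:
            sleep(delay_s)  # pause between attempts, but not after the last one
    return False, codes
```

On an affected machine this could be run as `retry_cu_init(load_cu_init())`. The list of result codes might be worth attaching next to the bug report, since it shows how many attempts were needed.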
Unfortunately, soon after CUDA started failing to initialize reliably, all GPUs fell off the bus due to a Linux NVIDIA driver bug (unlike hardware faults that can make a GPU fall off the bus, this one happens only on Linux and not with the Windows NVIDIA driver). I also think CUDA initialization errors preceded the GPUs falling off the bus on another driver version; that time I just did not collect bug report info. So it may be a related issue that shows up as a precursor, but I am not sure.
Here is an additional bug report log. The one in the previous message was captured while CUDA was failing to initialize, and this one after the GPUs fell off the bus:
nvidia-bug-report.log.gz (2.8 MB)
I would like to reiterate that this issue is specific to the Linux NVIDIA driver. In the main thread about the "GPU fell off the bus" issue, I already provided links to multiple independent reports (1, 2) from people whose GPU falls off the bus in Linux but works fine in Windows.
Can someone from Nvidia investigate this please? And let me know if I can provide more debug information.
I have filed a couple of bugs, listed below, for tracking purposes.
5317891 - nvidia-uvm does not build for Linux 6.15 on system with driver 575.57.08
5317863 - Game crashes on launch with a Vulkan loader error with driver 575.57.08
STATUS: Checking if this can reproduce.
Hi @thesourcehim
As per the error prints, the system is not supported for DBDC; the error message is not fatal and can be ignored.
You won't see performance benefits in the DC state, but AC will work as it did with the 570.153.02 driver.
Also posted on 570 thread, but am posting here too because the issue is also present on the latest 575.57.08
The power usage stats reported by the driver in 32-bit applications have been incorrect since version 570.124.04.
This causes mangohud to display GPU power usage of 0.0W in older games like Portal 1, 2, and other 32-bit games.
Related post in 570 thread: 570 release feedback & discussion - #497 by Xpander
Upstream mangohud issue report, although this is a bug on NVIDIA's side: GPU power draw shows wrong numbers with 32bit applications · Issue #1607 · flightlessmango/MangoHud · GitHub
Related to this, nvidia-smi is no longer able to report voltage; this has been the case for the last few driver releases too.
nvidia-smi -q | grep Voltage
now reports no value.
It was useful when undervolting, but since 565 or 570, I think, the value has been missing. I think it's related to nvidia-smi now using nvml.
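Since `grep Voltage` only prints the matching line itself, a slightly more informative check is to also capture the more-indented lines that follow it in the `nvidia-smi -q` report. This is a rough sketch that only assumes the report is indentation-structured. The function names are made up for illustration.

```python
import subprocess


def section_lines(report, keyword):
    """Return each line containing keyword plus the more-indented lines right below it."""
    lines = report.splitlines()
    found = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if keyword not in line:
            i += 1
            continue
        found.append(line.strip())
        indent = len(line) - len(line.lstrip())
        i += 1
        # Collect the child entries: non-empty lines indented deeper than the match.
        while i < len(lines):
            child = lines[i]
            if child.strip() and len(child) - len(child.lstrip()) > indent:
                found.append(child.strip())
                i += 1
            else:
                break
    return found


def query_voltage():
    """Run `nvidia-smi -q` and return its voltage-related lines (needs the driver installed)."""
    report = subprocess.run(
        ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
    ).stdout
    return section_lines(report, "Voltage")
```

Comparing the result of `query_voltage()` across driver versions would show whether the field disappeared entirely or is still printed with no value.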
Thanks for your help nvidia team!
Debian packages are broken due to broken dependencies.
@paulaner this should be resolved tomorrow:
Lots of unresolved symbols on recent Xorg (master):
- dixChangeGC
- dixLookupPrivate (-> this never existed as a symbol, always been an inline function)
- dixRequestPrivate
- dixSetPrivate
- LoadExtension
- miInitializeBackingStore
- miIntersect
- miRectsToRegion
- miRegionCopy
- miRegionCreate
- miRegionDestroy
- miRegionValidate
- miSubtract
- miTranslateRegion
- miUnion
- PanoramiXTranslateVisualID
- RRClientKnowsRates
- RRCrtcDetachScanoutPixmap
- RRCrtcGetScanoutSize
- RRCrtcGetTransform
- RRCrtcSetTransformSupport
- RRGetOutputProperty
- RROutputSetSubpixelOrder
- RRProviderCreate
- RRProviderSetCapabilities
- RRTransformSetFilter
- SetCriticalOutputPending
- TimerCheck
- TryClientEvents
- xf86BlockSIGIO
- xf86configptr
- xf86DisableGeneralHandler
- xf86DisableRandR
- xf86EnableGeneralHandler
- xf86IsScreenPrimary
- xf86MarkOptionUsedByName
- xf86RegisterRootWindowProperty
- xf86UnblockSIGIO
- Xfree
- XRC_DRAWABLE
- missing GEInitEvent
- GERegisterExtension
They're abusing really internal things that video drivers have no business with at all.
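A list like the one above can be cross-checked against a given Xorg build by comparing it with the server binary's exported dynamic symbols. The sketch below parses `nm -D --defined-only` output. The default binary path `/usr/lib/Xorg` and the helper names are assumptions and will differ between distributions. Inline functions such as dixLookupPrivate will always show up as missing, since they are never exported.

```python
import subprocess


def exported_symbols(nm_output):
    """Collect defined symbol names from `nm -D` style output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-2] == "U":
            continue  # skip blank lines and undefined (imported) symbols
        # Drop any symbol-version suffix such as name@@VERSION.
        names.add(fields[-1].split("@", 1)[0])
    return names


def missing_symbols(required, nm_output):
    """Return the required symbols that the binary does not export, in input order."""
    exported = exported_symbols(nm_output)
    return [name for name in required if name not in exported]


def check_xorg(required, xorg_path="/usr/lib/Xorg"):
    """Run nm on the server binary (path is an assumption) and report missing symbols."""
    nm_output = subprocess.run(
        ["nm", "-D", "--defined-only", xorg_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_symbols(required, nm_output)
```

For example, `check_xorg(["LoadExtension", "TimerCheck"])` would return whichever of those two names the installed server no longer exports.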
I'm facing the same issue; I hope they solve it.
@amrits any chance this can be looked into, since it's such a long-running problem?
Is dynamic boost supposed to be functional with the 575 driver? I'm using Arch on a ProArt P16 here, and the dGPU never goes above the default max TGP of 55 W (nvidia-settings does report a max TGP of 105 W for my 4070).
I don't see anything wrong with nvidia-powerd (besides the error mentioned above when running on battery: "ERROR! Client (presumably SBIOS) has requested to disable Dynamic Boost DC controller").
Maybe I'm hallucinating and the error isn't driver-related.
Any luck, laptop users?
Edit: In the logs, I see nvidia-powerd was bumped to version 2.0 with 575 (1.0 with 570 and below)
In my case I only get boost when the power profile is set to performance (3060 laptop).
No one knows where all the VRAM is going?
Version 575.57.08 has been pushed to the Debian repo, and it fixes the dependency problems referred to above as well as all other issues mentioned here so far.
I must admit that I was overly pessimistic regarding this: many thanks to @scaronni, @aplattner and their teams for fixing this!!
This will spare me personally the hassle of explaining to all the sec-ops and sys-ops teams of my associated organizations why we need these hand-modified workaround packages and/or locked versions. ...And also no need to wait for the related security exception approvals anymore.
:)))))))
Yes, but I'm already using performance mode:
[root@Koalarch sebastien]# cat /sys/firmware/acpi/platform_profile
performance
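For other laptop users comparing notes, the same check can be scripted. This is a small sketch that reads the active ACPI platform profile and, where the kernel exposes it, the list of available profiles from `platform_profile_choices`. The function name and the overridable directory argument are illustrative assumptions.

```python
from pathlib import Path

ACPI_DIR = Path("/sys/firmware/acpi")


def read_platform_profile(acpi_dir=ACPI_DIR):
    """Return (active_profile, available_profiles) from the ACPI sysfs files."""
    acpi_dir = Path(acpi_dir)
    active = (acpi_dir / "platform_profile").read_text().strip()
    choices_file = acpi_dir / "platform_profile_choices"
    # The choices file may be absent on some systems; fall back to an empty list.
    choices = choices_file.read_text().split() if choices_file.exists() else []
    return active, choices
```

Calling `read_platform_profile()` on the machine above should report `performance` as the active profile, matching the `cat` output.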
5317891 - nvidia-uvm does not build for Linux 6.15 on system with driver 575.57.08
This issue has been root-caused, and a fix should be available in a future driver release.