GPU detected but malfunctioning, ongoing problem, over a week of troubleshooting, send help please!

OK, so, quick details:

My rig:

MoBo: Gigabyte B550 DS3H

CPU: AMD Ryzen 5 5600G

GPU: NVIDIA Corporation TU106 [GeForce RTX 2070]

OS: Pop!_OS 22.04 LTS

Graphics Driver: 515.48.07

BIOS: F15a

Now the problems:

OK, so about a week ago I went to distro-hop to Fedora and found that my BIOS setup had completely disappeared: `systemctl reboot --firmware-setup` didn’t even initiate a reboot because it couldn’t find the firmware setup. After jumping through some hoops with every alternative I could find, I eventually cleared the CMOS and, like magic, my BIOS was back and I was able to install Fedora.
On my first boot into the BIOS I noticed that some settings had changed and the options pertaining to my GPU were all gone. Oh well, I’d install Fedora and figure it out later. On first boot into Fedora I noticed that the fans on my GPU all turned off as soon as the initial disk encryption screen loaded. Weird as it was, I wasn’t too concerned. However, after installing all of the NVIDIA software, I tried manually activating the GPU fans through nvidia-settings, only to see "Failed to set new Fan Speed!" in the bottom left corner of the configuration window. Hoping this was just a case of Fedora not playing nicely with NVIDIA, I reverted to Pop!_OS.

I went through every set of graphics drivers, and the only time my fans stayed on was after purging all NVIDIA drivers; however, that made 3D editing worthless. Launching nvidia-settings from the CLI, I get this:

```
~$ nvidia-settings

(nvidia-settings:8104): GLib-GObject-CRITICAL **: 12:33:38.312: g_object_unref: assertion ‘G_IS_OBJECT (object)’ failed

** (nvidia-settings:8104): WARNING **: 12:33:38.431: PRIME: Failed to execute child process “/usr/bin/prime-supported” (No such file or directory)
** Message: 12:33:38.431: PRIME: is it supported? no
```

And during boot I get:

```
[    2.752819] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.GP17.VGA.LCD._BCM.AFN7], AE_NOT_FOUND (20210730/psargs-330)
[    2.753977] ACPI Error: Aborting method \_SB.PCI0.GP17.VGA.LCD._BCM due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
```

Running `lspci -k`, I get:

```
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
	Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
	Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
	Kernel driver in use: pcieport
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
	Kernel driver in use: pcieport
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
	Kernel driver in use: pcieport
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
	Subsystem: Gigabyte Technology Co., Ltd FCH SMBus Controller
	Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
	Subsystem: Gigabyte Technology Co., Ltd FCH LPC Bridge
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
	Kernel driver in use: k10temp
	Kernel modules: k10temp
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 [GeForce RTX 2070]
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 High Definition Audio Controller
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
01:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 USB 3.1 Host Controller
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
01:00.3 Serial bus controller: NVIDIA Corporation TU106 USB Type-C UCSI Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 USB Type-C UCSI Controller
	Kernel modules: i2c_nvidia_gpu
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
	Subsystem: ASMedia Technology Inc. Device 1142
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
	Subsystem: ASMedia Technology Inc. Device 1062
	Kernel driver in use: ahci
	Kernel modules: ahci
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9
	Kernel driver in use: pcieport
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
	Kernel driver in use: pcieport
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 16)
	Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
	Kernel driver in use: r8169
	Kernel modules: r8169
05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function (rev c9)
	Subsystem: Gigabyte Technology Co., Ltd Zeppelin/Raven/Raven2 PCIe Dummy Function
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
	Kernel driver in use: ccp
	Kernel modules: ccp
05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
	Subsystem: Gigabyte Technology Co., Ltd Renoir/Cezanne USB 3.1
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
	Subsystem: Gigabyte Technology Co., Ltd Renoir/Cezanne USB 3.1
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
	DeviceName: Realtek ALC1220
	Subsystem: Gigabyte Technology Co., Ltd Family 17h (Models 10h-1fh) HD Audio Controller
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
```
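The line that matters in the `lspci -k` output is `Kernel driver in use: nvidia` under `01:00.0`, which shows the proprietary driver is bound to the card. To repeat that check after each driver change, a small parser can pull out the bound driver for every NVIDIA function. This is only a sketch: it assumes the plain `lspci -k` layout shown (device header at column 0, indented detail lines), and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def drivers_in_use(lspci_k_text: str) -> dict[str, str | None]:
    """Map each device header line of `lspci -k` output to its bound driver (None if unbound)."""
    result: dict[str, str | None] = {}
    current = None
    for line in lspci_k_text.splitlines():
        if line and not line[0].isspace():
            # Header lines start at column 0, e.g. "01:00.0 VGA compatible controller: ..."
            current = line.strip()
            result[current] = None
        elif current is not None and "Kernel driver in use:" in line:
            # Indented detail line belonging to the most recent header
            result[current] = line.split(":", 1)[1].strip()
    return result


def nvidia_drivers() -> dict[str, str | None]:
    """Run `lspci -k` and keep only the NVIDIA functions (requires pciutils)."""
    out = subprocess.run(["lspci", "-k"], capture_output=True, text=True, check=True).stdout
    return {dev: drv for dev, drv in drivers_in_use(out).items() if "NVIDIA" in dev}
```

On the system above, `nvidia_drivers()` would be expected to report `nvidia` for the VGA function and no driver for the UCSI controller, which only lists a kernel module.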

I’m not really sure why you want your GPU fans to always be on; they should kick in automatically when the GPU heats up.
Setting the fan speed manually only works when the X server is running as root, which isn’t the default on most distros nowadays. Search the forums for the common workarounds.
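For reference, manual fan control through nvidia-settings generally needs Coolbits enabled in the X configuration (`Option "Coolbits" "4"` in the `Device` section unlocks manual fan control) and, per the reply above, an X server running as root. A commonly cited workaround for the latter is `needs_root_rights=yes` in `/etc/X11/Xwrapper.config`, though it may not apply to every display manager. Assuming both are in place, the sketch below only builds the `nvidia-settings` command lines using the `GPUFanControlState` and `GPUTargetFanSpeed` attributes. The helper names and the default fan index 0 are assumptions; `nvidia-settings -q fans` lists the fan targets on a given card.

```python
from __future__ import annotations

import subprocess


def fan_speed_command(percent: int, gpu: int = 0, fans: tuple[int, ...] = (0,)) -> list[str]:
    """Build an nvidia-settings call that takes manual control and sets a target fan speed.

    Assumes Coolbits is enabled and X runs as root; otherwise nvidia-settings
    reports "Failed to set new Fan Speed!".
    """
    percent = max(0, min(100, int(percent)))  # clamp to a valid duty cycle
    cmd = ["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUFanControlState=1"]
    for fan in fans:
        cmd += ["-a", f"[fan:{fan}]/GPUTargetFanSpeed={percent}"]
    return cmd


def release_fan_control(gpu: int = 0) -> list[str]:
    """Build the call that hands fan control back to the driver's automatic curve."""
    return ["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUFanControlState=0"]


def run(cmd: list[str]) -> None:
    """Execute one of the commands above; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(fan_speed_command(50, fans=(0, 1)))` would request 50 % on two fans, and `run(release_fan_control())` returns to automatic control.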

They always idled at a low RPM before the issue; now the GPU reaches 70 °C and they still don’t kick in. So it’s not that I want them always running, I just want them to turn themselves on when they’re supposed to.

The issue is that the fans straight up don’t turn on except at boot, and they are off by the time I decrypt my disk. I’ve since found workarounds to turn them on manually, but I can’t rely on manually activating and adjusting them long term. Furthermore, everything runs as if I’m using integrated graphics, so the GPU seems to be malfunctioning.

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
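Before attaching the log, it can help to skim it for the usual red flags. The sketch below reads the gzip-compressed report with Python's `gzip` module and returns matching lines. The pattern list (Xid errors, "fallen off the bus", the ACPI `AE_NOT_FOUND` seen at boot) is an assumption about what is worth looking for, not an official checklist, and it is no substitute for posting the full log.

```python
from __future__ import annotations

import gzip
import re
from pathlib import Path
from typing import Iterable

# Assumed red flags; extend as needed.
PATTERNS = (
    re.compile(r"NVRM: Xid"),           # GPU error reports from the kernel driver
    re.compile(r"fallen off the bus"),  # GPU stopped responding on PCIe
    re.compile(r"AE_NOT_FOUND"),        # ACPI symbol lookups that failed at boot
)


def scan_lines(lines: Iterable[str], patterns=PATTERNS) -> list[str]:
    """Return the lines (without trailing newline) that match any pattern."""
    return [line.rstrip("\n") for line in lines if any(p.search(line) for p in patterns)]


def scan_bug_report(path: str | Path, patterns=PATTERNS) -> list[str]:
    """Scan nvidia-bug-report.log.gz in place, without unpacking it to disk."""
    with gzip.open(path, "rt", errors="replace") as fh:
        return scan_lines(fh, patterns)
```

Usage would be something like `scan_bug_report("nvidia-bug-report.log.gz")` from the directory where the script wrote the report.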

Well, reading through the log, it appears my kernel make is invalid. I’m still trying to understand how to move forward, and I’m confused about how that could be: I’ve hopped distros to try different solutions since this problem began, and even wrote zeros over my SSD, so I don’t understand why the kernel would be no bueno.

P.S. I’m running elementary OS now, if it matters.
nvidia-bug-report.log.gz (272.8 KB)

Nothing unusual in the logs. Please create a new one with some load on the GPU, e.g. while running a Unigine demo.

Doesn’t the log say the kernel is improperly configured? Give me 20 minutes, I’m in the shower.

That’s merely a warning that’s always displayed; it can be safely ignored.


OK, so when I open nvidia-settings as root I get:

```
(nvidia-settings:11151): GLib-GObject-CRITICAL **: 16:07:57.905: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** Message: 16:07:58.138: PRIME: No offloading required. Abort
** Message: 16:07:58.138: PRIME: is it supported? no
```

I don’t recall the PRIME message on other distros, but elementary OS does give me manual control over my fans.

I’m currently waiting for Mordhau to launch. I just installed Steam and then the game, and now I’m waiting for it to process Vulkan shaders on first launch. Sorry for the slow response.

Edit: Although the PRIME message didn’t pop up on Pop!_OS, I do recall finding that PRIME wasn’t enabled there either.

OK, the fans kicked in after a few minutes of play, at around 74 °C, which works I suppose, but any idea why my settings for adjusting the fan curve seem to have disappeared? I’d like the fans to kick in a little sooner. The game also still seemed to run pretty rough (it’s an intensive game, but I turned the settings to high and play at 1080p).
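If the fan curve controls stay missing, one stopgap is a small userspace curve: read the temperature with `nvidia-smi`, interpolate a fan percentage, and apply it with `nvidia-settings`. This is a sketch, not a supported feature: it assumes the same Coolbits/root-X prerequisites as any manual fan control, a single fan at index 0, and curve points that are purely illustrative.

```python
import bisect
import subprocess
import time

# Illustrative (temperature °C, fan %) points; tune to taste.
CURVE = [(40, 30), (55, 45), (65, 60), (75, 80), (85, 100)]


def fan_percent(temp_c: float, curve=CURVE) -> int:
    """Linearly interpolate a fan duty cycle from a piecewise-linear curve."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]
    if temp_c >= temps[-1]:
        return curve[-1][1]
    i = bisect.bisect_right(temps, temp_c)  # curve[i-1] <= temp_c < curve[i]
    (t0, f0), (t1, f1) = curve[i - 1], curve[i]
    return round(f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0))


def gpu_temperature(gpu: int = 0) -> int:
    """Read the core temperature in °C via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu), "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def run_curve(interval_s: float = 2.0, gpu: int = 0, fan: int = 0) -> None:
    """Apply the curve until interrupted (e.g. Ctrl+C)."""
    subprocess.run(["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUFanControlState=1"], check=True)
    try:
        while True:
            pct = fan_percent(gpu_temperature(gpu))
            subprocess.run(["nvidia-settings", "-a", f"[fan:{fan}]/GPUTargetFanSpeed={pct}"],
                           check=True, capture_output=True)
            time.sleep(interval_s)
    finally:
        # Hand control back to the driver's automatic curve on exit.
        subprocess.run(["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUFanControlState=0"], check=False)
```

With these points the fan would already be at 70 % by 70 °C instead of waiting until around 74 °C; lowering the first points makes it kick in sooner still.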

nvidia-bug-report.log.gz (381.9 KB)

GPU utilization was only 38%, so it rather seems that something else (CPU, compositor, etc.) is keeping the GPU from being fully utilized.
Regarding fan control: as I said, the X server has to run as root.
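One way to put numbers on the utilization question is to sample `nvidia-smi --query-gpu` while the game or a benchmark runs. The sketch below queries `utilization.gpu`, `temperature.gpu` and `fan.speed` once per interval and averages utilization. The interval, duration and helper names are assumptions; a persistently low average under a GPU-heavy load is consistent with the bottleneck being elsewhere, as suggested above.

```python
from __future__ import annotations

import subprocess
import time
from dataclasses import dataclass

QUERY = "utilization.gpu,temperature.gpu,fan.speed"


@dataclass
class Sample:
    util_pct: int
    temp_c: int
    fan_pct: int | None  # nvidia-smi may print "[N/A]" when fan speed isn't reported


def parse_sample(line: str) -> Sample:
    """Parse one 'util, temp, fan' line from --format=csv,noheader,nounits."""
    util, temp, fan = (field.strip() for field in line.split(","))
    return Sample(int(util), int(temp), int(fan) if fan.isdigit() else None)


def sample(seconds: float = 30.0, interval_s: float = 1.0, gpu: int = 0) -> list[Sample]:
    """Poll nvidia-smi for `seconds`, one sample every `interval_s`."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["nvidia-smi", "-i", str(gpu), f"--query-gpu={QUERY}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.append(parse_sample(out.strip()))
        time.sleep(interval_s)
    return samples


def mean_utilization(samples: list[Sample]) -> float:
    return sum(s.util_pct for s in samples) / len(samples)
```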

@cameronwalfoort
Please run a Unigine or FurMark benchmark alongside a monitoring tool such as CoreCtrl, GreenWithEnvy, GOverlay or WattmanGTK, note down the fan speed and GPU utilization, and share them with us.
Also, please check whether the GPU fan always starts spinning once the temperature reaches ~74 °C.
Meanwhile, I will try to find a similar setup and attempt a local repro.
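To check whether the fan spins up at roughly the same temperature every time, the logged samples can be reduced to the temperatures at which the fan went from 0 % to spinning. The sketch assumes time-ordered `(temperature_c, fan_percent)` pairs, e.g. collected with `nvidia-smi --query-gpu=temperature.gpu,fan.speed --format=csv,noheader,nounits -l 1` during a benchmark run; the function name is hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def spin_up_temperatures(samples: Iterable[tuple[int, int]]) -> list[int]:
    """Return the temperature at each 0 % -> spinning transition of the fan.

    A run that starts with the fan already spinning does not count as a spin-up.
    """
    temps = []
    prev_fan = None
    for temp_c, fan_pct in samples:
        if prev_fan == 0 and fan_pct > 0:
            temps.append(temp_c)
        prev_fan = fan_pct
    return temps
```

If the returned temperatures cluster tightly around ~74 °C across several runs, that would confirm the consistent spin-up threshold described earlier in the thread.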