Replaced older NVIDIA card with RTX 3060, now Ubuntu 21.10/22.04 is super sticky with NVIDIA drivers

Hi

I replaced my older GTX 1660 with an RTX 3060 on Ubuntu 21.10, and afterwards the GUI was super sticky. I can grab a window and move it around fine, but every two seconds it freezes for about 5 seconds.

I tried switching between the included 470 and 510 drivers, but no dice. Also updated to 22.04, no luck.

Then I removed the NVIDIA drivers using a guide online.

After this, using the nouveau drivers, everything was smooth as butter, albeit at low resolution. As I really need full resolution and the NVIDIA drivers for work, I decided to install the latest 510 drivers from the NVIDIA website manually. Now I have the same problem again, with everything being super sticky and slow.

The machine is a Threadripper 3970X.

Anything else I can try before I do a clean reinstall?
Thanks

EDIT: I just want to add that everything is only super sticky if I have any application running beyond the absolute basics. For example, if Slack or an empty Blender session is open, it is sticky. If only a terminal and a file browser are open, it is not sticky.

nvidia-bug-report.log.gz (526.9 KB)

Sorry, the forum didn't allow me to link to the guide I used to remove the drivers, but here it is: How to Uninstall NVIDIA Drivers in Ubuntu - Fedingo

nvidia-smi is reporting errors:

Fan Speed : Unknown Error
Performance State : P0
Clocks Throttle Reasons : Unknown Error
Clocks
Graphics : Unknown Error
SM : Unknown Error
Memory : Unknown Error
Video : Unknown Error

I suspect the GPU might be broken. Please reseat the card in its slot and test it in another system to rule out a general hardware failure.
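
For anyone checking their own card for the same symptom, here is a minimal sketch in Python that scans saved `nvidia-smi -q` output for fields reporting `Unknown Error`. The function name `find_unknown_errors` and the `SAMPLE` text are illustrative assumptions, not part of any NVIDIA tool; the sample simply reuses the fields quoted above.

```python
from __future__ import annotations


def find_unknown_errors(smi_text: str) -> list[str]:
    """Return the names of fields whose value is exactly 'Unknown Error'."""
    flagged = []
    for line in smi_text.splitlines():
        # Query output is laid out as "Field name : value"; split on the first colon only.
        name, sep, value = line.partition(":")
        if sep and value.strip() == "Unknown Error":
            flagged.append(name.strip())
    return flagged


# Sample copied from the fields quoted in this thread (hypothetical stand-in for a saved file).
SAMPLE = """\
Fan Speed : Unknown Error
Performance State : P0
Clocks Throttle Reasons : Unknown Error
Clocks
Graphics : Unknown Error
SM : Unknown Error
Memory : Unknown Error
Video : Unknown Error
"""

if __name__ == "__main__":
    for field in find_unknown_errors(SAMPLE):
        print(field)
```

On a live system the text could come from a file saved with `nvidia-smi -q > smi.txt`. An empty result does not prove the card is healthy; it only means this particular error string did not appear.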

Hi

Thanks for your reply.

I had trouble finding another machine to try it on, but this morning I managed. The card seemed to work fine in it. I booted from a live USB drive with Pop!_OS, which includes the NVIDIA drivers, and everything worked well. Blender was fine and nvidia-smi looked OK.

Then I put the card back into the original machine, but in a different PCIe slot (and kept the original GPU in place). I didn't get any image from it at all this time. This is the output I got (the GTX 1660 is my old card, the RTX 3060 is the new card):

gustaf@simone:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-c4108724-2cb2-6f96-5b14-b02e2d2b89bc)

gustaf@simone:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
21:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
21:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)

gustaf@simone:~$ sudo dmesg | grep -i error
[ 0.308197] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT1A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308204] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308210] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT1A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308213] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308218] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT2A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308221] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308226] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT2A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308229] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308233] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT3A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308237] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308241] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT3A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308244] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308248] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT4A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308251] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308256] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT4A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308259] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308263] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT5A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308266] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308272] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT1B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308275] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308280] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT1B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308283] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308287] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT2B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308290] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308295] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT2B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308298] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308302] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT3B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308305] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308310] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT3B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308313] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308317] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT4B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308320] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308325] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT4B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308328] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308332] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT5B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308335] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308341] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT1C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308344] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308349] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT1C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308352] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308356] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT2C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308359] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308364] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT2C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308367] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308371] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT3C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308374] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308379] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT3C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308382] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308386] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT4C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308389] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308394] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT4C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308397] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308401] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT5C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308404] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308410] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT1D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308413] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308417] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT1D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308421] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308425] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT2D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308428] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308432] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT2D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308436] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308440] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT3D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308443] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308447] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT3D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308450] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308455] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT4D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308458] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308462] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT4D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308465] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308470] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT5D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308473] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.757065] pcieport 0000:00:01.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.757641] pcieport 0000:20:03.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.758184] pcieport 0000:40:01.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.758376] pcieport 0000:40:01.2: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.758572] pcieport 0000:40:03.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 1.386422] RAS: Correctable Errors collector initialized.
[ 8.131698] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[ 8.454172] nvidia: probe of 0000:21:00.0 failed with error -1
[ 9.316549] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 9.319036] ucsi_ccg: probe of 1-0008 failed with error -110
[ 9.624626] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 9.637919] iwlwifi 0000:45:00.0: 0x00A30001 | IML/ROM error/state
[ 9.638558] iwlwifi 0000:45:00.0: 0xC644EF00 | FSEQ_ERROR_CODE
[ 10.584409] snd_hda_intel 0000:21:00.1: Codec #0 probe error; disabling it...
[ 13.085272] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 13.093749] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 13.096331] iwlwifi 0000:45:00.0: 0x00000003 | IML/ROM error/state
[ 13.097009] iwlwifi 0000:45:00.0: 0x20000000 | FSEQ_ERROR_CODE
[ 16.644451] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 16.646602] iwlwifi 0000:45:00.0: 0x00000003 | IML/ROM error/state
[ 16.647203] iwlwifi 0000:45:00.0: 0x20000000 | FSEQ_ERROR_CODE
[ 29.616671] xhci_hcd 0000:47:00.3: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 11 comp_code 1

gustaf@simone:~$ sudo dmesg | grep -i nvidia
[ 8.363326] nvidia: loading out-of-tree module taints kernel.
[ 8.363336] nvidia: module license 'NVIDIA' taints kernel.
[ 8.381290] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 8.404331] nvidia-nvlink: Nvlink Core is being initialized, major device number 504
[ 8.409853] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 8.454009] nvidia 0000:21:00.0: enabling device (0000 -> 0003)
[ 8.454119] nvidia 0000:21:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 8.454144] NVRM: The NVIDIA GPU 0000:21:00.0
[ 8.454172] nvidia: probe of 0000:21:00.0 failed with error -1
[ 8.454194] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 8.454195] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 510.68.02 Wed Apr 20 21:10:34 UTC 2022
[ 8.471745] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 510.68.02 Wed Apr 20 21:04:10 UTC 2022
[ 8.568260] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 8.630223] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input14
[ 8.630277] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input15
[ 8.630300] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input16
[ 8.630338] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input17
[ 8.630359] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input18
[ 8.630393] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input19
[ 8.630416] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input20
[ 8.870211] caller os_map_kernel_space.part.0+0x82/0xb0 [nvidia] mapping multiple BARs
[ 9.316549] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 9.375273] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[ 11.804878] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input21
[ 11.804915] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input22
[ 11.804941] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input23
[ 11.804982] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input24
[ 11.805009] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input25
[ 11.805056] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input26
[ 11.805084] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input27
[ 18.896128] audit: type=1400 audit(1652861905.172:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1710 comm="apparmor_parser"
[ 18.896131] audit: type=1400 audit(1652861905.172:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1710 comm="apparmor_parser"
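
The line that seems to matter most in all of this is the failed probe of 0000:21:00.0, which is easy to lose among the ACPI noise. Below is a rough Python sketch that pulls "probe of ... failed" lines out of saved dmesg text and matches them against saved lspci output. The names `PROBE_FAIL`, `parse_lspci` and `failed_probes` are made up for illustration, and the sketch assumes lspci was run without a domain prefix, as in the output above.

```python
from __future__ import annotations

import re

# Matches kernel lines such as "nvidia: probe of 0000:21:00.0 failed with error -1".
PROBE_FAIL = re.compile(r"(\S+): probe of (\S+) failed with error (-?\d+)")


def parse_lspci(lspci_text: str) -> dict[str, str]:
    """Map a short PCI address ('21:00.0') to its lspci description."""
    devices = {}
    for line in lspci_text.splitlines():
        addr, sep, desc = line.strip().partition(" ")
        if sep:
            devices[addr] = desc
    return devices


def failed_probes(dmesg_text: str, devices: dict[str, str]):
    """Return (driver, address, error code, lspci description or None) per failed probe."""
    results = []
    for line in dmesg_text.splitlines():
        match = PROBE_FAIL.search(line)
        if not match:
            continue
        driver, addr, code = match.group(1), match.group(2), int(match.group(3))
        # dmesg prefixes PCI addresses with a domain ("0000:"); strip it to match lspci.
        short = addr.split(":", 1)[1] if addr.count(":") == 2 else addr
        results.append((driver, addr, code, devices.get(short)))
    return results


# Lines copied from the output in this post.
LSPCI = """\
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
21:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
"""
DMESG = """\
[ 8.454172] nvidia: probe of 0000:21:00.0 failed with error -1
[ 9.316549] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 9.319036] ucsi_ccg: probe of 1-0008 failed with error -110
"""

if __name__ == "__main__":
    for driver, addr, code, desc in failed_probes(DMESG, parse_lspci(LSPCI)):
        print(f"{driver} failed on {addr} (error {code}): {desc or 'not a PCI device in lspci'}")
```

With the sample lines, this ties the nvidia probe failure to the RTX 3060 at 21:00.0, which fits nvidia-smi -L listing only the GTX 1660. It only reports which device failed to probe, not why.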

Any ideas?
Thanks

In that case the issue seems more likely to be the mainboard. Please check for a BIOS update.