Replaced older NVIDIA card with RTX 3060, now Ubuntu 21.10/22.04 is super sticky with NVIDIA drivers

Hi

I replaced my older GTX 1660 with an RTX 3060 in Ubuntu 21.10, and afterwards the GUI was super sticky. I can grab a window and move it around fine, but every two seconds it freezes for about 5 seconds.

I tried switching between the included 470 and 510 drivers, but no dice. Also updated to 22.04, no luck.

Then I removed the NVIDIA drivers using a guide online.

After this, using the nouveau drivers, everything was smooth as butter, albeit at low resolution. As I really need full resolution and the NVIDIA drivers for work, I decided to install the latest 510 drivers from the NVIDIA website manually. Now I have the same problem again, with everything being super sticky and slow.

The machine is a Threadripper 3970X.

Anything else I can try before I do a clean reinstall?
Thanks

EDIT: I just want to add that everything is only super sticky if I have any application running that is anything above the absolute basics. For example, if Slack or an empty Blender session is open, it is sticky. If only a terminal and the file browser are open, it is not sticky.

nvidia-bug-report.log.gz (526.9 KB)

Sorry, the forum didn't allow me to link to the guide I used to remove the drivers, but here it is: How to Uninstall NVIDIA Drivers in Ubuntu - Fedingo

nvidia-smi is reporting errors:

Fan Speed : Unknown Error
Performance State : P0
Clocks Throttle Reasons : Unknown Error
Clocks
Graphics : Unknown Error
SM : Unknown Error
Memory : Unknown Error
Video : Unknown Error

I suspect the GPU might be broken. Please reseat the card in its slot and test it in another system to rule out a general hardware failure.
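
To re-check this after reseating the card, a minimal sketch like the following could list which fields still come back as "Unknown Error". It assumes the text comes from the full query output of `nvidia-smi -q`; the helper name `unknown_fields` is made up for illustration.

```shell
# Hypothetical helper: print the name of every field whose value is
# "Unknown Error". Reads nvidia-smi -q style text on stdin, so it also
# works on a saved log.
unknown_fields() {
    awk -F':' '/Unknown Error/ {
        name = $1                          # text before the first colon
        gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim the column padding
        print name
    }'
}
```

Usage would be `nvidia-smi -q | unknown_fields`. No output means no field reported an unknown error.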

Hi

Thanks for your reply.

I had trouble finding another machine to try it on, but this morning I managed. The card seemed to work fine in it. I booted from a live USB drive with Pop!_OS, which includes the NVIDIA drivers, and everything worked well. Blender was fine and nvidia-smi looked OK.

Then I put the card back into the original machine, but in a different PCIe slot, with the old GTX 1660 still installed. I didn't get any image from it at all this time. This is the output I got (the GTX 1660 is my old card, the RTX 3060 is the new one):

gustaf@simone:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-c4108724-2cb2-6f96-5b14-b02e2d2b89bc)

gustaf@simone:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
21:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
21:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)

gustaf@simone:~$ sudo dmesg | grep -i error
[ 0.308197] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT1A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308204] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308210] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT1A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308213] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308218] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT2A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308221] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308226] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT2A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308229] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308233] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT3A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308237] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308241] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT3A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308244] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308248] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.WT4A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308251] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308256] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT4A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308259] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308263] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CA.MT5A], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308266] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308272] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT1B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308275] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308280] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT1B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308283] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308287] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT2B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308290] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308295] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT2B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308298] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308302] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT3B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308305] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308310] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT3B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308313] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308317] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.WT4B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308320] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308325] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT4B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308328] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308332] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CB.MT5B], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308335] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308341] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT1C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308344] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308349] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT1C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308352] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308356] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT2C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308359] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308364] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT2C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308367] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308371] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT3C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308374] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308379] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT3C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308382] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308386] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT4C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308389] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308394] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT4C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308397] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308401] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.MT5C], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308404] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308410] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT1D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308413] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308417] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT1D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308421] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308425] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT2D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308428] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308432] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT2D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308436] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308440] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT3D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308443] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308447] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT3D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308450] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308455] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.WT4D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308458] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308462] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT4D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308465] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.308470] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CD.MT5D], AE_ALREADY_EXISTS (20210730/dswload2-326)
[ 0.308473] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210730/psobject-220)
[ 0.757065] pcieport 0000:00:01.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.757641] pcieport 0000:20:03.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.758184] pcieport 0000:40:01.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.758376] pcieport 0000:40:01.2: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 0.758572] pcieport 0000:40:03.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[ 1.386422] RAS: Correctable Errors collector initialized.
[ 8.131698] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[ 8.454172] nvidia: probe of 0000:21:00.0 failed with error -1
[ 9.316549] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 9.319036] ucsi_ccg: probe of 1-0008 failed with error -110
[ 9.624626] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 9.637919] iwlwifi 0000:45:00.0: 0x00A30001 | IML/ROM error/state
[ 9.638558] iwlwifi 0000:45:00.0: 0xC644EF00 | FSEQ_ERROR_CODE
[ 10.584409] snd_hda_intel 0000:21:00.1: Codec #0 probe error; disabling it...
[ 13.085272] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 13.093749] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 13.096331] iwlwifi 0000:45:00.0: 0x00000003 | IML/ROM error/state
[ 13.097009] iwlwifi 0000:45:00.0: 0x20000000 | FSEQ_ERROR_CODE
[ 16.644451] iwlwifi 0000:45:00.0: Start IWL Error Log Dump:
[ 16.646602] iwlwifi 0000:45:00.0: 0x00000003 | IML/ROM error/state
[ 16.647203] iwlwifi 0000:45:00.0: 0x20000000 | FSEQ_ERROR_CODE
[ 29.616671] xhci_hcd 0000:47:00.3: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 11 comp_code 1

gustaf@simone:~$ sudo dmesg | grep -i nvidia
[ 8.363326] nvidia: loading out-of-tree module taints kernel.
[ 8.363336] nvidia: module license 'NVIDIA' taints kernel.
[ 8.381290] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 8.404331] nvidia-nvlink: Nvlink Core is being initialized, major device number 504
[ 8.409853] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 8.454009] nvidia 0000:21:00.0: enabling device (0000 -> 0003)
[ 8.454119] nvidia 0000:21:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 8.454144] NVRM: The NVIDIA GPU 0000:21:00.0
[ 8.454172] nvidia: probe of 0000:21:00.0 failed with error -1
[ 8.454194] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 8.454195] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 510.68.02 Wed Apr 20 21:10:34 UTC 2022
[ 8.471745] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 510.68.02 Wed Apr 20 21:04:10 UTC 2022
[ 8.568260] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 8.630223] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input14
[ 8.630277] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input15
[ 8.630300] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input16
[ 8.630338] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input17
[ 8.630359] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input18
[ 8.630393] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input19
[ 8.630416] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input20
[ 8.870211] caller os_map_kernel_space.part.0+0x82/0xb0 [nvidia] mapping multiple BARs
[ 9.316549] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 9.375273] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[ 11.804878] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input21
[ 11.804915] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input22
[ 11.804941] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input23
[ 11.804982] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input24
[ 11.805009] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input25
[ 11.805056] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input26
[ 11.805084] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:20/0000:20:03.1/0000:21:00.1/sound/card2/input27
[ 18.896128] audit: type=1400 audit(1652861905.172:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1710 comm="apparmor_parser"
[ 18.896131] audit: type=1400 audit(1652861905.172:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1710 comm="apparmor_parser"
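
The line that matters in all of this is the failed probe of 0000:21:00.0, which is easy to miss among the ACPI noise. A small sketch of a filter that pulls out only the devices the nvidia module failed to probe is below. The function name `failed_probes` is made up, and the pattern is based only on the message format seen in the log above.

```shell
# Hypothetical helper: print "<pci address> <error code>" for each line of
# the form "nvidia: probe of 0000:21:00.0 failed with error -1".
# Reads dmesg text on stdin; probe failures of other drivers are ignored.
failed_probes() {
    sed -n 's/.*nvidia: probe of \([0-9a-fA-F:.]*\) failed with error \(-\{0,1\}[0-9]*\).*/\1 \2/p'
}
```

Usage would be `sudo dmesg | failed_probes`, which for the log above should print only the RTX 3060's address and the error code.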

Any ideas?
Thanks

Then the issue seems more likely to be the mainboard. Please check for a BIOS update.

I updated the BIOS, turned off CSM and turned on 4G decoding. Now nvidia-smi is happier, as you can see in the picture, but the GUI is still very sticky.

As you can see, the fan speed is missing. I cannot get the fans started using nvidia-settings.

The thing is that they spin during POST, so I know there's nothing physically wrong with them.

I don’t know, I still think the card is broken. It should still be under warranty, can’t you just have it replaced?

Finally had the card replaced. Same problem m(_ _)m
What else can I try?

Please create a new bug-report.log

Here you go. I have also updated the BIOS to the latest beta, which adds BAR resizing, and I am running it with CSM disabled and 4G decoding plus BAR resizing enabled.

nvidia-bug-report.log.gz (405.8 KB)

Thank you, I really appreciate you taking the time to help me.

Another user just reported running into the exact same issue, an XID 62 followed by XID 16 and Xorg slowdown:
https://forums.developer.nvidia.com/t/510-515-drivers-throw-xid-62-16-with-certain-nvreg-registrydwords-set/217009

While you don’t have any registry dword set and also mentioned that the 470 driver didn’t work either, maybe something can be found that both systems have in common.
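
To compare the two systems, a rough sketch for pulling only the Xid events out of the kernel log could look like this. It assumes the usual driver message shape `NVRM: Xid (PCI:<address>): <code>, ...`; the function name `xid_codes` is made up for illustration.

```shell
# Hypothetical helper: print "<PCI id> <Xid code>" for every Xid line.
# Assumes messages shaped like "NVRM: Xid (PCI:0000:21:00): 62, pid=...".
# Reads dmesg text on stdin.
xid_codes() {
    sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}
```

Usage would be `sudo dmesg | xid_codes`. An Xid 62 followed by an Xid 16 on the same device would match the pattern described in the linked thread.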

In general, did you already try a different PCIe slot? Your board seems to have four.
Is it possible to limit the PCIe speed in the BIOS settings, e.g. to gen 3?
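
To confirm that a BIOS speed limit actually took effect, the negotiated link speed can be read from sysfs. This is only a sketch: the function name `link_speed` and the optional second argument (an alternative sysfs root, handy for testing) are made up, and 0000:21:00.0 is the address the RTX 3060 had in the lspci output earlier in the thread, which will change with the slot.

```shell
# Hypothetical helper: print the current and maximum PCIe link speed of a
# device from the kernel's sysfs attributes.
#   $1: PCI address, e.g. 0000:21:00.0
#   $2: sysfs root, defaults to /sys/bus/pci/devices
link_speed() {
    dev="${2:-/sys/bus/pci/devices}/$1"
    printf 'current: %s\n' "$(cat "$dev/current_link_speed")"
    printf 'max: %s\n' "$(cat "$dev/max_link_speed")"
}
```

Usage would be `link_speed 0000:21:00.0`. A gen 3 link reports 8 GT/s and a gen 4 link 16 GT/s. The card may drop to a lower speed while idle, so it is probably best to read the value while something is rendering.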

Hi

I did try a different PCIe slot and it didn't help. I will try changing to PCIe gen 3 the next time I put the card back in (tonight or tomorrow, I use the computer for work).

Thanks

I also have a similar issue. When I run nvidia-smi, it gives the following errors:

ACPI BIOS ERROR (bug)
ACPI ERROR: AE_ALREADY_EXISTS
ACPI ERROR: Aborting method ...

Here is info about my system:

  • GPU: 2x MSI RTX 3090 Ti Suprim X (NVLink)
  • MB: ASUS Z690 Maximus Apex (w/ latest bios, 1505)
  • CPU: Intel Core i9 12900KS
  • OS: Ubuntu 20.04.4 LTS (kernel 5.13.0-48)
  • GPU Driver: 510.68.02

I have the same problem. I am hoping for a better solution.
OS: Red Hat 9, Linux kernel 5.14
GPU: 2x RTX 3090 Ti
CPU: i9-12900K
MB: ASUS ROG Strix Z690-E Gaming WiFi (latest BIOS, 2022/07)
GPU driver: 515 (latest) or 470