You should check whether PRIME sync is enabled: run xrandr --verbose and look at the PRIME Synchronization property for eDP-1-1. If it is 1, it’s fine; if it is 0, set the kernel parameter nvidia-drm.modeset=1.
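If you would rather not scan the whole xrandr --verbose dump by eye, here is a minimal Python sketch of the same check. It assumes the property shows up as an indented line of the form "PRIME Synchronization: 1" below the header line of each output; the helper name prime_sync_by_output is made up for illustration.

```python
import re


def prime_sync_by_output(xrandr_verbose_text):
    """Map each output name (e.g. eDP-1-1) to its PRIME Synchronization value."""
    result = {}
    current_output = None
    for line in xrandr_verbose_text.splitlines():
        # Output header lines start in column 0: "eDP-1-1 connected ..."
        header = re.match(r"^(\S+) (connected|disconnected)\b", line)
        if header:
            current_output = header.group(1)
            continue
        # Properties are indented below the header of the output they belong to.
        prop = re.match(r"^\s+PRIME Synchronization:\s*(\d+)", line)
        if prop and current_output is not None:
            result[current_output] = int(prop.group(1))
    return result
```

Pass it the text printed by xrandr --verbose. An output that maps to 0 is the case where the nvidia-drm.modeset=1 kernel parameter is needed, and an output missing from the result did not report the property at all.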
Ok, it’s 1, so it’s solved. I even installed the 460 driver now.
Thank you
@generix Thank you very much! Worked like a charm : )
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
You have bbswitch installed, which is totally outdated and should not be useful in your setup.
I think it’s possible that this is what causes the strange error message.
Please try removing it (sudo apt purge bbswitch-dkms).
Also delete your /etc/X11/xorg.conf (it was generated for 1 gpu).
Reboot.
See if the error message is gone.
If the second card comes up, you could re-run nvidia-xconfig --enable-all-gpus --cool-bits=12, which would configure your system with 1 Xscreen per gpu (if that is what you want).
Thank you
I removed bbswitch, and deleted the xorg.conf. I did notice that the error of bbswitch disappeared from the dmesg.
But still, only one GPU appears in nvidia-smi.
This is the new bug report after the changes:
nvidia-bug-report.log (653.8 KB)
Well, I don’t know. I see no error.
lspci shows both devices, but also shows that the second gpu is not in use by a driver.
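The "not in use by a driver" part can be read from lspci -k, which prints a "Kernel driver in use:" line for every device that has a driver bound. A rough Python sketch that lists the devices lacking that line; the function name devices_without_driver is only illustrative:

```python
def devices_without_driver(lspci_k_text):
    """Return the slot addresses of devices with no 'Kernel driver in use' line."""
    unbound = []
    slot, has_driver = None, False
    for line in lspci_k_text.splitlines():
        if line and not line[0].isspace():
            # A new device block starts with an unindented "bus:dev.fn ..." line.
            if slot is not None and not has_driver:
                unbound.append(slot)
            slot, has_driver = line.split()[0], False
        elif "Kernel driver in use:" in line:
            has_driver = True
    # Do not forget the last device block in the listing.
    if slot is not None and not has_driver:
        unbound.append(slot)
    return unbound
```

Fed with the lspci -k text, it returns the slot address of every device that has no driver attached.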
What looks strange to me is that /proc/interrupts shows the working nvidia card at #59, but at #17, where the second one is, it only shows:
17: 0 0 374 0 0 0 0 0 IO-APIC 17-fasteoi snd_hda_intel:card1
Maybe generix can spot something, he knows way more than me.
What I’d try is to put the non-working card into the slot of the working one and boot with just that one card, to see whether it is the card itself or an issue on the motherboard side.
The second gpu is not properly initialised by the bios: it has bus mastering disabled, so the nvidia driver refuses to use it.
Did you use the correct pcie slots on the mainboard? Please also check by swapping the gpus whether this depends on the gpu or on the slot.
Resetting the bios to defaults is also worth a shot.
Please, dear generix, educate me. How do you know that? What exactly in the bug report is telling you this?
Thank you!
Thank you, I will try to swap the GPU and check whether the second GPU is in the correct pcie slot.
The bug report includes the output of lspci -vxxx -d 10de:* which prints the pci config space at the end (the hex values). The bus-mastering bit is bit 2 (value 4) of byte 4 (counting from 0). The value of byte 4 of the non-working gpu is 0x03, meaning bus-mastering is off. On a correctly initialised gpu this should be 0x06 before the driver loads, afterwards 0x07.
Edit: bit 2, not 3.
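To make the bit arithmetic concrete, here is a small Python sketch that decodes that byte. Byte 4 is the low byte of the PCI command register; the names for bits 0 and 1 (I/O space and memory space enable) come from the PCI specification rather than from this thread, and both function names are made up for illustration.

```python
def decode_pci_command(low_byte):
    """Decode the three lowest bits of the PCI command register (config offset 0x04)."""
    return {
        "io_space": bool(low_byte & 0x01),      # bit 0, value 1
        "memory_space": bool(low_byte & 0x02),  # bit 1, value 2
        "bus_master": bool(low_byte & 0x04),    # bit 2, value 4
    }


def command_byte_from_hexdump(first_row):
    """Extract byte 4 from the first row of an lspci hex dump ('00: xx xx ...')."""
    offset, _, values = first_row.partition(":")
    if int(offset, 16) != 0:
        raise ValueError("expected the row that starts at offset 00")
    return int(values.split()[4], 16)


# The three values mentioned above: broken gpu, before driver load, after driver load.
for value in (0x03, 0x06, 0x07):
    print(hex(value), decode_pci_command(value))
```

With 0x03 only I/O and memory decoding are enabled and bus_master comes out False, while 0x06 and 0x07 both have bit 2 set.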
It seems I have a problem with my second GPU. I put the non-working card into the slot of the working one and booted with just that one card, as @Mart told me.
The gpu was not loaded. This is the log: a-nvidia-bug-report.log (1.8 MB)
Yes, still same issue so must be a very subtle defect. Return to vendor, if still under warranty.
Deal, I will check if they can replace it.
Thank you.
Hey @generix
Everything was working fine up until my last driver update/kernel update.
This weekend my laptop started lagging out of nowhere.
I think it might be due to this process in nvidia-smi that is using too much GPU for what it is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 965M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8    N/A /  N/A |    562MiB /  2004MiB |     80%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1934      G   /usr/lib/xorg/Xorg                254MiB |
|    0   N/A  N/A      2111      G   /usr/bin/gnome-shell              155MiB |
|    0   N/A  N/A      2532      G   …02F48F2FD8EAC747C3F161887          4MiB |
|    0   N/A  N/A      4956      G   …gAAAAAAAAA --shared-files        142MiB |
+-----------------------------------------------------------------------------+
When I go back to only my integrated gpu, everything gets back to normal
The problem is that the gpu is not clocking up, it stays at P8. In that state, 80% gpu usage means ~8% gpu usage when fully clocked up.
Please check in the PowerMizer tab of the nvidia-settings gui whether that’s the case.
This should be independent of the kernel update, but you also got upgraded to the latest 460.73 driver. Can you possibly downgrade to an earlier driver to check?
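The 80% versus ~8% estimate is just the utilisation scaled by the clock ratio. A rough Python sketch, assuming the work scales linearly with the core clock; the clock values below are purely illustrative (a P8 clock at one tenth of the maximum), not taken from this laptop, and the function name is made up:

```python
def full_clock_equivalent(util_percent, current_mhz, max_mhz):
    """Scale a utilisation reading taken at a low clock to the maximum clock."""
    if max_mhz <= 0:
        raise ValueError("max_mhz must be positive")
    return util_percent * current_mhz / max_mhz


# Illustrative clocks only: 135 MHz in P8 versus 1350 MHz fully clocked up.
print(full_clock_equivalent(80, 135, 1350))  # 8.0
```

The real current and maximum clocks can be read from the PowerMizer tab.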
What I find curious is:
- Let’s say it’s in “on-demand” mode. If I change it to NVIDIA (Performance Mode) and restart, it doesn’t work right away. I have to restart once again to make it work.
- After rebooting for the second time, it starts running just fine, and it takes about 30 seconds before it starts lagging/chopping.
I’m attaching the PowerMizer screenshot for when it is lagging.
I tried downgrading to 450, but nothing changed.
Is there a command to reset the nvidia-settings setup, in case I changed something?
Sounds like it hits some throttling cause like overheating after a while, please create an nvidia-bug-report.log in the throttled state.
There you go nvidia-bug-report.log.gz (360.8 KB)
No throttle reasons, so I guess this is something else. Looking at the logs, the gpu shut down some days ago. Please try resetting the mainboard; I don’t know how that works on a Surface.
