Optimus on Ubuntu 18.04 is a step backwards ... but I found the first good solution

Maybe something really simple. The prerequisites are having bbswitch installed and having previously used prime-select to switch to nvidia.

/etc/systemd/system/disablenvidia.service

[Unit]
Description=dGPU off during boot
After=nvidia-fallback.service
Before=display-manager.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c "modprobe -r nvidia; modprobe bbswitch; echo OFF > /proc/acpi/bbswitch; logger NVIDIAOFF"

[Install]
WantedBy=display-manager.service

Run sudo systemctl daemon-reload to update.

Use
sudo systemctl enable disablenvidia.service
sudo systemctl restart display-manager
to switch to intel

Use
sudo systemctl disable disablenvidia.service
sudo tee /proc/acpi/bbswitch <<<ON
sudo modprobe nvidia
sudo systemctl restart display-manager
to switch to nvidia. If it works, this can be put into a script.
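As a rough sketch of such a script, the two command sequences above could be wrapped like this. The function names `run` and `switch_gpu` and the `DRY_RUN` variable are made up for illustration, and the script assumes it is run as root, so it writes to /proc/acpi/bbswitch directly instead of going through sudo tee.

```shell
#!/bin/sh
# Sketch only: wraps the intel/nvidia command sequences from the post.
# With DRY_RUN=1 each command is printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_gpu() {
    case "$1" in
        intel)
            # Enable the oneshot service, then restart the display manager
            # so the service powers the dGPU off before the DM comes up.
            run systemctl enable disablenvidia.service &&
            run systemctl restart display-manager
            ;;
        nvidia)
            # Disable the service, power the card back on, load the driver.
            run systemctl disable disablenvidia.service &&
            run sh -c 'echo ON > /proc/acpi/bbswitch' &&
            run modprobe nvidia &&
            run systemctl restart display-manager
            ;;
        *)
            echo "usage: switch_gpu intel|nvidia" >&2
            return 2
            ;;
    esac
}
```

Running it with DRY_RUN=1 first shows what would be executed without touching the running session.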

There’s another flaw with the prime-socket solution: the makefile installs nvidia-prime-boot.service directly into the ‘wants’ directory of systemd, which doesn’t work.

This prime-socket solution swaps between the two … until you boot. The display manager doesn’t start if you put it in intel mode.
When you do this, the script leaves a blacklist-nvidia.conf in /etc/modprobe.d/
However, the nvidia modules actually load anyway. If you run update-initramfs -u then the modules disappear, but this is not a good solution.

The blacklist file is not working … because the modules are already loaded, I guess. So in fact, in intel mode, I guess they need to be removed with rmmod.

Yes, that’s why I put modprobe -r nvidia into the service file. It’s the short extract of the solution I use in Gentoo.

To remove the nvidia modules interactively, I see that I have to do it in this order

rmmod nvidia_drm
rmmod nvidia_modeset
rmmod nvidia

Should I actually do this in the service file?

… whoops, I see that modprobe -r is supposed to do this automatically, but it doesn’t seem to work.
I’ll try the long-winded rmmod approach …
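For the long-winded approach, a small helper can work out which of the nvidia modules are actually loaded and list them in the order given above. This is only a sketch: the function name `nvidia_unload_order` is hypothetical, and nvidia_uvm is included on the assumption that it may be loaded (it appears when CUDA is used) and has to go before the core nvidia module.

```shell
# Print the NVIDIA modules that are currently loaded, in an order that is
# safe to pass to rmmod (dependents first, the core nvidia module last).
# $1: modules list to read, defaults to /proc/modules.
nvidia_unload_order() {
    modules_file="${1:-/proc/modules}"
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        # The first field of each /proc/modules line is the module name.
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}

# Example use, as root:
#   for m in $(nvidia_unload_order); do rmmod "$m"; done
```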

For modprobe -r to work, a .conf file has to be installed as /etc/modprobe.d/nvidia-rmmod.conf, containing

remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia

This is normally installed along with the driver, see if ubuntu dropped that.
Don’t forget to rmmod the nvidia-uvm module which gets loaded when you use cuda.
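A quick way to see whether such a rule is present is to grep the modprobe configuration directory. The sketch below uses a hypothetical function name, `has_nvidia_remove_rule`, and only looks in the directory it is given; modprobe also reads other directories such as /lib/modprobe.d, so a miss in /etc/modprobe.d alone is not conclusive.

```shell
# Return 0 if any .conf file in the given modprobe.d directory defines a
# "remove nvidia ..." rule like the one quoted in the post.
# $1: directory to search, defaults to /etc/modprobe.d.
has_nvidia_remove_rule() {
    conf_dir="${1:-/etc/modprobe.d}"
    # -q: no output, -s: stay quiet if the glob matches no files.
    grep -qsE '^[[:space:]]*remove[[:space:]]+nvidia[[:space:]]' "$conf_dir"/*.conf
}

# Example use:
#   has_nvidia_remove_rule || echo "no remove rule for nvidia found"
```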

Thanks generix for your help.
I have a working version now, here:

If anyone wants to test it, please do so and let me know. I’ll make some more documentation enhancements and submit a pull request to Matthieu.

As I said, ‘lightdm’ should be ‘display-manager’; this is a standardized alias for all DMs, like gdm, lightdm…
That’s why nvidia-prime-boot had ‘wantedby display-manager’. The prime-socket then doesn’t need to unload the modules; it just needs to restart display-manager, and systemd automatically starts nvidia-prime-boot.service, which takes care of the rest, before it starts lightdm/gdm…

You’re right: ubuntu dropped it. I added nvidia-uvm.

Unfortunately, I don’t follow that, I don’t really understand the systemd stuff and the order of steps.
It seems that if I change to intel mode and then reboot, the nvidia drivers must be removed since they are loaded early in the boot process.

I changed the nvidia-prime-boot.service to be

WantedBy=multi-user.target

because I saw that in the other service file :)
It works, is it bad?

Ah, I re-read it. It’s not as complicated as I thought. You are suggesting that the nvidia-prime-boot should have a target that means it gets invoked whenever the display manager starts, either via a normal boot or when prime-socket is called, meaning prime-socket itself doesn’t need to rmmod the modules.
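To check which target a service file is actually installed under, the [Install] section can be read directly. This is a minimal sketch with a hypothetical function name, `unit_wanted_by`; it only parses the file on disk and says nothing about what systemd has currently enabled (systemctl list-dependencies --reverse on the unit shows that).

```shell
# Print the WantedBy= targets declared in a unit file's [Install] section,
# one per line. $1: path to the unit file.
unit_wanted_by() {
    awk '
        # Track which section we are in; only [Install] is of interest.
        /^\[/ { in_install = ($0 == "[Install]") }
        in_install && /^WantedBy=/ {
            sub(/^WantedBy=/, "")
            n = split($0, targets, " ")
            for (i = 1; i <= n; i++) print targets[i]
        }
    ' "$1"
}

# Example use (the path is an assumption about where the unit was installed):
#   unit_wanted_by /etc/systemd/system/nvidia-prime-boot.service
```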

Anyway, it definitely works now, but perhaps that’s because my rmmod changes were the fix, not because of the WantedBy change.

However, it works, and considering I have never encountered Rust or service files before, I will rest on my laurels, such as they are. The P50 is not my main machine any longer, but this is actually a better Optimus experience than I ever had.

Yes, you got it.

Anything is better than it is now in 18.04

That was the surprising part for me, though: they dropped the Optimus thing, and you put effort into it. Thank you. Somebody just had to do something.

OK, the above modification by Tim works fine for me:

I had a weird thing happen where after installing it (making sure old prime-socket was deleted first) and switching to intel, my X session went down and didn’t restart graphics, getting stuck at TTY. I rebooted from TTY though, and system booted into Intel graphics just fine. I see no issues with it for now, and given that it’s the first time I was able to see hybrid graphics work on Ubuntu, I’m a happy man. Thank you both for your help and efforts!

That said, I’m also a dev and I want to help. I’m travelling ATM and can’t risk my only OS until I’m back, but I want to drive a few nails into it too. Where can I read about the communications between the nvidia driver, prime-socket service, display manager, lightdm and X server? What logs do I need to study to find issues? Is there an advanced debugger to see what breaks where (a sort of GDB that would cover this whole stack or a part of it)? Also, if things go to hell, how would I pull out other than by reinstalling the OS? I don’t have much low-level experience with Ubuntu itself, but I used to torture Linux ARM boards a while ago, although the kernel was way different and I didn’t have graphics on them :)

Thanks again to both of you and everyone else who worked on it.

Sorry I can’t be of much help with the other questions, but this one I know :)

There’s a tool called Backups, and it uses incremental backups, so even if the initial OS backup is pretty big, subsequent ones would be small. Just remember to exclude any folders you don’t need backing up.

This code doesn’t actually do much. Basically, all it does is unload the nvidia drivers from memory if you are in Intel mode. The new features of the nvidia driver seem to make swapping between hybrid and Intel really easy (so it is bitterly ironic that the devs decided to snatch defeat from the jaws of victory by abandoning bbswitch).
If you disable the two systemd services and rename the prime-select script in /usr/local/bin so the ‘real’ one is unmasked, you’ll be back to standard.

The socket service runs in the background, but if you want to see more you can stop the systemd service and run it with sudo in a virtual terminal.
The rust code is easy to read, and so is the prime-select script. You’ll see what it’s doing in a few minutes.

When switching from intel to nvidia, the socket code calls modprobe nvidia to load the nvidia driver.
Sometimes modprobe gets stuck, causing endless logging messages. It does not seem to terminate with failure, though. It endlessly generates error messages (maybe about two per second).

I also notice that the nouveau module is loaded.
The nouveau module is definitely not loaded before doing prime-select nvidia and it is definitely not loaded by anything in the script.

How does nouveau get loaded? The prime-select script doesn’t touch /etc/modprobe.d/nvidia-graphics-drivers.conf, which blacklists nouveau. It seems impossible that this module could load, and yet it is always loaded when I have this problem.
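To catch the moment nouveau appears, the loaded-module list can be checked before and after running prime-select nvidia. A minimal sketch, with a hypothetical function name `nouveau_status`:

```shell
# Report whether the nouveau module is loaded.
# $1: modules list to read, defaults to /proc/modules.
nouveau_status() {
    modules_file="${1:-/proc/modules}"
    # Each /proc/modules line starts with the module name and a space.
    if grep -q '^nouveau ' "$modules_file"; then
        echo "nouveau loaded"
    else
        echo "nouveau not loaded"
    fi
}

# Example use:
#   nouveau_status; sudo prime-select nvidia; nouveau_status
```

If the status flips, the kernel log for the current boot (journalctl -b -k) is a reasonable place to look for what requested the module.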

This bug is non-deterministic as far as I can work out; at least, I can’t make it happen on demand. But when it does, it’s a nuisance. Rebooting the computer always boots successfully into nvidia mode, so the setup done prior to modprobe nvidia is fine.

It all points to something going wrong with modprobe nvidia.
Update: it mostly seems to happen when the laptop boots in intel mode, that is, with the nvidia modules being removed with rmmod at boot.
The good thing is that ssh still works perfectly well.

nvidia-bug-report.log.gz (97.9 KB)

This might also be gpu-manager still interfering. Make sure a static nvidia outputclass file is installed and use ‘nogpumanager’ as kernel parameter.
Instead of turning gpu-manager off, it might help to change nvidia-prime-boot.service to start either before or after it, using Before=gpu-manager.service or After=gpu-manager.service.

I just came back to post something I discovered, didn’t see your tip. There is a service called nvidia-fallback.service and judging by its log messages, it tries to load nouveau, and it seems to succeed. So I’ve changed prime-select to disable it when going into intel mode and to re-enable it going into nvidia. I haven’t seen the problem since on either of my Optimus laptops.
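The change amounts to something like the sketch below. The post does not show the actual prime-select modification, so the function name `toggle_nvidia_fallback` and the `DRY_RUN` switch are made up for illustration, and it is an assumption that a plain systemctl disable/enable is all that is needed.

```shell
# Disable nvidia-fallback.service when going to intel mode and re-enable it
# when going to nvidia mode. With DRY_RUN=1 the command is only printed.
# $1: target mode, "intel" or "nvidia".
toggle_nvidia_fallback() {
    case "$1" in
        intel)  action=disable ;;
        nvidia) action=enable ;;
        *)
            echo "usage: toggle_nvidia_fallback intel|nvidia" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "systemctl $action nvidia-fallback.service"
    else
        systemctl "$action" nvidia-fallback.service
    fi
}
```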

Awesome, thanks for all the info. I will test this a bit more once I’m back, and I plan to reinstall my 18.04 all afresh some time soon (been upgrading all the way from 17.04, so I have quite a few stinky leftovers here and there).

Here’s a twist to the story. I just did a sudo apt upgrade which, among other things, upgraded my gdm3 package. Upon rebooting I found myself using the intel card, but sitting on gdm instead of lightdm. It looks like apt upgrade silently ran a dpkg-reconfigure, even though it did mention that gdm could not be reloaded:

Setting up gdm3 (3.28.2-0ubuntu1.2) ...
gdm.service is not active, cannot reload.
invoke-rc.d: initscript gdm3, action "reload" failed.

I dpkg-reconfigure’d it back to lightdm, rebooted and am running the latter now as expected. Just a bit of heads up.
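After a package upgrade like this, it can be worth checking which display manager is configured as the default before rebooting. On Debian and Ubuntu that choice is stored in /etc/X11/default-display-manager; the sketch below just reads it, and the function name `default_display_manager` is hypothetical.

```shell
# Print the name of the configured default display manager.
# $1: file to read, defaults to /etc/X11/default-display-manager.
default_display_manager() {
    dm_file="${1:-/etc/X11/default-display-manager}"
    [ -r "$dm_file" ] || return 1
    # The file holds a full path such as /usr/sbin/lightdm; keep the basename.
    basename "$(head -n 1 "$dm_file")"
}

# Example use:
#   [ "$(default_display_manager)" = "lightdm" ] || echo "default DM changed"
```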

I consider the current master branch (or the latest release tag, 0.9.1) stable. I haven’t made any technical changes since June 13, and both my Optimus laptops have gone through many reboots, suspend/resume cycles and mode changes without problems. The June 13 fixes were important though.

If there are issues please file an issue at https://github.com/timrichardson/Prime-Ubuntu-18.04

It is impossible for me to test hardware which I don’t have, but I can help diagnose things; I’ve learnt a bit while working on this. Thanks to the work of the upstream author (Matthieu), I’ve realised that it is actually not very complicated (famous last words).

Alberto Milone, the Canonical developer who wrote the prime-select script, is working on a new approach which powers off the nvidia card by putting the system calls directly into his code.

https://bugs.launchpad.net/ubuntu/+source/ubuntu-drivers-common/+bug/1778011

Therefore he does not rely on the kernel, or bbswitch … it sounds like he has in effect incorporated bbswitch, but I haven’t looked at his code, which is at https://github.com/tseliot/ubuntu-drivers-common/tree/bionic-power-saving

He says: "I added code in gpu-manager (ubuntu-drivers-common) to unload the nvidia modules before the login manager starts (nouveau is already blacklisted), to find the GPU on the PCI bus, and to set the power control to “auto”, so that the device can be put to sleep.

No nouveau, tlp, bbswitch, or anything else is required. I had to patch systemd (which is being SRU’d), so that unloading modules works again."
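The PCI part of that description can be illustrated from the shell through sysfs. This is not Alberto’s code (his change is in gpu-manager); it is only a sketch of the same idea, assuming the nvidia modules are already unloaded and the script runs as root. The function name `nvidia_power_auto` is made up; 0x10de is NVIDIA’s PCI vendor ID.

```shell
# Set runtime power management to "auto" for every NVIDIA display device
# found under a sysfs PCI tree, and print the address of each one changed.
# $1: devices directory, defaults to /sys/bus/pci/devices.
nvidia_power_auto() {
    pci_root="${1:-/sys/bus/pci/devices}"
    for dev in "$pci_root"/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
        vendor=$(cat "$dev/vendor")
        class=$(cat "$dev/class")
        # Skip anything that is not from NVIDIA (vendor ID 0x10de).
        [ "$vendor" = "0x10de" ] || continue
        # Class 0x03xxxx is a display controller; skip e.g. the HDMI audio function.
        case "$class" in
            0x03*) ;;
            *) continue ;;
        esac
        echo auto > "$dev/power/control" && basename "$dev"
    done
}
```

Writing on instead of auto to the same file keeps the device powered, which is a simple way to undo the change.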

He has an early-stage PPA, which includes a patched systemd. It works for me, although at this stage it still requires a reboot to change modes (he is pretty sure this can be changed). It is integrated with the nvidia control panel, and there is no more initramfs building.

It also requires lightdm. I’m pretty sure he won’t regard this as finished until he works out the gdm3 problems.

I will keep the Matthieu Gras approach for the moment but it looks like official ubuntu will have a good solution soon.