PRIME problems: after 1min of idle time - GPU is lost. nvidia-435 driver, GTX 1050ti

Hi,
I’ve been stuck on this problem for a while now and have no idea what specifically causes it.
Config is:
HP Spectre x360 with GTX 1050 Ti,
Ubuntu 19.04
kernel: 5.3.0-46-generic
nvidia-driver-435 (proprietary), installed through ubuntu-drivers as the default

The problem is:
If PRIME is set to nvidia, after a few minutes (1-2) of no activity the screen freezes and I am unable to switch to another session with <Ctrl-Alt-Fx>. This also sometimes happens at random while there is some activity (I notice that it usually coincides with the fans stopping). When this happens, the only thing I can do is a hard reboot, so I am unable to run bug-report.
If PRIME is set to on-demand, under the same conditions nvidia-smi reports that the GPU is lost
(Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU). I ran bug-report before that (while the GPU was present and the driver was loaded) and after it (once the GPU was lost) and attached both here as log files (pre and post).
If PRIME is set to intel, no such incidents occur.

The same problem occurred on Ubuntu 18.04, with previous versions of the Linux kernel, and with the 440 (open-source) driver.

pre-nvidia-bug-report.log (2.0 MB)
nvidia-bug-report.log (2.4 MB)

I am mainly using the NVIDIA GPU for CUDA and TensorFlow, and I managed to get that working during the brief moments when the GPU is present and the driver is loaded. However, I am unable to make it work consistently over long periods.

Any advice/help is appreciated.

Please try setting the kernel parameter
intel_idle.max_cstate=1
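
On Ubuntu the parameter is typically added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot. After rebooting it is worth confirming that the parameter actually reached the kernel. Below is a minimal Python sketch for that check, assuming a Linux system with /proc/cmdline; the function names (parse_cmdline, max_cstate_from_cmdline, report) are made up for illustration, and the simple whitespace split does not handle quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (e.g. 'quiet') map to None. Quoted values that
    contain spaces are not handled by this simple split.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def max_cstate_from_cmdline(text):
    """Return intel_idle.max_cstate as an int, or None if it is not set."""
    value = parse_cmdline(text).get("intel_idle.max_cstate")
    return int(value) if value else None


def report():
    """Print what the running kernel was booted with, if readable."""
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print("on kernel cmdline:", max_cstate_from_cmdline(cmdline.read_text()))
    # Limit the intel_idle driver reports, if it exposes the parameter.
    sysfs = Path("/sys/module/intel_idle/parameters/max_cstate")
    if sysfs.exists():
        print("intel_idle reports:", sysfs.read_text().strip())


if __name__ == "__main__":
    report()
```

If the first line prints None, the GRUB change did not take effect (for example, update-grub was not run before rebooting).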

It appears that’s the solution. Thanks a lot.

Do you mind explaining what caused this problem and how this solves it?
I read about CPU C-states, and I’m now wondering whether this will drastically increase power consumption or prevent proper suspend.
Also, I noticed in the log file today that the nvidia-drm module is unable to load. Does this have any effect whatsoever?

Thanks again

It’s a BIOS bug on some HP/Dell notebooks where the CPU power states are incorrectly configured, so that on idle the PCI bus the NVIDIA GPU is connected to gets powered down. The parameter circumvents that and should only lead to slightly higher power consumption on idle.
You can try to optimize it by increasing the value from 1 to 2, 3, 4… and checking at which point the GPU falls off again, to find a sweet spot.
The “nvidia-drm module is unable to load” message was from an earlier attempt to install the driver using the .run installer (don’t do that) while the driver was already loaded from the repo. Just ignore it.
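
For the sweet-spot search, something has to notice when the GPU falls off at each max_cstate value. Here is a rough Python sketch of a watcher, assuming nvidia-smi is on the PATH and prints the same "GPU is lost" text quoted in the first post. The names gpu_is_lost, run_nvidia_smi and watch are hypothetical helpers, not part of any NVIDIA tool. Note that polling is itself activity and may mask the problem, so a long interval is probably sensible.

```python
import subprocess
import time

# Error text nvidia-smi printed in the report above.
LOST_MARKER = "GPU is lost"


def gpu_is_lost(smi_output):
    """True if the nvidia-smi output contains the 'GPU is lost' error."""
    return LOST_MARKER in smi_output


def run_nvidia_smi():
    """Run nvidia-smi and return stdout and stderr combined as text."""
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        # A hang is suspicious but is not reported as "lost" here.
        return "nvidia-smi timed out"
    return result.stdout


def watch(interval_s=60, run=run_nvidia_smi, sleep=time.sleep, max_polls=None):
    """Poll until the GPU is reported lost.

    Returns the number of polls made when the loss was seen, or None if
    max_polls was reached first. `run` and `sleep` are injectable so the
    loop can be tried without a GPU.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if gpu_is_lost(run()):
            return polls
        sleep(interval_s)
    return None
```

Calling watch(interval_s=60) after booting with a given max_cstate value and leaving the machine idle gives a rough idea of whether that value is still safe: if it returns, the GPU was lost after roughly that many minutes.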