Hard Crashing and PRIME errors with GeForce GTX 1650 in freshly installed Ubuntu 20.04

Computer: Dell XPS 15 7590
OS: Ubuntu 20.04
Kernel: 5.4.0-42-generic
CPU: Intel Core i7-9750H
Graphics: NVIDIA TU117M (GeForce GTX 1650 Mobile / Max-Q)
Drivers: nvidia-driver-450

I’ve been experiencing hard crashes with my NVIDIA GPU on my Ubuntu system. After 10-30 minutes of running almost any game, my system freezes entirely (monitor, mouse, etc.), and I’m forced to reboot using the power button. Smaller 2D games don’t cause crashes, although I can’t say whether this is due to their lower resource requirements or their lack of 3D graphics. Apart from what my system logs show, I know the problem lies with the NVIDIA driver because I don’t encounter it when using the Nouveau driver or my Intel integrated GPU.

I recently triggered the crash and generated a bug report (attached below). Because the crash freezes the whole system, I had to run nvidia-bug-report.sh after rebooting (the timestamps still look correct). Let me know if I should generate this report a different way.
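Since the freeze forces a power-button reboot, anything the driver logged just before the crash ends up in the previous boot's kernel log. A minimal sketch for pulling those lines out, assuming the systemd journal is persistent on this install (otherwise `journalctl -b -1` has nothing to show); the `nvidia_msgs` helper is a hypothetical name, and reading the kernel journal may need `sudo`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only NVIDIA kernel driver lines (the "NVRM"
# prefix, including "Xid" error reports) from whatever text is piped in.
nvidia_msgs() {
  grep -E 'NVRM|Xid'
}

# Kernel messages (-k) from the previous boot (-b -1), i.e. the one that froze.
# Assumes a persistent journal; errors are silenced if there is none.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -k -b -1 --no-pager 2>/dev/null | nvidia_msgs
fi
```

Running it right after rebooting, alongside nvidia-bug-report.sh, shows whether the driver reported any errors before the freeze.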

Additionally, when I run nvidia-settings to use NVIDIA PRIME, I get this output before the window opens:

(nvidia-settings:9048): GLib-GObject-CRITICAL **: 14:41:55.734: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** Message: 14:41:55.882: PRIME: Requires offloading
** Message: 14:41:55.882: PRIME: is it supported? yes
** Message: 14:41:55.904: PRIME: Usage: /usr/bin/prime-select nvidia|intel|on-demand|query
** Message: 14:41:55.904: PRIME: on-demand mode: "1"
** Message: 14:41:55.904: PRIME: is "on-demand" mode supported? Yes
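The last messages show nvidia-settings detecting on-demand (offload) support. The active profile can be checked, and changed, with the `prime-select` tool named in the usage line above. A sketch only; `valid_prime_mode` is a hypothetical helper, and switching profiles needs root and usually a logout or reboot before it takes effect:

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only the profiles listed in the
# "Usage: /usr/bin/prime-select nvidia|intel|on-demand|query" message.
valid_prime_mode() {
  case $1 in
    nvidia|intel|on-demand) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the currently selected PRIME profile.
if command -v prime-select >/dev/null 2>&1; then
  prime-select query
fi

# To switch profiles (run by hand, then log out or reboot):
#   mode=nvidia
#   valid_prime_mode "$mode" && sudo prime-select "$mode"
```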

Using the older 440 and 435 drivers doesn’t fix this issue, and even a fresh install of Ubuntu doesn’t change anything. I don’t think it’s a hardware fault: my computer is only a year old, and it wasn’t having any problems earlier this summer (before I switched from Windows).

I appreciate your help. Please let me know if I can provide any more information.

nvidia-bug-report.log.gz (353.1 KB)


Same issue here with the exact same hardware.

I tried the 440.100 and 450.66 driver versions with exactly the same result: the system crashes at random, anywhere from a couple of minutes to an hour into actively playing a game.

OS: Ubuntu 20.04
Kernel: 5.7.8-05078

nvidia-bug-report.log.gz (456.5 KB)

After trying multiple things, the only thing that seemed to help or even fix the issue was using nvidia-smi to lock the maximum GPU clock speed. This trick seems to work for any Turing card. The underlying problem seems to be that nvidia-smi -i 0 -q always reports wrong Max Clocks for the card, but the NVIDIA team would need to investigate further to find the exact cause.

For the GTX 1650 Mobile / Max-Q, the rated core clock is 1020 MHz (base) to 1245 MHz (boost).
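To see the "Max Clocks" values referred to above, nvidia-smi can print just the clock section, or the maximum graphics clock as a bare number to compare with the rated boost clock. A sketch; `RATED_BOOST_MHZ` and `differs_from_rated` are hypothetical names, and the 1245 MHz default is the Max-Q figure quoted in this post:

```shell
#!/usr/bin/env bash
# Rated boost clock in MHz (1245 for the GTX 1650 Mobile / Max-Q as quoted
# above); override it for other cards.
RATED_BOOST_MHZ=${RATED_BOOST_MHZ:-1245}

# Hypothetical helper: succeed when the reported max clock (MHz) differs
# from the rated boost clock (MHz).
differs_from_rated() {
  (( $1 != $2 ))
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Full clock section for GPU 0, including "Max Clocks"
  nvidia-smi -i 0 -q -d CLOCK
  # Maximum graphics clock only, without header or unit
  max_gr=$(nvidia-smi -i 0 --query-gpu=clocks.max.graphics --format=csv,noheader,nounits)
  # Skip the comparison if the field is not a plain number (e.g. "[N/A]")
  if [[ $max_gr =~ ^[0-9]+$ ]] && differs_from_rated "$max_gr" "$RATED_BOOST_MHZ"; then
    echo "reported max graphics clock: ${max_gr} MHz, rated boost: ${RATED_BOOST_MHZ} MHz"
  fi
fi
```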

The commands to use are the following:

# enable persistence mode (needs root)
sudo nvidia-smi -pm 1
# lock the GPU clock range <min>,<max> in MHz
sudo nvidia-smi -lgc 300,1245
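Both commands need root, and -lgc takes the minimum and maximum graphics clock in MHz. A small wrapper that sanity-checks the range before applying it might look like this; the script, `MIN_MHZ`/`MAX_MHZ`, `valid_range` and `lock_clocks` are hypothetical, with defaults taken from the 300,1245 range used here:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two nvidia-smi commands.
# Defaults are the 300,1245 MHz range from this post; override per card.
MIN_MHZ=${MIN_MHZ:-300}
MAX_MHZ=${MAX_MHZ:-1245}

# Succeed only if both bounds are positive integers and min <= max.
valid_range() {
  [[ $1 =~ ^[0-9]+$ && $2 =~ ^[0-9]+$ ]] && (( 10#$1 > 0 && 10#$1 <= 10#$2 ))
}

lock_clocks() {
  if ! valid_range "$MIN_MHZ" "$MAX_MHZ"; then
    echo "invalid clock range: $MIN_MHZ,$MAX_MHZ" >&2
    return 1
  fi
  if (( EUID != 0 )); then
    echo "needs root, e.g. sudo $0" >&2
    return 1
  fi
  # Enable persistence mode, then lock graphics clocks to <min>,<max>
  nvidia-smi -pm 1 && nvidia-smi -lgc "$MIN_MHZ,$MAX_MHZ"
}

# Act only when executed directly and nvidia-smi is installed.
if [[ ${BASH_SOURCE[0]} == "$0" ]] && command -v nvidia-smi >/dev/null 2>&1; then
  lock_clocks
fi
```

For another card, something like `sudo MIN_MHZ=300 MAX_MHZ=1500 ./lock-clocks.sh` (hypothetical file name and values) would apply a different range.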

Used driver version: 440.100

I will update this post over the next few days if I still encounter any crashes while using the card intensively. After running the commands above, I have already played demanding games for more than 2 hours 30 minutes straight.

I reinstalled Ubuntu and am now using the 440.100 drivers. I ran the commands described by Jilthe, and they seemed to similarly “help or even fix” this issue for me. I have not experienced a crash while running a game with these commands in effect, and my GPU temperature is generally lower and more stable.

However, this improvement is lost whenever I restart my system. When rebooted, my computer crashes within 10-30 min as usual when running a game. It isn’t until I run Jilthe’s commands again that the crashing is prevented. Perhaps the “Max Clocks” are overwritten at some point to their original, incorrect values?

For me, one side effect of these commands seems to be "stuttering" of my mouse input: there are frequent brief moments where the movement and clicks of my Logitech mouse are not registered. This only happens while the commands are in effect (i.e. NOT right after a reboot).
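To check whether the stutter really comes from the clock lock, the lock can be undone without rebooting: nvidia-smi -rgc resets locked GPU clocks and -pm 0 turns persistence mode back off. A sketch; `run`, `unlock_clocks` and the `DRY_RUN` switch are hypothetical, and the real commands need root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "$*"
  else
    "$@"
  fi
}

unlock_clocks() {
  # Reset the locked GPU clock range to the driver defaults
  run nvidia-smi -rgc
  # Disable persistence mode again
  run nvidia-smi -pm 0
}

# Only touch the GPU when run directly, as root, with nvidia-smi installed.
if [[ ${BASH_SOURCE[0]} == "$0" && $EUID -eq 0 ]] && command -v nvidia-smi >/dev/null 2>&1; then
  unlock_clocks
fi
```

`DRY_RUN=1` just lists the two commands, which is handy before trying them on a running session.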

Jilthe, thank you for these commands! I’m curious if you experience the loss of improvement upon restart or any mouse input issues.

Side Note: None of the nvidia-settings errors have changed or disappeared. I don’t know if these issues are related, but the GLib-GObject-CRITICAL error in particular seems problematic.

Happy to hear that it fixed it for you. I also haven’t encountered any crashes since I applied those commands.
If you want to apply this fix automatically on every boot, you can do the following:

  1. Create a file named nvidia-lock-max-clock.service in /etc/systemd/system, for example via:

sudo gedit /etc/systemd/system/nvidia-lock-max-clock.service

  2. Copy and paste the following content into it (change the arguments for -lgc to suit your card if needed):
[Unit]
Description=Fixes crashes on intensive use of the nvidia card
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/usr/bin/nvidia-smi -lgc 300,1245
[Install]
WantedBy=multi-user.target

  3. Check that everything works by running:

sudo systemctl start nvidia-lock-max-clock

  4. If no errors show up, enable the service so it runs on every boot:

sudo systemctl enable nvidia-lock-max-clock
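The steps above can also be scripted. A minimal sketch, assuming bash and the same unit name and clock range as this post; `UNIT_DIR`, `CLOCK_RANGE`, `write_unit` and `install_unit` are hypothetical names. The unit uses `Type=oneshot`, which systemd requires when a service has more than one `ExecStart=` line, and an `[Install]` section so that `systemctl enable` has a target to attach the service to:

```shell
#!/usr/bin/env bash
# Hypothetical knobs: where the unit goes and which <min>,<max> MHz range it locks.
UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}
CLOCK_RANGE=${CLOCK_RANGE:-300,1245}
UNIT_NAME=nvidia-lock-max-clock

# Write the unit file into the given directory (step 1 and 2).
write_unit() {
  local dir=$1
  cat > "$dir/$UNIT_NAME.service" <<EOF
[Unit]
Description=Fixes crashes on intensive use of the nvidia card

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/usr/bin/nvidia-smi -lgc $CLOCK_RANGE

[Install]
WantedBy=multi-user.target
EOF
}

install_unit() {
  write_unit "$UNIT_DIR" || return 1
  # Reload unit files, run the service once now (step 3), enable it at boot (step 4)
  systemctl daemon-reload &&
    systemctl start "$UNIT_NAME" &&
    systemctl enable "$UNIT_NAME"
}

# Only install when run directly as root on a system with nvidia-smi and systemd.
if [[ ${BASH_SOURCE[0]} == "$0" && $EUID -eq 0 ]] &&
   command -v nvidia-smi >/dev/null 2>&1 && command -v systemctl >/dev/null 2>&1; then
  install_unit
fi
```

After the next reboot, `systemctl status nvidia-lock-max-clock` should show whether the service ran without errors.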