GeForce RTX 2070 Max-Q heats up when idle on MSI GS65 8SF with Ubuntu 19.04 and nvidia driver 430....

Hi all,

When the GPU is idle (it should be inactive), e.g. when I select the intel profile with prime-select or when I lock my laptop’s screen, it heats up to around 55°C.
Normally, when I use the nvidia profile, it stays around 42°C (45°C if I use an external monitor).

One thing I have noticed is that nvidia-smi cannot see the GPU fan, although the fan works properly.

I’m using driver 430.40 on Ubuntu 19.04.

I can’t figure out how to attach the output of nvidia-bug-report.sh to this topic, so I’ve put it in this shared Google Drive directory

I had this problem with the previous driver (418.54) as well, and I’ve already asked for help on Ask Ubuntu, on Reddit and on the community forum (in the last one I named the wrong GPU; it is a 2070).

Do you know what I can do?

In your logs, you were using the nvidia profile. There’s a constant 1% GPU load, so the GPU never really reaches an idle state. Usually the nvidia driver only clocks down after 35-40 seconds; you could try this: https://devtalk.nvidia.com/default/topic/1048768/linux/if-you-have-gpu-clock-boost-problems-please-try-__gl_experimentalperfstrategy-1/
(I didn’t have any success with that.)
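To watch the constant load and the delayed clock-down directly, one option is to sample nvidia-smi at a fixed interval while the machine sits idle. This is a minimal bash sketch, not something suggested in the thread: it assumes the nvidia profile is active (nvidia-smi needs the driver loaded), and the function name sample_gpu and the NVIDIA_SMI override variable are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: sample GPU temperature, performance state and utilization at a fixed
# interval, to see whether the GPU ever drops to a low-power P-state when idle.
# NVIDIA_SMI is a hypothetical override (e.g. for testing); default: nvidia-smi.

# sample_gpu [COUNT] [INTERVAL_SECONDS]
# Prints one line per sample: "<unix time>, <temp C>, <pstate>, <util %>".
sample_gpu() {
  local count=${1:-12} interval=${2:-5} i
  for ((i = 0; i < count; i++)); do
    printf '%s, %s\n' "$(date +%s)" \
      "$("${NVIDIA_SMI:-nvidia-smi}" \
        --query-gpu=temperature.gpu,pstate,utilization.gpu \
        --format=csv,noheader,nounits)"
    # Sleep between samples, but not after the last one.
    if ((i + 1 < count)); then
      sleep "$interval"
    fi
  done
}
```

For example, sample_gpu 12 5 takes a sample every 5 seconds for one minute; if the GPU clocks down after the 35-40 second delay, the P-state column should move to a low-power state such as P8 and the temperature should start falling.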
Overall, the provided logs aren’t very meaningful. Can you provide some that show your issue more clearly, e.g. after the switch to intel you mentioned?
Attaching files:
Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Thanks for your answer,

If I switch to the intel profile, I can’t use nvidia-smi because it cannot communicate with the nvidia driver.

How can I obtain the logs that I need while I’m in the intel profile?

P.S. I tried the solution you posted, but it made no difference.

While in intel mode the nvidia GPU should be off, so nvidia-smi doesn’t work, but in your OP you mentioned the nvidia GPU reaching 55°C in that state. How did you measure it?

I can tell the nvidia GPU is on because the hardware LED is red (on this MSI laptop the power LED is red when the nvidia GPU is in use, otherwise it is blue).

When I hear the GPU fan spin up, I wait a few minutes, switch to the nvidia profile, log out and back in, and run nvidia-smi.

The temperature I measure this way is not exactly the maximum the GPU reached, but it gives a realistic idea of it.

This is the method I used to get the logs I posted.

None of that is visible in the logs. Both show that you booted in nvidia mode without any switch between GPUs. Maybe you attached the wrong logs?
before-idle was created at 20:14h, after-idle at 20:33h, 19 minutes later.

Maybe you’re right, so I did it again with the following procedure:

  1. Log in and use Firefox for a while
  2. Run nvidia-bug-report.sh
  3. Switch to the intel profile
  4. Log out and log back in
  5. Use Firefox normally for a while
  6. Run nvidia-bug-report.sh
  7. Switch to the nvidia profile
  8. Log out and log back in
  9. Run nvidia-bug-report.sh

I’m attaching the three logs I obtained.
nvidia-bug-report-while-intel-profile.log.gz (130 KB)
nvidia-bug-report-with-nvidia-profile-second.log.gz (1.12 MB)
nvidia-bug-report-with-nvidia-profile-first.log.gz (1.14 MB)

OK, the intel log shows the problem. When switching to intel mode, the nvidia driver gets unloaded but the GPU is not turned off, and in that state it consumes more power than when the driver is loaded and the GPU is idle. This might be a bug in Ubuntu’s nvidia-prime package or in gpu-manager. Please switch to intel mode, reboot, and create a new nvidia-bug-report.log while still running on intel, so I can see whether this is a general problem or only happens on logout/login.

Hi, I’ve done this:

  1. Switch to intel
  2. Reboot
  3. Run nvidia-bug-report.sh
  4. Use the PC normally for web browsing
  5. Run nvidia-bug-report.sh again

Here are the first and second logs:
nvidia-bug-report-intel-after-reboot-first.log.gz (126 KB)
nvidia-bug-report-intel-after-reboot-second.log.gz (128 KB)

I just checked, and 19.04 again uses runtime suspend instead of bbswitch. This was problematic before:
https://bugs.launchpad.net/ubuntu/+source/nvidia-prime/+bug/1778011
While in intel mode, what’s the output of
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
cat /sys/bus/pci/devices/0000:01:00.0/power/control

Why do you have the kernel parameters acpi_osi=! acpi_osi="Windows 2009"? Those should only be set to work around specific problems.
To debug this further, please install and run powertop. When you unplug the power adapter, it reports the power draw from the battery.
Please check:

  1. Remove the acpi_osi parameters and boot in intel mode. Note down the power draw that powertop reports.

  2. Install bbswitch, keep intel mode enabled, then run
     sudo tee /proc/acpi/bbswitch <<<OFF
     Check that it’s really off:
     cat /proc/acpi/bbswitch
     Note down the powertop power draw again.
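The bbswitch part of the steps above can be wrapped in a small bash sketch. It assumes the bbswitch kernel module is installed (on Ubuntu, from the bbswitch-dkms package) and that /proc/acpi/bbswitch prints a line such as 0000:01:00.0 OFF; the function names and the BBSWITCH override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the bbswitch check. Assumes the bbswitch module is installed
# (Ubuntu package: bbswitch-dkms) and that its proc file prints a line such
# as "0000:01:00.0 OFF". BBSWITCH is a hypothetical override variable.
BBSWITCH=${BBSWITCH:-/proc/acpi/bbswitch}

# bbswitch_state: print the last field of the status line (ON or OFF).
bbswitch_state() {
  awk '{ print $NF }' "$BBSWITCH"
}

# bbswitch_off: load the module if its proc file is missing, request OFF,
# then report what bbswitch says the dGPU state is.
bbswitch_off() {
  [ -e "$BBSWITCH" ] || sudo modprobe bbswitch
  echo OFF | sudo tee "$BBSWITCH" > /dev/null
  echo "dGPU state: $(bbswitch_state)"
}
```

Run bbswitch_off once in intel mode, then compare the powertop discharge rate with the value noted in step 1.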

Hi,

The outputs of the cat commands are “active” and “auto”, respectively.

I have the kernel parameters acpi_osi=! acpi_osi="Windows 2009" because of a quirk of this laptop: without them, the airplane-mode hardware button does not work, which causes a lot of bugs.

I don’t know what you mean by “Note down powertop value for power draw”, so I took a screenshot of each powertop screen so you can find what you need (at least I hope so).

I’ve installed bbswitch with the command “apt install bbswitch-dkms”, but the file /proc/acpi/bbswitch does not exist, even after a reboot.

The only thing I notice in powertop is that some PCI devices labeled NVIDIA Corporation are at 100% usage.

The value I meant is visible in your first screenshot:
“The battery reports a discharge rate of 51.4W”
Did you really disconnect the power adapter? 51 W is horrible.
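The discharge rate powertop shows can usually also be read straight from sysfs while running on battery, which makes it easy to log before and after a change. A minimal bash sketch, assuming the battery is BAT0 and exposes either power_now (microwatts) or current_now and voltage_now (microamps and microvolts); battery_watts is a hypothetical helper name, and some laptops call the battery BAT1 instead.

```shell
#!/usr/bin/env bash
# Sketch: read the battery discharge rate in watts from sysfs.
# Assumes the battery directory (default BAT0) exposes power_now in microwatts,
# or current_now (microamps) and voltage_now (microvolts).

# battery_watts [BATTERY_DIR]
battery_watts() {
  local dir=${1:-/sys/class/power_supply/BAT0}
  if [ -r "$dir/power_now" ]; then
    # microwatts -> watts
    awk '{ printf "%.1f\n", $1 / 1e6 }' "$dir/power_now"
  elif [ -r "$dir/current_now" ] && [ -r "$dir/voltage_now" ]; then
    # microamps * microvolts = 1e-12 W
    awk -v c="$(cat "$dir/current_now")" -v v="$(cat "$dir/voltage_now")" \
      'BEGIN { printf "%.1f\n", c * v / 1e12 }'
  else
    echo "no power_now or current_now/voltage_now in $dir" >&2
    return 1
  fi
}
```

With the adapter unplugged, a power_now value of 51400000 would print 51.4, matching the powertop figure quoted above.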
To get bbswitch working, you first have to load the module:
sudo modprobe bbswitch
As for the PCI runtime PM values:
“auto” is correct
“active” is wrong, it should be “suspended”
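The expected runtime PM values just listed can be checked in one go. A bash sketch, assuming the dGPU sits at 0000:01:00.0 as in the cat commands earlier in this thread; runtime_pm_check is a hypothetical name, and the optional second argument only exists so the check can be pointed at a test directory instead of /sys.

```shell
#!/usr/bin/env bash
# Sketch: check runtime PM for the dGPU. control should be "auto" and
# runtime_status should be "suspended" once the GPU has actually powered down.
# The default address 0000:01:00.0 is an assumption taken from this thread.

# runtime_pm_check [PCI_ADDR] [SYSFS_ROOT]
# Returns 0 if suspended, 1 if runtime PM is off or the GPU is awake,
# 2 if the sysfs files cannot be read.
runtime_pm_check() {
  local addr=${1:-0000:01:00.0}
  local root=${2:-/sys/bus/pci/devices}
  local dir="$root/$addr/power"
  local control status
  control=$(cat "$dir/control") || return 2
  status=$(cat "$dir/runtime_status") || return 2
  echo "control=$control runtime_status=$status"
  if [ "$control" != "auto" ]; then
    echo "runtime PM is disabled for $addr (control should be 'auto')"
    return 1
  fi
  if [ "$status" != "suspended" ]; then
    echo "$addr is not runtime-suspended (status should be 'suspended')"
    return 1
  fi
  echo "$addr is runtime-suspended"
}
```

With the values reported in this thread (auto and active), the function returns 1 and prints the "not runtime-suspended" message.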

Hi,

Yes, I’m pretty sure the power adapter was disconnected. I double-checked the screenshot: the battery icon in the top-right corner looks different when the laptop is plugged in.

I’m now in intel mode with bbswitch and did what you said. The NVIDIA GPU is off (I can see it from the LED) and powertop reports a discharge rate of around 10 W.

The value of the runtime_status file is still “active”.

So now you know the value that should be reached when everything works properly.
The runtime_status value is not relevant when bbswitch is used; bbswitch is a different method of turning off the nvidia GPU.
Please check the wattage and runtime_status after removing the acpi_osi parameters, booting into intel mode, and waiting a while after logging in.
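To confirm that the acpi_osi parameters are really gone after rebooting, the running kernel command line can be inspected. A bash sketch; acpi_osi_params is a hypothetical helper that prints every acpi_osi=... token (quoted or not) from /proc/cmdline and returns non-zero if there are none.

```shell
#!/usr/bin/env bash
# Sketch: list acpi_osi=... tokens on the running kernel's command line.
# Handles both acpi_osi=! and quoted values such as acpi_osi="Windows 2009".
# Returns 0 if at least one token was found, 1 otherwise (grep's status).

# acpi_osi_params [CMDLINE_FILE]
acpi_osi_params() {
  local file=${1:-/proc/cmdline}
  grep -oE 'acpi_osi=("[^"]*"|[^" ]*)' "$file"
}
```

On Ubuntu, such parameters normally live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub; after editing that line, run sudo update-grub and reboot, then acpi_osi_params should print nothing.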

The result is the same as with the parameters: runtime_status is active and the discharge rate is around 47.0 W.

The fans are spinning and the LED indicates that the nvidia GPU is active.

I’ve waited 15 minutes with this configuration.

Now that’s bad.
Maybe the GPU’s subdevices are keeping it from suspending; please check this:
http://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/dynamicpowermanagement.html
Follow the “Known Issues And Workarounds” section.
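Before working through that section, it can help to see which PCI functions the GPU actually exposes and what their runtime PM state is. The premise, as I understand that README section, is that extra functions on the same GPU (for example an HDMI audio or USB controller at 0000:01:00.x) can keep the whole device from suspending. A bash sketch, assuming NVIDIA's PCI vendor ID 0x10de identifies those functions; nvidia_functions_pm is a hypothetical name, and the optional argument only redirects it to a test directory.

```shell
#!/usr/bin/env bash
# Sketch: for every PCI function whose vendor is NVIDIA (0x10de), print its
# address, class, runtime PM control setting and runtime_status.

# nvidia_functions_pm [SYSFS_ROOT]
nvidia_functions_pm() {
  local root=${1:-/sys/bus/pci/devices}
  local dev vendor
  for dev in "$root"/*; do
    # Skip entries without a readable vendor file (also covers an empty glob).
    [ -r "$dev/vendor" ] || continue
    vendor=$(cat "$dev/vendor")
    [ "$vendor" = "0x10de" ] || continue
    printf '%s class=%s control=%s status=%s\n' \
      "${dev##*/}" "$(cat "$dev/class")" \
      "$(cat "$dev/power/control")" "$(cat "$dev/power/runtime_status")"
  done
}
```

Any NVIDIA function that shows control=on, or that stays active while the GPU is idle, is a likely candidate for the workarounds that section describes.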

Hi all,

I have found a workaround for the problem:
I installed Ubuntu 18.04 LTS and the bumblebee-nvidia package. Now, when I run prime-select intel, it uses bbswitch and disables the nvidia GPU.

I get a discharge rate of around 15 W.

You should report that to Ubuntu in the bug report mentioned above.