GeForce MX330 overheating on an NVIDIA Optimus-enabled laptop even under mild use

Hi folks,

I have a Dell Inspiron 5401 with an i7 processor with integrated graphics and an MX330 dedicated GPU, with NVIDIA Optimus switching between the two. I’ve been using the official NVIDIA drivers (I’ve tried 440, 450 and 455) with PRIME, as is recommended, on both Manjaro Linux (based on Arch) and Ubuntu.

Trouble is, whenever I put any even mildly heavy load on the GPU, it overheats and shuts down. I’ve been trying the Unigine Superposition benchmark, which works perfectly on Windows on the laptop, and on Linux using the integrated graphics - but when using Linux with the dedicated graphics, either using PRIME render offloading or using the GPU for everything as is the Ubuntu default, it consistently overheats and dies somewhere around scene 14-16 out of the 17 scenes. This happens after the temperature skyrockets well into the 80-90 degree range - for context, on Windows, the GPU never goes over 70 when running Superposition all the way through.

To be clear, this is not just a Superposition issue, either: the system shuts itself down due to high temperature even when running something as simple as Magic The Gathering: Arena (a card game!) on low graphics.

My inxi output for the system information is below:

[curtispf@curtis-laptop ~]$ inxi -Fazy
System:
  Kernel: 5.7.19-2-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
  parameters: BOOT_IMAGE=/vmlinuz-5.7-x86_64 
  root=UUID=1a7a0fbf-7510-4b18-bb85-67e34e268569 rw mem_sleep_default=deep 
  quiet 
  cryptdevice=UUID=21ff733c-9741-4616-b5d9-d41496f34322:luks-21ff733c-9741-4616-b5d9-d41496f34322 
  root=/dev/mapper/luks-21ff733c-9741-4616-b5d9-d41496f34322 apparmor=1 
  security=apparmor 
  resume=/dev/mapper/luks-cf849d49-e236-4224-a49e-608c47e9387d 
  udev.log_priority=3 
  Desktop: KDE Plasma 5.20.4 tk: Qt 5.15.2 wm: kwin_x11 dm: SDDM 
  Distro: Manjaro Linux 
Machine:
  Type: Laptop System: Dell product: Inspiron 14 5401 v: N/A serial: <filter> 
  Chassis: type: 10 serial: <filter> 
  Mobo: Dell model: 03GNVW v: A00 serial: <filter> UEFI: Dell v: 1.4.4 
  date: 09/15/2020 
Battery:
  ID-1: BAT0 charge: 49.5 Wh condition: 49.5/53.0 Wh (93%) volts: 17.1/15.0 
  model: BYD DELL TXD0307 type: Unknown serial: <filter> status: Full 
CPU:
  Info: Quad Core model: Intel Core i7-1065G7 bits: 64 type: MT MCP 
  arch: Ice Lake family: 6 model-id: 7E (126) stepping: 5 microcode: A0 
  L2 cache: 8192 KiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx 
  bogomips: 23968 
  Speed: 2729 MHz min/max: 400/3900 MHz Core speeds (MHz): 1: 2729 2: 1588 
  3: 1691 4: 1494 5: 2695 6: 2714 7: 2161 8: 2495 
  Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled 
  Type: l1tf status: Not affected 
  Type: mds status: Not affected 
  Type: meltdown status: Not affected 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Enhanced IBRS, IBPB: conditional, RSB filling 
  Type: srbds status: Not affected 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: Intel Iris Plus Graphics G7 vendor: Dell driver: i915 v: kernel 
  bus ID: 00:02.0 chip ID: 8086:8a52 
  Device-2: NVIDIA GP108M [GeForce MX330] vendor: Dell driver: nvidia 
  v: 455.45.01 alternate: nouveau,nvidia_drm bus ID: 01:00.0 
  chip ID: 10de:1d16 
  Device-3: Realtek Integrated_Webcam_HD type: USB driver: uvcvideo 
  bus ID: 3-6:5 chip ID: 0bda:565a serial: <filter> 
  Display: x11 server: X.Org 1.20.10 compositor: kwin_x11 
  driver: modesetting,nvidia alternate: fbdev,intel,nouveau,nv,vesa 
  display ID: :0 screens: 1 
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.0x11.2") 
  s-diag: 582mm (22.9") 
  Monitor-1: eDP-1 res: 1920x1080 hz: 60 dpi: 158 size: 309x174mm (12.2x6.9") 
  diag: 355mm (14") 
  OpenGL: renderer: Mesa Intel Iris Plus Graphics (ICL GT2) v: 4.6 Mesa 20.2.3 
  direct render: Yes 
Audio:
  Device-1: Intel Smart Sound Audio vendor: Dell driver: snd_hda_intel 
  v: kernel alternate: snd_sof_pci bus ID: 00:1f.3 chip ID: 8086:34c8 
  Sound Server: ALSA v: k5.7.19-2-MANJARO 
Network:
  Device-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter 
  vendor: Dell driver: ath10k_pci v: kernel port: 3000 bus ID: 02:00.0 
  chip ID: 168c:003e 
  IF: wlp2s0 state: up mac: <filter> 
  Device-2: Qualcomm Atheros type: USB driver: btusb bus ID: 3-10:6 
  chip ID: 0cf3:e007 
Drives:
  Local Storage: total: 476.94 GiB used: 141.55 GiB (29.7%) 
  SMART Message: Unable to run smartctl. Root privileges required. 
  ID-1: /dev/nvme0n1 vendor: Toshiba model: KBG40ZNS512G NVMe KIOXIA 512GB 
  size: 476.94 GiB block size: physical: 512 B logical: 512 B speed: 31.6 Gb/s 
  lanes: 4 serial: <filter> rev: 10410104 scheme: GPT 
Partition:
  ID-1: / raw size: 467.84 GiB size: 459.50 GiB (98.22%) 
  used: 140.34 GiB (30.5%) fs: ext4 dev: /dev/dm-0 
  ID-2: /boot raw size: 300.0 MiB size: 299.4 MiB (99.80%) 
  used: 147.9 MiB (49.4%) fs: vfat dev: /dev/nvme0n1p1 
Swap:
  Kernel: swappiness: 60 (default) cache pressure: 100 (default) 
  ID-1: swap-1 type: partition size: 8.80 GiB used: 1.07 GiB (12.2%) 
  priority: -2 dev: /dev/dm-1 
Sensors:
  System Temperatures: cpu: 54.0 C mobo: N/A 
  Fan Speeds (RPM): cpu: 0 
Info:
  Processes: 294 Uptime: 15h 11m Memory: 7.55 GiB used: 4.79 GiB (63.4%) 
  Init: systemd v: 246 Compilers: gcc: 10.2.0 clang: 11.0.0 Packages: 
  pacman: 1703 lib: 450 flatpak: 0 Shell: Bash v: 5.0.18 running in: konsole 
  inxi: 3.1.08 

I logged nvidia-smi every two seconds during two of the thermal shutdown events - under driver versions 450 and 455. Those are available here and here.
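
For anyone wanting to capture a comparable log, here is a minimal Python sketch of polling nvidia-smi every two seconds. The field list and the helper names (build_command, parse_row, log_forever) are assumptions for illustration, not necessarily what was used for the logs linked above. It relies only on nvidia-smi's --query-gpu and --format=csv,noheader options, and syncs each row to disk so the last sample before a shutdown is more likely to survive.

```python
import os
import subprocess
import time

# Fields passed to nvidia-smi --query-gpu. This selection is an assumption,
# not necessarily the set logged in the original post.
FIELDS = ["timestamp", "temperature.gpu", "pstate", "clocks.gr", "power.draw"]


def build_command(fields=FIELDS):
    """Return the nvidia-smi invocation that prints one CSV row per GPU."""
    return ["nvidia-smi", "--query-gpu=" + ",".join(fields), "--format=csv,noheader"]


def parse_row(line, fields=FIELDS):
    """Split one CSV row from nvidia-smi into a {field: value} dict."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(fields, values))


def log_forever(path, interval=2.0):
    """Append one row every `interval` seconds until interrupted or powered off."""
    with open(path, "a") as out:
        while True:
            result = subprocess.run(build_command(), capture_output=True, text=True)
            out.write(result.stdout)
            # Flush and sync each sample so it reaches the disk before a
            # sudden thermal shutdown.
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)
```

Calling log_forever("gpu-log.csv") in one terminal and starting the benchmark in another gives a two-second trace of temperature, performance state and clocks. On some mobile GPUs individual fields such as power.draw may simply report [N/A].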

My nvidia-bug-report.log.gz is here (222.2 KB).

It’d be fantastic to try and work out why this is happening, and find a resolution for it - as at the moment I’m a bit stuck, to be honest! More than happy to provide any further information I can to try and resolve this.

Cheers,

Curtis

The NVIDIA GPU is at 50°C while idle in P8, so there’s something wrong with the cooling.
I couldn’t find anything on the web about whether this notebook is a single- or dual-fan design when equipped with an MX330; all the disassemblies I found were of models without one.
An important side note: on notebooks, the NVIDIA driver is not responsible for any fans. All fans are handled by the system BIOS or special software. With dual-fan designs, there’s often vendor software for Windows driving the second fan.
Please check the ‘sensors’ output to see whether the CPU temperature is at the same level as the GPU temperature, which would hint at a single-fan design. Also check the RPM of the fan.
In general, you’ll first have to find out whether your specific notebook is single- or dual-fan and then find the correct software to control the fans (e.g. there is a daemon for Dell notebooks).
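
If you want those readings in a script rather than from ‘sensors’ itself, a rough Python sketch follows. It reads the kernel's hwmon sysfs files directly (temp*_input in millidegrees Celsius, fan*_input in RPM). The function names are made up for illustration, and which chips show up depends on the loaded modules. As far as I know the proprietary NVIDIA driver does not expose the GPU temperature through hwmon, so that value still has to come from nvidia-smi.

```python
from pathlib import Path


def _read_values(chip, pattern, scale=1.0):
    """Read every sysfs file matching `pattern` under `chip` as a number."""
    values = []
    for path in sorted(chip.glob(pattern)):
        try:
            values.append(int(path.read_text()) / scale)
        except (OSError, ValueError):
            continue  # sensor present but unreadable; skip it
    return values


def read_hwmon(root="/sys/class/hwmon"):
    """Return {chip name: {"temps_c": [...], "fans_rpm": [...]}}."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        readings[name] = {
            # temp*_input is reported in millidegrees Celsius
            "temps_c": _read_values(chip, "temp*_input", scale=1000.0),
            # fan*_input is reported in RPM
            "fans_rpm": _read_values(chip, "fan*_input"),
        }
    return readings
```

Printing read_hwmon() once at idle and once under load, next to the GPU temperature from nvidia-smi, shows whether CPU and GPU heat up together and how the fan responds.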

Thanks for the rapid reply! The notebook is a single-fan design. I’ve tried both thermald configured using dptfxtract and i8kmon, and neither has improved matters, unfortunately. That being said, both of those are in Intel-land rather than NVIDIA - am I barking up the wrong tree here?

Yes, as I said, the NVIDIA driver isn’t responsible for cooling/overheating on notebooks.
A single-fan design is at least easier to handle. Please check ‘sensors’ for temperature and RPM under different loads and compare them with the same readings when running Windows. Try setting the fan to 100% RPM and check whether the GPU/CPU stay within acceptable temperature thresholds. Maybe you can then come up with a manual fan curve to use with i8kmon.
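
To make the fan-curve idea concrete, here is a small Python sketch of a threshold curve with hysteresis over the three fan states i8kctl knows (0, 1, 2). The temperature values are illustrative assumptions, not tested settings for this laptop, and the function is only a model of the logic, not i8kmon's actual code or config syntax.

```python
# (low, high) thresholds in degrees Celsius per fan state: step up when the
# temperature rises above `high`, step down when it falls below `low`.
# The overlap between neighbouring states is the hysteresis that stops the
# fan from flapping. All values here are illustrative assumptions.
CURVE = {
    0: (-1, 50),   # fan off
    1: (45, 65),   # fan low
    2: (60, 128),  # fan high
}


def next_fan_state(temp_c, state, curve=CURVE):
    """Return the fan state to use for `temp_c`, given the current `state`."""
    low, high = curve[state]
    if temp_c > high and state + 1 in curve:
        return state + 1
    if temp_c < low and state - 1 in curve:
        return state - 1
    return state
```

With these numbers the fan goes to high once the temperature passes 65°C and only drops back to low below 60°C. Lowering the `high` values makes the fan ramp up earlier, which is the knob to experiment with.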
Since the initial fan curve is set (and in this case borked) within the system BIOS, checking for a BIOS update might be helpful.
It looks like the model has general fan control/cooling problems with Linux:
https://www.dell.com/community/Latitude/Latitude-5401-fan-noise/td-p/7552764

Hrm, thanks. The linked model is a Latitude, rather than an Inspiron, but it’s probably the same sort of thing - I’ll give some of that a try.

Most of the time, immediately before it overheats, the fan is doing somewhere around 5000 RPM, so it’s definitely heading up there - which is far more than the equivalent on Windows, from what I saw doing the same benchmark there. I ran a BIOS update yesterday to no avail, sadly. I’ll try setting the fan to 100% manually and then running the benchmark, though.

Might also be that Dell software is setting different/limited power profiles for cpu/gpu on Windows to stay within cooling capabilities. That would be a much more difficult situation to solve.

Just tested this - manually setting the fan to 100% (i8kctl state 2) and leaving it there by disabling the BIOS’ automatic fan control still led to an overheat bad enough to cause a shutdown. So, there must be something else going on…

Dell markets this laptop as supporting Ubuntu Linux, but doesn’t provide anything extra for it. I’m even running the thermald profile extracted from dptfxtract at the time there, and still nothing :(

Looking at the specs, the MX330 should support 1594 MHz, but on your system it’s running at up to 1911 MHz. I don’t know what it clocks at in Windows. Maybe try limiting the clocks by running
sudo nvidia-smi --lock-gpu-clocks=139,1594
Though I don’t think this will work on mobile GPUs.
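
To see whether the lock actually took effect, one option is to scan the graphics clock column of an nvidia-smi log for samples above the limit. A minimal Python sketch, assuming the clocks were logged as strings like "1911 MHz" (parse_mhz and over_limit are hypothetical helper names):

```python
def parse_mhz(text):
    """Turn '1911 MHz' into 1911; return None for '[N/A]' or empty fields."""
    parts = text.strip().split()
    if parts and parts[0].isdigit():
        return int(parts[0])
    return None


def over_limit(clock_samples, limit_mhz=1594):
    """Return every logged clock (in MHz) that exceeds `limit_mhz`."""
    clocks = [parse_mhz(sample) for sample in clock_samples]
    return [clock for clock in clocks if clock is not None and clock > limit_mhz]
```

If over_limit still returns values after running the command above, the driver has most likely ignored the lock on this GPU.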