RTX 4080: second monitor freezes

I have an MSI laptop with an RTX 4080 (MSI Vector GP68 HX 12V). I use it with an external monitor over HDMI. It worked fine for almost a month, then started failing a few days ago: the external monitor freezes after a few minutes of use.

I’m on Ubuntu 22.04.3 LTS. I have included full details below. I have checked that I have the latest NVIDIA drivers installed. I attach the requested nvidia-bug-report.log.gz file.

Can you help me diagnose the problem? Currently, I can only use the computer without an external monitor, which makes it kind of unusable for my purposes.

In case it’s relevant, I removed the original RAM module and installed two new ones, for a total of 64 GB.

Thanks.

This is the output of inxi -F:

System:
  Host: user-MSIvector Kernel: 6.2.0-35-generic x86_64 bits: 64
    Desktop: LXQt 0.17.1 Distro: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Machine:
  Type: Laptop System: Micro-Star product: Vector GP68HX 12VH v: REV:1.0
    serial: <superuser required>
  Mobo: Micro-Star model: MS-15M1 v: REV:1.0 serial: <superuser required>
    UEFI: American Megatrends LLC. v: E15M1IMS.506 date: 04/13/2023
Battery:
  ID-1: BAT1 charge: 88.1 Wh (100.0%) condition: 88.1/87.4 Wh (100.8%)
CPU:
  Info: 16-core (8-mt/8-st) model: 12th Gen Intel Core i9-12900HX bits: 64
    type: MST AMCP cache: L2: 14 MiB
  Speed (MHz): avg: 2358 min/max: 800/4900:5000:3600 cores: 1: 2500 2: 2500
    3: 2500 4: 2500 5: 800 6: 2500 7: 2500 8: 2500 9: 800 10: 2500 11: 2500
    12: 2500 13: 2500 14: 2500 15: 2500 16: 2500 17: 2500 18: 2500 19: 2500
    20: 2500 21: 2500 22: 2500 23: 2500 24: 2500
Graphics:
  Device-1: Intel driver: i915 v: kernel
  Device-2: NVIDIA driver: nvidia v: 535.113.01
  Device-3: Acer HD Camera type: USB driver: uvcvideo
  Display: x11 server: X.Org v: 1.21.1.4 driver: X:
    loaded: modesetting,nvidia unloaded: fbdev,nouveau,vesa gpu: i915
    resolution: 2560x1600~240Hz
  OpenGL: renderer: Mesa Intel UHD Graphics (ADL-S GT1)
    v: 4.6 Mesa 23.0.4-0ubuntu1~22.04.1
Audio:
  Device-1: Intel driver: sof-audio-pci-intel-tgl
  Device-2: NVIDIA driver: snd_hda_intel
  Sound Server-1: ALSA v: k6.2.0-35-generic running: yes
  Sound Server-2: PulseAudio v: 15.99.1 running: yes
  Sound Server-3: PipeWire v: 0.3.48 running: yes
Network:
  Device-1: Intel driver: iwlwifi
  IF: wlo1 state: down mac: 08:9d:f4:2d:10:4f
  Device-2: Realtek RTL8125 2.5GbE driver: r8169
  IF: enp58s0 state: up speed: 1000 Mbps duplex: full
    mac: 04:7c:16:a7:9f:b0
Bluetooth:
  Device-1: Intel type: USB driver: btusb
  Report: hciconfig ID: hci0 state: up address: 08:9D:F4:2D:10:53
RAID:
  Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
Drives:
  Local Storage: total: 1.84 TiB used: 833.18 GiB (44.1%)
  ID-1: /dev/nvme0n1 vendor: Micron model: 2400 MTFDKBA1T0QFM
    size: 953.87 GiB
  ID-2: /dev/nvme1n1 vendor: Crucial model: CT1000P3PSSD8 size: 931.51 GiB
  ID-3: /dev/sda type: USB vendor: Kingston model: DataTraveler 2.0
    size: 3.73 GiB
Partition:
  ID-1: / size: 937.53 GiB used: 15.39 GiB (1.6%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 299.4 MiB used: 6.1 MiB (2.0%) fs: vfat
    dev: /dev/nvme0n1p1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: 52.0 C mobo: N/A
  Fan Speeds (RPM): N/A
Info:
  Processes: 491 Uptime: 2h 30m Memory: 62.49 GiB used: 5.62 GiB (9.0%)
  Shell: Bash inxi: 3.3.13

nvidia-bug-report0.log.gz (352.7 KB)

Output of nvidia-smi:

Mon Oct 23 16:02:42 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8               4W / 150W |    413MiB / 12282MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1208      G   /usr/lib/xorg/Xorg                          164MiB |
|    0   N/A  N/A      1844    C+G   ...5961157,16780141722617718517,262144      223MiB |
+---------------------------------------------------------------------------------------+

Please post the output of
sudo cat /sys/module/nvidia_drm/parameters/modeset
It should be either ‘Y’ or ‘N’.
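
If it helps, here is a minimal sketch of the same check wrapped in a small shell function. The function name describe_modeset is made up for illustration; the only thing taken from above is the sysfs path. It assumes the file holds a single Y or N and may need root to read.

```shell
# describe_modeset [FILE]
# Hypothetical helper: reads the nvidia_drm modeset parameter and prints
# a human-readable interpretation. FILE defaults to the sysfs path above.
describe_modeset() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        # Either the nvidia_drm module is not loaded or root is required.
        echo "unreadable: $param_file"
        return 1
    fi
    value="$(cat "$param_file")"
    case "$value" in
        Y) echo "modeset enabled" ;;
        N) echo "modeset disabled" ;;
        *) echo "unexpected value: $value" ;;
    esac
}
```

Calling describe_modeset with no argument reads the real parameter; if that reports the file as unreadable, falling back to the sudo cat command above is the simpler route.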

Sure: it returns Y.

According to the changelog, a bug was fixed in 535.86.05:
* Fixed a bug that prevented displays from refreshing when using an
NVIDIA PRIME Display Offload sink.
I suspect this was reintroduced in your current 535.113.01.
Can you downgrade the driver to a working one?
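
A side note on the version numbers, since they are easy to misread: 535.113.01 is the later release, because the middle field compares as a number (113 > 86), not as text. A small sketch of that comparison, assuming GNU sort with the -V (version sort) option is available, as it is on Ubuntu. The helper name newer_version is made up for illustration.

```shell
# newer_version A B
# Hypothetical helper: prints whichever of two version strings is higher.
# sort -V compares each dot-separated field numerically, so 113 ranks
# above 86 even though "1" sorts before "8" as plain text.
newer_version() {
    printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1
}
```

For example, newer_version 535.86.05 535.113.01 prints 535.113.01, so the fix from 535.86.05 should in principle already be in the installed driver.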

Ok I can try that.
Just to be sure I understand correctly, you mean that I should downgrade to 535.86.05, right? Can you point out where to download old drivers?

Rather, first try another driver that you can simply select from “Software & Updates”.

You mean one of these?

Or one of the other available ones at this website?

The first one, try “metapackage from nvidia-driver-525 (proprietary)”

Same issue:
https://forums.developer.nvidia.com/t/external-monitor-freezes-when-using-dedicated-gpu/265406?u=generix

I reverted to 525 as instructed by ‘generix’.

Maybe a minor detail, but now when I open nvidia-settings it no longer shows my hardware information the way it did with the latest drivers:

Let’s see, I’ll report back after some flight hours.

Thanks, I’ll try and report back.

Would you advise following this tutorial? It mentions a ‘method of installing NVIDIA drivers involves using the NVIDIA CUDA repository, which is frequently updated and supports Debian, Ubuntu, RHEL, and other popular Linux distributions.’ Would it make any difference in this case?

nvidia-settings not showing the GPU details is a bad sign; something went wrong while downgrading the driver. Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

nvidia-bug-report.log.gz (138.3 KB)

Because you used the runfile installer, you now have a mix of 535 and 525 driver versions, which doesn’t work. Please try this:
In Software & Updates, switch to the nouveau driver. This should remove all NVIDIA packages. Then run the runfile installer for 535.113.01 with the -b option, and immediately afterwards run it again with the --uninstall option. This should remove all NVIDIA driver files. Then use Software & Updates to switch to the 525 driver and reboot. Afterwards, please create a new nvidia-bug-report.log.
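
For anyone who wants to spot a 525/535 mix like this themselves, here is a rough sketch. It assumes the driver's user-space libraries carry the full driver version as a filename suffix and live under /usr/lib/x86_64-linux-gnu (the usual Ubuntu multiarch directory; your layout may differ). The helper name lib_versions is made up for illustration.

```shell
# lib_versions
# Hypothetical helper: reads library paths on stdin and prints the distinct
# driver-style version suffixes (at least two dot-separated numbers after
# ".so."). Plain sonames such as libcuda.so.1 are ignored.
lib_versions() {
    sed -n 's/.*\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p' | sort -u
}

# Possible usage on the affected machine (paths are an assumption):
#   ls /usr/lib/x86_64-linux-gnu/libnvidia-*.so.* | lib_versions
#   cat /proc/driver/nvidia/version   # version of the loaded kernel module
```

If lib_versions prints more than one version, or a version that differs from the loaded kernel module, the installation is mixed and the cleanup described above is worth doing.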

Thanks. I tried switching to the nouveau driver before, and I think I did something wrong because it ended badly… so this time I want to confirm first so I don’t screw it up again. This would be the sequence, can you confirm?

  1. In Software & Updates, switch to the nouveau driver. No reboot.
  2. Run the runfile installer for 535.113.01 with option -b. No reboot.
  3. Run the runfile installer for 535.113.01 with option --uninstall. No reboot.
  4. Use Software & Updates to switch to the 525 driver. Reboot.

So I have to reboot only after step 4, right?

Correct.
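
For steps 2 and 3, the command lines would look roughly like the sketch below. The runfile name NVIDIA-Linux-x86_64-535.113.01.run is an assumption based on NVIDIA's usual naming; use the path of the file you actually downloaded. The -b and --uninstall options are the ones named above. The helper only prints the commands so you can review them before running anything.

```shell
# runfile_cleanup_cmds [RUNFILE]
# Hypothetical helper: prints (does not run) the two installer invocations
# for steps 2 and 3. RUNFILE defaults to an assumed filename.
runfile_cleanup_cmds() {
    runfile="${1:-./NVIDIA-Linux-x86_64-535.113.01.run}"
    echo "sudo sh $runfile -b"            # step 2: run the installer with -b
    echo "sudo sh $runfile --uninstall"   # step 3: remove the runfile-installed files
}
```

Steps 1 and 4 are done in the Software & Updates GUI, so they are not covered here; reboot only after step 4.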

Thanks for the confirmation. I will try.
In the meantime, it may be worth mentioning that the problem seems to have gone away. I will try your suggestion anyway to get my system in order.
