Unable to determine the device handle for GPU 0000:01:00.0: Not Found on RTX 4070 Ti

I’m running an RTX 4070 Ti on Ubuntu 24.04 with kernel 6.8.0-52 and the NVIDIA 550 driver. I connect to the machine remotely over SSH, and it has no monitor attached (in case that’s relevant). The GPU keeps failing: when I run nvidia-smi, I get:

Unable to determine the device handle for GPU0000:01:00.0: Unknown Error
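To pin down when the GPU drops out, a minimal watchdog sketch could poll nvidia-smi and log the first failure along with the most recent kernel messages. Everything here is hypothetical: the file name, the variables NVSMI, LOG and INTERVAL, and the functions gpu_responding and watch_gpu. It assumes the failure shows up either as a non-zero exit status or as the "Unable to determine the device handle" message above, and that it runs as root so dmesg is readable.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog sketch: poll nvidia-smi and log the first failure.
# NVSMI, LOG and INTERVAL are assumptions; override them from the environment.
NVSMI=${NVSMI:-nvidia-smi}
LOG=${LOG:-/var/log/gpu-watch.log}
INTERVAL=${INTERVAL:-60}

# Succeeds only if nvidia-smi exits 0 AND does not print the device-handle error.
gpu_responding() {
  local out
  out=$("$NVSMI" 2>&1) || return 1
  ! grep -q 'Unable to determine the device handle' <<<"$out"
}

# Sleep-poll until the first failure, then append the time and the last
# 50 kernel log lines to $LOG (run as root so dmesg is readable).
watch_gpu() {
  while gpu_responding; do
    sleep "$INTERVAL"
  done
  {
    echo "GPU stopped responding at $(date -Is)"
    dmesg | tail -n 50
  } >>"$LOG"
}
```

Assuming the file is saved as gpu-watch.sh, it could be started in the background with `sudo bash -c 'source ./gpu-watch.sh && watch_gpu' &`, which keeps running after the SSH session ends only if launched under something like tmux or nohup.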

After a reboot it works normally, but after a while it stops working again. Has anyone encountered something similar, and what can I do to fix it? Here are some relevant commands I ran and their output:

> lsmod | grep nvidia

 nvidia_drm            122880  2
 nvidia_modeset       1355776  3 nvidia_drm
 nvidia              54386688  30 nvidia_modeset
 video                  73728  2 amdgpu,nvidia_modeset

> dmesg | grep -i nvidia

[    4.613640] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input8
[    4.613876] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input9
[    4.614315] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input10
[    4.615153] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input11
[    4.655236] nvidia: loading out-of-tree module taints kernel.
[    4.655242] nvidia: module license 'NVIDIA' taints kernel.
[    4.655245] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    4.655246] nvidia: module license taints kernel.
[    5.623795] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[    5.624974] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    5.675412] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.144.03  Mon Dec 30 17:44:08 UTC 2024
[    5.684118] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.144.03  Mon Dec 30 17:10:10 UTC 2024
[    5.686076] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    6.440172] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[    6.452495] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    6.470561] nvidia-uvm: Loaded the UVM driver, major device number 508.
       NVRM: nvidia-bug-report.sh as root to collect this data before
       NVRM: the NVIDIA kernel module is unloaded.
[316109.267216] nvidia-uvm: Unloaded the UVM driver.

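The two indented NVRM lines near the end of that output start mid-sentence: they are continuation lines of a longer kernel message whose earlier lines did not contain the word "nvidia", so grep -i nvidia dropped them. A small filter sketch that matches NVRM/Xid instead and keeps a few lines of context might recover the whole message. The function name nvrm_context and the context sizes (3 before, 1 after) are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read kernel log text on stdin and print every NVRM or Xid line,
# plus 3 lines before and 1 line after each match. GNU grep separates
# non-adjacent groups with "--".
nvrm_context() {
  grep -i -E -B 3 -A 1 'NVRM|Xid'
}
```

Possible usage: `sudo dmesg -T | nvrm_context` for the current boot, or `journalctl -k -b -1 | nvrm_context` for the boot before the last reboot, which only works if the systemd journal is stored persistently.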

> dmesg | grep -i pci
 ...
 [89212.109319] NVRM: GPU at PCI:0000:01:00: GPU-fe5c340e-4c73-2c72-9782-5bd0fbdd56cf
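The bracketed dmesg timestamps are seconds since boot, so, assuming both lines come from the same boot, the NVRM line at 89212 s is roughly 25 hours after boot and the UVM unload at 316109 s roughly 88 hours after boot. A sketch for turning such a timestamp into wall-clock time is below; the function name ts_to_wallclock and its optional override arguments are hypothetical. Like `dmesg -T`, it is only meaningful within the same boot and assumes the machine has not been suspended since then.

```shell
#!/usr/bin/env bash
# Sketch: convert a dmesg timestamp (seconds since boot) to UTC wall-clock time.
# Usage: ts_to_wallclock SECONDS [UPTIME_SECONDS] [NOW_EPOCH]
# The optional arguments default to /proc/uptime and the current epoch time;
# fractional seconds are truncated.
ts_to_wallclock() {
  local ts=${1%%.*}
  local up=${2:-$(cut -d' ' -f1 /proc/uptime)}
  local now=${3:-$(date +%s)}
  up=${up%%.*}
  # boot time = now - uptime; event time = boot time + timestamp
  date -u -d "@$(( now - up + ts ))" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For example, `ts_to_wallclock 89212.109319` run before rebooting would give an approximate UTC time for that NVRM message.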