Unable to determine the device handle for GPU0000:61:00.0: Unknown Error

One particular video card in my rig keeps dropping out and is no longer recognized while running calculations. Most of the forum posts I've read about this error attribute it to temperature or the power supply.


Here's the driver log I printed. The error code (Xid) is 79, which again seems to point to temperature or the power supply.


But before the device was lost, its readings had not reached the threshold for a forced shutdown.

Current system: Ubuntu 22.04
NVIDIA driver version: 550.90
CUDA version: 12.4
Graphics cards: 4 x RTX 4090

I tested with persistence mode enabled, but I still get the same error.
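
To check whether temperature or power draw really stay below the limits right up to the moment the card is lost, something like the following rough sketch could log the readings of all GPUs once per second. It assumes `nvidia-smi` is on the PATH; the log file name `gpu_samples.log` and the function names are just placeholders for illustration, and whether `nvidia-smi` exits with an error once the card is lost may depend on the driver.

```python
import csv
import io
import subprocess
import time

# Properties passed to `nvidia-smi --query-gpu`.
FIELDS = ["timestamp", "pci.bus_id", "temperature.gpu", "power.draw", "power.limit"]


def parse_samples(csv_text):
    """Turn nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        values = [v.strip() for v in row]
        if len(values) != len(FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(FIELDS, values)))
    return rows


def read_samples():
    """Query all GPUs once; raises CalledProcessError if nvidia-smi exits non-zero."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_samples(result.stdout)


def log_forever(path="gpu_samples.log", interval_s=1.0):
    """Append one line per GPU per interval; stop once nvidia-smi reports an error.

    The default path is a placeholder, not something the driver creates.
    """
    with open(path, "a") as log:
        while True:
            try:
                samples = read_samples()
            except subprocess.CalledProcessError as err:
                # Keep whatever nvidia-smi printed, e.g. the "Unable to determine
                # the device handle" message.
                log.write("nvidia-smi failed: " + (err.stdout or "") + (err.stderr or "") + "\n")
                log.flush()
                break
            for sample in samples:
                log.write(" ".join(sample[f] for f in FIELDS) + "\n")
            log.flush()
            time.sleep(interval_s)
```

Calling `log_forever()` in a separate terminal while the calculation runs leaves the last readings before the failure in the log file, which can then be compared with the timestamp of the Xid 79 entry in the driver log.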