I have been using my Titan Xp GPU in a workstation running Ubuntu 18. Until recently the GPU worked with PyTorch, but for the past few days I have been unable to use it from PyTorch. I then ran nvidia-smi, which returned the following error: Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error
I have attached the nvidia-bug-report output so that someone can help me diagnose the issue with my GPU.
You also have an AMD iGPU that is active but has no driver loaded, and I suspect it might be interfering. Please either disable the iGPU in the BIOS, or upgrade the kernel using the Liquorix PPA and install the latest firmware: https://packages.ubuntu.com/en/lunar/linux-firmware
Sorry, please ignore the previous message; I posted it to the wrong thread.
In your case you’re getting an Xid 79, which means the GPU has fallen off the bus and is shutting down.
Your CPU is also complaining about high temperatures, so please check and clean the cooling and make sure the airflow is not obstructed.
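If you want to confirm whether the Xid comes back after cleaning, here is a minimal sketch for spotting Xid events in kernel log text. It assumes the usual "NVRM: Xid (PCI:...): <code>, ..." line format written by the NVIDIA kernel driver. The function name find_xid_events and the sample log line are hypothetical and only for illustration; the sample is not taken from the attached bug report.

```python
import re

# Matches NVIDIA kernel-driver lines such as
#   NVRM: Xid (PCI:0000:02:00): 79, pid=1234, GPU has fallen off the bus.
# The "PCI:" prefix is treated as optional (assumption: some driver
# versions print the address without it).
XID_PATTERN = re.compile(r"NVRM: Xid \((?:PCI:)?([0-9a-fA-F:.]+)\): (\d+)")


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_code) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


# Illustrative sample only, not copied from the poster's log.
sample_log = (
    "[ 1234.567890] NVRM: Xid (PCI:0000:02:00): 79, pid=1234, "
    "GPU has fallen off the bus.\n"
    "[ 1234.567999] usb 1-1: new high-speed USB device number 2 using xhci_hcd\n"
)

print(find_xid_events(sample_log))
```

To use it on a real machine, pass it the text of the kernel log (for example the output of dmesg saved to a file). An empty list after a long PyTorch run would suggest the GPU stayed on the bus.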