I am getting a GPU Fan/Power ERR! with an Nvidia RTX 2070 Super that I bought one and a half months ago. It shows up in the two related scenarios described below. I can work around it temporarily but have not found a permanent fix. It is a real headache. Please help me figure it out.
CUDA Driver 450.57
CUDA Toolkit 11.0/cuDNN 8.0.1
Jupyter Notebook 6.0.3
Miniconda 4.8.3 with cudatoolkit 10.1.243 and cudnn 7.6.5 and cupti 10.1.168
1. Start Computer
After the Ubuntu 18.04 system failed to start in the morning, I restarted it, and it then showed GPU Fan/Power ERR! when I ran $ nvidia-smi
In the first scenario, I used the following sequence of commands to temporarily bring the GPU fan back to normal status.
$ sudo rmmod nvidia_uvm
$ sudo modprobe nvidia_uvm
$ sudo reboot
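To record whether the error is present after each boot, the fan and power readings could be collected from nvidia-smi's CSV query mode. This is only a minimal sketch: the function names are made up for illustration, and it assumes that a healthy reading is a plain number, so anything else (an error or N/A string) is flagged as suspect.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu=...`
QUERY_FIELDS = ["index", "name", "fan.speed", "power.draw",
                "temperature.gpu", "memory.used"]


def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_gpu_status(csv_text):
    """Parse output produced with --format=csv,noheader,nounits.

    Returns one dict per GPU. 'suspect_fields' lists the fan/power
    fields whose value is not numeric (assumption: a non-numeric
    value means the driver could not read the sensor).
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        row = dict(zip(QUERY_FIELDS, (value.strip() for value in record)))
        row["suspect_fields"] = [
            field for field in ("fan.speed", "power.draw")
            if not _is_number(row.get(field, ""))
        ]
        rows.append(row)
    return rows


def query_gpu_status():
    """Run nvidia-smi and parse its output (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_status(result.stdout)
```

Calling query_gpu_status() right after boot and again after the rmmod/modprobe workaround would show whether the suspect fields clear for the 2070 Super while staying empty for the 2060 cards.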
2. Jupyter Notebook
After a deep learning project in Jupyter Notebook had ended, some GPU memory (for example 2100MiB) stayed in use. If I then closed Jupyter Notebook, shut down the system, and started it again, the GPU Fan Error appeared.
In the second scenario, I used the following method to bring it back to normal status.
I inserted the following code at the end of the cells in the Jupyter Notebook each time.
from numba import cuda
cuda.select_device(0)
cuda.close()
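Since that cleanup is skipped whenever a cell raises an exception first, one option might be to wrap the work in a context manager so the release always runs. This is a sketch only: gpu_released_on_exit and its release parameter are hypothetical names, and the default release function just repeats the three numba calls above.

```python
from contextlib import contextmanager


def _numba_release(device_id):
    """Same calls as in the post; needs numba and a CUDA-capable GPU."""
    from numba import cuda  # imported lazily so the module loads without numba
    cuda.select_device(device_id)
    cuda.close()


@contextmanager
def gpu_released_on_exit(device_id=0, release=_numba_release):
    """Run the body, then release the GPU even if the body raised.

    `release` is injectable (hypothetical parameter) so the behaviour
    can be checked on a machine without a GPU.
    """
    try:
        yield
    finally:
        release(device_id)
```

In a notebook cell the training code would then go inside `with gpu_released_on_exit(0):` instead of repeating the three lines at the end of each cell.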
The GPU Fan Error seems to be a persistent issue in my system. I want to know whether it is a GPU hardware problem. If it is not a hardware problem, how can I solve the GPU Fan ERR permanently?
I have other Nvidia RTX 2060 GPUs that run fine in the same environment as described above.