I have a setup with a dedicated NVIDIA for CUDA computing and another NVIDIA for display. From time to time, the CUDA dedicated NVIDIA gets stuck - the fan keeps running at 100% and nothing can be really done with it anymore. I can “solve” it by rebooting but I’d much prefer to solve it by resetti…

Reset dedicated GPU after it gets stuck

generix March 17, 2022, 8:35am 3

Please check if this helps:
Check whether

sudo cat /sys/module/nvidia_drm/parameters/modeset

Returns ‘Y’, if so, run
run

grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*

to find a file containing

options nvidia-drm modeset=1

and change 1 to 0
then run

sudo update-initramfs -u

and reboot.

sudo cat /sys/module/nvidia_drm/parameters/modeset

should return ‘N’ if done right.
Furthermore, you should monitor gpu temperatures and correctly set up nvidia-persistenced in order to prevent running into the error state.

2 Likes

Dedicate one Nvidia card for CUDA and another Nvidia card for Xorg?

Topic		Replies	Views
Dedicate one Nvidia card for CUDA and another Nvidia card for Xorg? Linux	8	1513	March 18, 2022
One of two 1080Ti GPUs not detected after CUDA failure CUDA Setup and Installation	7	1414	April 27, 2018
Nvidia driver/CUDA installation causes centos 7 to hang on boot. unable to access user interface. CUDA Setup and Installation	29	28913	February 10, 2018
[SOLVED] Run CUDA on dedicated NVIDIA GPU while connecting monitors to Intel HD graphics, is this possible? CUDA Setup and Installation	15	71897	December 9, 2018
After installing CUDA 9.0 in POWER9(RHEL7), nvidia-smi shows Unknown Error in Memory_Usage column. CUDA Setup and Installation	18	3135	June 8, 2018
Cannot nvidia-smi Geforce 1070 anymore suddenly. Linux	9	1641	October 12, 2021
Deciphering an NVRM: Xid message? CUDA Programming and Performance	27	78092	April 1, 2012
Nvidia-persistenced: Failed to query NVIDIA devices Application Accelerator Software cuda , kernel , ubuntu	8	10788	August 18, 2023
Two GPUs, but 2nd GPU not detected. How to fix? CUDA Setup and Installation	10	15602	January 21, 2018
Nvidia-smi recognize H100 when Firmware is disable Confidential Computing cuda , ubuntu	10	562	September 11, 2024

Reset dedicated GPU after it gets stuck

Related topics