nvidia-smi hangs indefinitely: what could be the issue?

Franck_Dernoncourt · August 9, 2016, 4:43pm

When I run nvidia-smi from the shell on an Ubuntu 14.04.4 LTS x64 machine, it hangs indefinitely. What could be the issue?

Below is some more information, in case it is needed:

It used to work but stopped working at some point.
Rebooting doesn’t fix the issue.
I had installed the Nvidia drivers with the following:

Install the Nvidia drivers, CUDA and the CUDA toolkit, following some of the instructions from the CUDA Toolkit Documentation's Installation Guide for Linux:

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb # Got the link from the CUDA Downloads page on NVIDIA Developer
sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda

My computer has 4 Titan X GPUs attached, and I can still see them with sudo lshw -C display.
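To confirm the hang without losing the terminal, a minimal sketch is to run nvidia-smi under GNU coreutils `timeout`, which exits with status 124 when the limit is reached (137 if it had to escalate to SIGKILL). `check_hang` is a hypothetical helper name, not an NVIDIA tool, and the 30-second limit is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: run a command under GNU coreutils `timeout` and report whether it hung.
# check_hang is a hypothetical helper name, not an NVIDIA tool.
# `timeout` exits with 124 when the limit is reached, or 137 if it had to
# escalate to SIGKILL because of --kill-after.
check_hang() {
  local limit="$1"; shift
  local status=0
  timeout --kill-after=10 "$limit" "$@" || status=$?
  case "$status" in
    124|137) echo "HUNG: '$*' did not finish within $limit" >&2 ;;
  esac
  return "$status"
}

# Usage: give nvidia-smi 30 seconds before declaring it hung.
#   check_hang 30s nvidia-smi
# If even this never returns, the process is probably stuck inside the
# kernel. From another terminal, a STAT of "D" (uninterruptible sleep) in
#   ps -o pid,stat,wchan:32,cmd -C nvidia-smi
# points that way, and no signal will end it until the driver call returns.
```

The caveat in the comments matters here: `timeout` still waits for its child, so a process that ignores even SIGKILL (the case in the Stack Overflow question linked later in this thread) will keep it waiting too.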

jeremyrutman · September 4, 2016, 5:37pm

I have the same issue on an Ubuntu 14.04 machine with 4 M60 GPUs, CUDA installed using cuda_7.5.18_linux.run and cuDNN installed with cudnn-7.0-linux-x64-v4.0-prod.tgz, which, if I understand correctly, is compatible with CUDA 7.5. (I need cuDNN 4 rather than 5 because of some slightly stale code.)

nvidia-smi, which is my main ‘what’s going on with the GPUs’ tool, hangs indefinitely on this machine (but not on any of the others, which all have 4 K80 GPUs).

jeremyrutman · May 2, 2017, 7:02pm

Anyone? Bueller?

jeremyrutman · May 16, 2017, 8:33pm

If lspci is not returning an answer, this may do the trick: https://askubuntu.com/questions/909991/lspci-returns-cannot-open-sys-bus-pci-devices-xxxxx-resource-no-such-file-or

Back up your vital files, then run:

sudo apt-get remove linux-image-4.4.0-75-generic
sudo update-grub
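Before removing a kernel, it is worth checking that it is not the one you are booted into. The sketch below is a hypothetical guard (`can_remove_kernel` is not a standard tool). The version 4.4.0-75 is only the one named in the linked answer, so compare against your own `uname -r` and installed kernel list first.

```shell
#!/bin/sh
# Sketch of a guard before removing a kernel package: refuse if the package
# belongs to the kernel that is currently running.
# can_remove_kernel is a hypothetical helper; 4.4.0-75 is just the version
# named in the linked answer, so check your own `uname -r` first.
can_remove_kernel() {
  local pkg="$1"
  local running="${2:-$(uname -r)}"   # optional second argument, for testing
  if [ "$pkg" = "linux-image-$running" ]; then
    echo "refusing: $pkg is the running kernel; boot another one first" >&2
    return 1
  fi
  return 0
}

# Usage:
#   timeout 30s lspci > /dev/null || echo "lspci failed or hung"
#   uname -r                                   # kernel currently running
#   dpkg --list 'linux-image*' | grep '^ii'    # kernels installed
#   if can_remove_kernel linux-image-4.4.0-75-generic; then
#     sudo apt-get remove linux-image-4.4.0-75-generic && sudo update-grub
#   fi
```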

The driver update described here may also help: http://stackoverflow.com/questions/41489070/nvidia-smi-process-hangs-and-cant-be-killed-with-sigkill-either

tinkerthinker · August 23, 2017, 6:57pm

Franck, Jeremy, I noticed both of you have 4 GPUs in your systems. Are these single-CPU machines? I read somewhere that this may be caused by hardware interrupt issues: apparently the first core of a CPU has to handle all hardware interrupts, which can cause problems if the interrupt handler runs too slowly.

So I’m wondering if 4 GPUs is too much for the CPU’s interrupt handler. I’m experiencing the same issue on a dual-Xeon machine with 8 GPUs (4 GPUs per Xeon processor).
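One way to test the interrupt theory is to look at how GPU interrupts are spread across cores in /proc/interrupts. The sketch below sums, per CPU column, the counts on every line that mentions nvidia. It assumes the driver's interrupt lines carry that label, and `nvidia_irq_per_cpu` is a hypothetical helper. If nearly everything lands on CPU0, that is consistent with the theory, though it does not prove the interrupts cause the hang.

```shell
#!/bin/sh
# Sketch: sum the per-CPU interrupt counts of every line mentioning "nvidia"
# in a /proc/interrupts-style file, to see whether one core (often CPU0)
# services all GPU interrupts.
# nvidia_irq_per_cpu is a hypothetical helper; the file defaults to /proc/interrupts.
nvidia_irq_per_cpu() {
  local file="${1:-/proc/interrupts}"
  awk '
    NR == 1 { ncpu = NF; next }          # header row: CPU0 CPU1 ...
    /nvidia/ {                           # field 1 is the IRQ number, then one count per CPU
      for (i = 2; i <= ncpu + 1; i++) sum[i - 2] += $i
    }
    END { for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, sum[c] }
  ' "$file"
}

# Usage:
#   nvidia_irq_per_cpu
```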

I’ve tried forcing persistence mode on, which sped up nvidia-smi drastically (it used to take 5 seconds to gather all GPU data; now it’s instant). However, nvidia-smi just hung on me again when I called it while 6 GPUs were running.
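Persistence mode is switched on with `sudo nvidia-smi -pm 1`. The sketch below checks which GPUs actually report it as enabled by parsing the CSV that `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader` prints (lines like `0, Enabled`). `gpus_without_persistence` is a hypothetical helper, and the check obviously needs nvidia-smi to be responding at the time.

```shell
#!/bin/sh
# Sketch: count GPUs that do not report persistence mode as "Enabled",
# reading lines in the form printed by
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader
# e.g. "0, Enabled". gpus_without_persistence is a hypothetical helper.
gpus_without_persistence() {
  awk -F', *' '$2 != "Enabled" { n++ } END { print n + 0 }'
}

# Usage (root is needed to change the setting):
#   sudo nvidia-smi -pm 1    # turn persistence mode on for all GPUs
#   timeout 30s nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
#     | gpus_without_persistence
```

Persistence mode set this way does not survive a reboot; for a lasting setup NVIDIA points to the nvidia-persistenced daemon instead.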

Are you guys still having this issue? I’m debating removing 2 GPUs so each CPU only has to query 3 GPUs, but that would be disappointing.

Freeman_G · June 9, 2019, 6:37am

Hi tinkerthinker!

I am experiencing the same issue. I, too, have 4 GPUs with one CPU. The workstation worked fine with 2 GPUs; the hangs happened when using 3 or 4 GPUs.
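Since the hang seems to depend on how many GPUs are in use, it can help to query each card on its own with `nvidia-smi -i <index>` under a time limit and see whether the problem follows one particular GPU. `probe_each_gpu` and its `SMI` override variable are hypothetical, added only so the loop can be tried without a GPU. If one probe never returns at all, the process is likely stuck inside the driver, where even SIGKILL does not work (the case in the Stack Overflow question linked earlier in this thread).

```shell
#!/bin/sh
# Sketch: query each GPU separately with `nvidia-smi -i <index>` under a time
# limit, to see whether the hang follows one card or only appears once
# several are busy. probe_each_gpu is a hypothetical helper; the SMI
# variable exists only so the loop can be tried without a GPU.
probe_each_gpu() {
  local count="$1" limit="${2:-20s}" smi="${SMI:-nvidia-smi}"
  local i=0 status
  while [ "$i" -lt "$count" ]; do
    status=0
    timeout "$limit" "$smi" -i "$i" > /dev/null 2>&1 || status=$?
    case "$status" in
      0)   echo "GPU $i: ok" ;;
      124) echo "GPU $i: timed out after $limit" ;;
      *)   echo "GPU $i: failed (exit $status)" ;;
    esac
    i=$((i + 1))
  done
}

# Usage on a 4-GPU machine:
#   probe_each_gpu 4 30s
```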

Besides the interrupt theory, I also suspect a temperature problem, since 4 reference-design GPUs generate a lot of heat without water cooling.
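To check the temperature theory while the cards are under load, a sketch like the following reads the CSV from `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits` and prints the indices of GPUs at or above a limit. `hot_gpus` and its default of 85 °C are assumptions for illustration; each card's own slowdown and shutdown temperatures are listed by `nvidia-smi -q -d TEMPERATURE`.

```shell
#!/bin/sh
# Sketch: print the index of every GPU at or above a temperature limit,
# reading lines in the form printed by
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits
# e.g. "1, 84". hot_gpus and the default limit of 85 (deg C) are assumptions
# for illustration, not NVIDIA's throttle thresholds.
hot_gpus() {
  local limit="${1:-85}"
  awk -F', *' -v limit="$limit" '$2 + 0 >= limit + 0 { print $1 }'
}

# Usage (while the GPUs are under load):
#   timeout 30s nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits \
#     | hot_gpus 85
```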
