I try to run nvidia-smi from the shell on an Ubuntu 14.04.4 LTS x64 machine, but it hangs indefinitely. What could be the issue?
Here is some more information:
- It used to work but stopped working at some point.
- Rebooting doesn’t fix the issue.
- I had installed the Nvidia drivers with the following:

    wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb # Got the link at https://developer.nvidia.com/cuda-downloads
    sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda
- My computer has 4 Titan X GPUs attached to it, and I can still see them with sudo lshw -C display.
I’ve got the same issue on an Ubuntu 14.04 machine with 4 M60 GPUs, with CUDA installed from cuda_7.5.18_linux.run and cuDNN installed from cudnn-7.0-linux-x64-v4.0-prod.tgz, which IIUC is compatible with CUDA 7.5. (I need cuDNN 4, not 5, because of some slightly stale code.)
nvidia-smi, my main “what’s going on with the GPUs” tool, hangs indefinitely on this machine (but not on any of the others, which all have 4 K80 GPUs).
Franck, Jery: I noticed both of you have 4 GPUs in your systems. Are these single-CPU machines? I read somewhere that this may be due to hardware interrupt issues: something about the first core of a CPU having to handle all hardware interrupts, which can cause problems if the interrupt handler runs too slowly.
So I’m wondering whether 4 GPUs are too many for the CPU’s interrupt handler. I’m experiencing the same issue on a dual-Xeon machine with 8 GPUs (4 GPUs per Xeon processor).
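Whether the GPU interrupts really all land on one core can be checked directly: on Linux, /proc/interrupts lists, for each IRQ line, how many interrupts each CPU has serviced. A minimal sketch that sums those per-CPU counts over every IRQ line mentioning "nvidia"; it assumes the driver registers its IRQs under a name containing "nvidia", and the function name nvidia_irq_per_cpu is hypothetical.

```shell
#!/usr/bin/env bash
# Sum, per CPU, the interrupt counts of every IRQ line whose name mentions
# "nvidia" in /proc/interrupts (or in a saved copy passed as $1).
# Assumption: the NVIDIA driver registers its IRQs under a name containing
# "nvidia". The function name is hypothetical.
nvidia_irq_per_cpu() {
  local file="${1:-/proc/interrupts}"
  awk '
    # Header line: one column name per CPU (CPU0, CPU1, ...).
    NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; ncpu = NF; next }
    # IRQ lines: $1 is the IRQ number, $2..$(ncpu+1) are per-CPU counts.
    tolower($0) ~ /nvidia/ {
      for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
    }
    # Print one "CPUn count" line per CPU.
    END { for (i = 1; i <= ncpu; i++) printf "%s %d\n", cpu[i], total[i] }
  ' "$file"
}
```

Calling nvidia_irq_per_cpu with no argument reads the live /proc/interrupts. If nearly all of the total shows up under CPU0, the GPU interrupts are indeed being serviced by a single core; that alone does not prove they cause the hang.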
I’ve tried forcing persistence mode “on”, which sped up nvidia-smi drastically (it used to take 5 seconds to gather all GPU data; now it’s instant). However, nvidia-smi just hung on me when I called it while 6 GPUs were running.
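Persistence mode can be turned on for all GPUs with sudo nvidia-smi -pm 1, and the per-GPU setting read back with nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader. A minimal sketch of a filter over that query output that lists the GPUs still running without persistence; it assumes the output lines look like "0, Enabled" or "1, Disabled", and the function name gpus_without_persistence is hypothetical.

```shell
#!/usr/bin/env bash
# Read "index, persistence_mode" CSV lines on stdin, as printed by
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader
# and print the index of every GPU whose mode is not "Enabled".
# Assumption: the mode field reads exactly "Enabled" or "Disabled".
# The function name is hypothetical.
gpus_without_persistence() {
  # Split on a comma followed by optional spaces.
  awk -F', *' 'NF >= 2 && $2 != "Enabled" { print $1 }'
}
```

Usage: nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | gpus_without_persistence. An empty result means every GPU reported persistence mode as enabled.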
Are you guys still having this issue? I’m debating removing 2 GPUs so each CPU only has to query 3 GPUs, but that would be disappointing.
I am experiencing the same issue. I, too, have 4 GPUs with one CPU. The workstation ran fine with 2 GPUs; the hangs happened when using 3 or 4 GPUs.
Besides the interrupt theory, I also suspect a temperature problem, since 4 reference-design GPUs generate a lot of heat without water cooling.