Error selecting compatible GPU all CUDA-capable devices are busy or unavailable

jagga13 · May 13, 2015, 6:08pm

Hi All,

I am new to NVIDIA and GPU computing. We have a small cluster with some K20X NVIDIA GPUs that we run Amber jobs on. From time to time, a bunch of Slurm jobs fail because they land on a rogue GPU, with the following error: “Error selecting compatible GPU all CUDA-capable devices are busy or unavailable”. We have to drain the GPU and then reboot the node or reset the GPU to try to revive it. Our GPUs are set to Exclusive Process mode. Is there a way for me to find out why these GPUs become unavailable? I don’t see other processes using the GPU when this issue happens. I would like to figure out what is causing this problem and how to fix it, rather than blindly rebooting the node and hoping that resolves it.

Thanks for your help with this.
-J

Robert_Crovella · May 14, 2015, 1:58am

Try running nvidia-smi on the affected node to see if processes are attached to those GPUs.

You can also use nvidia-smi to try to reset the GPU so that you don’t have to reboot the node. Use nvidia-smi --help to learn about the available command line options.
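As a rough sketch of the check described above, the script below asks nvidia-smi which compute processes are attached to each GPU and what compute mode each GPU is in. The helper names (parse_csv, query, attached_processes) are made up for illustration. The query fields are the ones I would expect from nvidia-smi --help-query-gpu and --help-query-compute-apps, so verify them against your driver version before relying on this.

```python
import csv
import io
import shutil
import subprocess


def parse_csv(text):
    """Parse nvidia-smi 'csv,noheader' output into rows of stripped fields."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        fields = [field.strip() for field in row]
        if any(fields):  # skip blank lines
            rows.append(fields)
    return rows


def query(option, fields):
    """Run one nvidia-smi query; return [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = ["nvidia-smi", f"--{option}={','.join(fields)}", "--format=csv,noheader"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_csv(result.stdout)


def attached_processes(gpus, apps):
    """Map each GPU index to the (pid, process name) pairs attached to it."""
    by_uuid = {}
    for gpu_uuid, pid, *name in apps:
        # Re-join the name in case it contained a comma.
        by_uuid.setdefault(gpu_uuid, []).append((pid, ", ".join(name)))
    return {index: by_uuid.get(uuid, []) for index, uuid, _name, _mode in gpus}


if __name__ == "__main__":
    gpus = query("query-gpu", ["index", "uuid", "name", "compute_mode"])
    apps = query("query-compute-apps", ["gpu_uuid", "pid", "process_name"])
    modes = {index: mode for index, _uuid, _name, mode in gpus}
    for index, procs in attached_processes(gpus, apps).items():
        print(f"GPU {index} ({modes[index]}):", procs or "no compute processes attached")
```

If a GPU shows no attached processes but still rejects new contexts, nvidia-smi --gpu-reset -i <index> is the reset option referred to above. It normally needs root and requires that nothing is using that GPU.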

It’s a little puzzling that you say you have to “drain the GPU”. If your GPUs are set in exclusive process mode, then as soon as a process begins to use a GPU, no other processes can use it. Therefore it’s not surprising that you would have to drain the GPU in order to make it usable by other processes.

You may also want to use some variant of ps -ef, or other ordinary Linux tools, to look for rogue, zombie, or other errant processes.
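One way to do that kind of hunt on Linux, sketched here under the assumption that /proc is mounted and the script runs as root (otherwise other users' file descriptors are not readable), is to list every PID that still holds a /dev/nvidia* device file open, together with its process state. The function names are hypothetical. This is only an illustration of the idea, not something nvidia-smi or the driver provides.

```python
import os


def nvidia_device_holders(proc_root="/proc"):
    """Map PID -> sorted list of /dev/nvidia* paths that the process has open."""
    holders = {}
    if not os.path.isdir(proc_root):
        return holders
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        devices = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if target.startswith("/dev/nvidia"):
                devices.add(target)
        if devices:
            holders[int(entry)] = sorted(devices)
    return holders


def process_state(pid, proc_root="/proc"):
    """Return the one-letter state from /proc/<pid>/stat, or None if unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "stat")) as stat_file:
            stat = stat_file.read()
    except OSError:
        return None
    # The command name sits in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


if __name__ == "__main__":
    for pid, devices in sorted(nvidia_device_holders().items()):
        print(pid, process_state(pid), ", ".join(devices))
```

A PID that shows up here but not in the nvidia-smi process list, or one whose state is D (uninterruptible sleep), is a reasonable first suspect for whatever is keeping the GPU "busy".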

zhanghang0704 · May 11, 2016, 6:28pm

Hi,

Have you solved this problem? It has happened to me several times, and each time I had to reinstall the OS to fix it. Is there a better way? Thanks a lot!

By the way, I am using Win10.

Thanks,
Hang
