"RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable" on Ubuntu 16.04

varun7q6rv · January 24, 2020, 7:58am

I am getting the following error from the sample code:

import torch
torch.zeros((2,2)).to(torch.device("cuda"))

However I have 4 GPUs installed all with abundant memory and no running processes.

Thu Jan 23 23:50:29 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:05:00.0 Off |                  N/A |
| 26%   63C    P0    76W / 250W |      0MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 39%   51C    P0    58W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 37%   49C    P0    60W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 23%   39C    P0    57W / 250W |      0MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Moreover, all GPUs have their compute mode set to Default, so it's not a permissions issue.
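
A quick way to double-check the compute mode and memory usage without reading the table by eye is to ask nvidia-smi for machine-readable output. The sketch below is not from the original post. It assumes the CSV shape produced by `nvidia-smi --query-gpu=index,name,compute_mode,memory.used --format=csv,noheader,nounits`, and that the idle mode is reported as the string `Default`. The helper name `find_suspect_gpus` and the sample text are hypothetical.

```python
import csv
import io

# Fields requested from nvidia-smi, in the order the parser expects them.
QUERY_FIELDS = "index,name,compute_mode,memory.used"


def find_suspect_gpus(csv_text):
    """Return (index, name, reason) for GPUs that are not idle in Default mode."""
    suspects = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, mode, used_mib = (field.strip() for field in row)
        if mode != "Default":
            suspects.append((int(index), name, f"compute mode is {mode}"))
        elif int(used_mib) > 0:
            suspects.append((int(index), name, f"{used_mib} MiB already in use"))
    return suspects


# Hypothetical sample shaped like the table in the post: all idle, all Default.
sample = """0, GeForce GTX TITAN X, Default, 0
1, GeForce GTX 1080 Ti, Default, 0
2, GeForce GTX 1080 Ti, Default, 0
3, GeForce GTX 1080 Ti, Default, 0
"""
print(find_suspect_gpus(sample))
```

On a real machine, the text could come from `subprocess.run(["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"], capture_output=True, text=True, check=True).stdout`. An empty list would support the claim that neither the compute mode nor memory pressure explains the error.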

System Details

Ubuntu 16.04
NVIDIA-SMI 440.33.01
Driver Version: 440.33.01
CUDA Version: 10.2

I get the same issue when compiling C code with nvcc and running something as simple as cudaMalloc.

My code does detect the GPUs. For example, the following code:

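#include <stdio.h>
#include <cuda_runtime_api.h>
#include <cuda.h>

int main() {
  int nDevices = 0;

  // Enumerate the CUDA devices; stop early if the runtime reports an error.
  cudaError_t err = cudaGetDeviceCount(&nDevices);
  if (err != cudaSuccess) {
    printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  for (int i = 0; i < nDevices; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("Device Number: %d\n", i);
    printf("  Device name: %s\n", prop.name);
    printf("  Memory Clock Rate (KHz): %d\n",
           prop.memoryClockRate);
    printf("  Memory Bus Width (bits): %d\n",
           prop.memoryBusWidth);
    // Clock is in kHz and bus width in bits; the result is in GB/s.
    printf("  Peak Memory Bandwidth (GB/s): %f\n\n",
           2.0*prop.memoryClockRate*(prop.memoryBusWidth/8)/1.0e6);
  }
  return 0;
}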

Prints out:

Device Number: 0
  Device name: GeForce GTX 1080 Ti
  Memory Clock Rate (KHz): 5505000
  Memory Bus Width (bits): 352
  Peak Memory Bandwidth (GB/s): 484.440000

Device Number: 1
  Device name: GeForce GTX 1080 Ti
  Memory Clock Rate (KHz): 5505000
  Memory Bus Width (bits): 352
  Peak Memory Bandwidth (GB/s): 484.440000

Device Number: 2
  Device name: GeForce GTX 1080 Ti
  Memory Clock Rate (KHz): 5505000
  Memory Bus Width (bits): 352
  Peak Memory Bandwidth (GB/s): 484.440000

Device Number: 3
  Device name: GeForce GTX TITAN X
  Memory Clock Rate (KHz): 3505000
  Memory Bus Width (bits): 384
  Peak Memory Bandwidth (GB/s): 336.480000
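
The bandwidth figures follow directly from the expression in the listing: the memory clock in kHz is doubled (the usual double-data-rate factor), multiplied by the bus width in bytes, and divided by 1e6 to give GB/s. As a sanity check, the arithmetic can be reproduced in Python. The function name is hypothetical and the inputs are the values printed above.

```python
def peak_bandwidth_gb_s(memory_clock_khz, bus_width_bits):
    """Mirror the C expression: 2.0 * clock * (width / 8) / 1.0e6."""
    bytes_per_transfer = bus_width_bits // 8  # integer division, as in the C code
    return 2.0 * memory_clock_khz * bytes_per_transfer / 1.0e6


print(peak_bandwidth_gb_s(5505000, 352))  # values reported for the GTX 1080 Ti
print(peak_bandwidth_gb_s(3505000, 384))  # values reported for the GTX TITAN X
```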

Interestingly enough, when I run

nvidia-smi -r

I get the error

GPU Reset couldn't run because GPU 00000000:05:00.0 is the primary GPU.

But if I try to reset any of the remaining 3 GPUs, I do not get this error. I tried disabling this GPU and running the code with the remaining GPUs, with no luck. Could this be a hardware installation issue? I tried rebooting my machine, which did not help either.
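
Note that the two listings number the cards differently: nvidia-smi shows the TITAN X (bus 05:00.0) as GPU 0, while the CUDA program reports it as device 3. nvidia-smi orders devices by PCI bus ID, whereas the CUDA runtime by default enumerates faster devices first. One way to test each card in isolation is to run the failing snippet in a fresh process per GPU with `CUDA_VISIBLE_DEVICES` set, and with `CUDA_DEVICE_ORDER=PCI_BUS_ID` so the indices match nvidia-smi. The following is a minimal sketch under those assumptions, not a fix confirmed in this thread. The helper name `run_on_each_gpu` is hypothetical, and the stand-in command only echoes the variable so the sketch runs on any machine.

```python
import os
import subprocess
import sys


def run_on_each_gpu(gpu_indices, command):
    """Run `command` once per GPU, exposing only that GPU to the child process."""
    results = {}
    for index in gpu_indices:
        env = dict(os.environ)
        env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # match nvidia-smi numbering
        env["CUDA_VISIBLE_DEVICES"] = str(index)  # hide every other GPU
        proc = subprocess.run(command, env=env, capture_output=True, text=True)
        results[index] = (proc.returncode, proc.stdout.strip(), proc.stderr.strip())
    return results


# Stand-in command: prints the variable the child process actually sees.
echo = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
for index, (code, out, err) in run_on_each_gpu(range(4), echo).items():
    print(index, code, out)
```

To probe the real failure, `echo` could be replaced with `[sys.executable, "-c", "import torch; torch.zeros((2, 2)).to(torch.device('cuda'))"]`. A non-zero return code for only some indices would point at specific cards, while a failure on every index would suggest a driver or system-wide problem instead.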
