I have a clean installation of CUDA 10.1 on a headless Ubuntu 18.04 LTS server. This is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.95.01 Driver Version: 440.95.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 00000000:82:00.0 Off | 0 |
| N/A 28C P0 61W / 235W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 00000000:C2:00.0 Off | 0 |
| N/A 30C P0 63W / 235W | 0MiB / 11441MiB | 41% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I compile the following simple program:
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cudaError_t err = cudaSuccess;
    float *buf = NULL;

    err = cudaMalloc((void **)&buf, 1000);
    if (err != cudaSuccess) {
        fprintf(stderr, "Failed to allocate device memory (error code %s)!\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    err = cudaFree(buf);
    if (err != cudaSuccess) {
        fprintf(stderr, "Failed to free device memory (error code %s)!\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
Then I run it manually several times and it fails intermittently:
***@***:~/cuda$ ./test_cuda_malloc
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$ ./test_cuda_malloc
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$
I checked the output of nvidia-smi repeatedly, as well as /var/log/kern and /var/log/syslog, and found nothing that helps me track down the problem. Disabling one of the cards didn't help. The host is idle during these experiments and nothing else uses the GPUs.
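To narrow down where the failure occurs, I also wrote a small probe (sketch, untested against this exact setup) that enumerates the devices and forces context creation on each GPU explicitly, to see whether the error is tied to one specific card or to context creation rather than to the allocation itself:

```cuda
// Hypothetical diagnostic: enumerate GPUs and try to create a context on
// each one, separating "device busy" errors from allocation failures.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }
    printf("%d device(s) visible\n", count);

    for (int d = 0; d < count; ++d) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) == cudaSuccess)
            printf("device %d: %s\n", d, prop.name);

        err = cudaSetDevice(d);
        if (err != cudaSuccess) {
            fprintf(stderr, "  cudaSetDevice: %s\n", cudaGetErrorString(err));
            continue;
        }
        /* cudaFree(0) forces context creation without allocating any
           device memory, so a failure here points at the context, not
           at cudaMalloc. */
        err = cudaFree(0);
        if (err != cudaSuccess)
            fprintf(stderr, "  context creation failed: %s\n", cudaGetErrorString(err));
        else
            printf("  context OK\n");
        cudaDeviceReset();
    }
    return EXIT_SUCCESS;
}
```

My working assumption is that if only one of the two K40m cards ever reports the error, it is a per-device problem; if both do so intermittently, it is more likely a driver or setup issue.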
What are the steps to troubleshoot this intermittent failure?