Buying Nvidia Products is a Serious Waste of Money: They Don't Work

I have a clean installation of CUDA 10.1 on a headless Ubuntu 18.04 LTS server. This is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.95.01    Driver Version: 440.95.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 00000000:82:00.0 Off |                    0 |
| N/A   28C    P0    61W / 235W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 00000000:C2:00.0 Off |                    0 |
| N/A   30C    P0    63W / 235W |      0MiB / 11441MiB |     41%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I compile the following simple program:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cudaError_t err = cudaSuccess;

    float *buf = NULL;
    err = cudaMalloc((void **)&buf, 1000);

    if (err != cudaSuccess) {
        fprintf(stderr, "Failed to allocate device memory (error code %s)!\n", cudaGetErrorString(err));

        return EXIT_FAILURE;
    }

    err = cudaFree(buf);

    if (err != cudaSuccess) {
        fprintf(stderr, "Failed to free device memory (error code %s)!\n", cudaGetErrorString(err));

        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Then I run it manually several times and it fails intermittently:

***@***:~/cuda$ ./test_cuda_malloc
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$ ./test_cuda_malloc
***@***:~/cuda$ ./test_cuda_malloc
Failed to allocate device memory (error code all CUDA-capable devices are busy or unavailable)!
***@***:~/cuda$
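
The error string does not say which of the two GPUs the runtime actually tried to use. A small variant of the test (just a sketch, untested) could print the selected device and its compute mode right before the allocation:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int dev = -1, count = 0;

    /* Which device does the runtime pick by default, and how many does it see? */
    cudaGetDeviceCount(&count);
    cudaGetDevice(&dev);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("Using device %d of %d: %s (compute mode %d)\n",
           dev, count, prop.name, prop.computeMode);

    float *buf = NULL;
    cudaError_t err = cudaMalloc((void **)&buf, 1000);
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));

    if (err == cudaSuccess)
        cudaFree(buf);

    return err == cudaSuccess ? 0 : 1;
}

If the failures always coincide with the same device index, that would at least point at one card or slot.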

I have checked the output of nvidia-smi, /var/log/kern, and /var/log/syslog at length, and there is nothing that helps me track down the problem. Disabling one of the cards didn't help. The host is otherwise idle during these experiments, and nothing else is using the Nvidia GPUs.

What steps can I take to troubleshoot this intermittent failure?
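
For example, would a per-device probe along these lines (sketch only, untested; CUDA runtime device indices need not match nvidia-smi's ordering) help tie the failure to a specific card?

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);

    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* Attempt a small allocation on every device in turn, so a failure can
       be attributed to a specific GPU rather than to whichever device the
       runtime picks by default. */
    for (int dev = 0; dev < count; ++dev) {
        float *buf = NULL;

        err = cudaSetDevice(dev);
        if (err == cudaSuccess)
            err = cudaMalloc((void **)&buf, 1000);

        printf("device %d: %s\n", dev, cudaGetErrorString(err));

        if (err == cudaSuccess)
            cudaFree(buf);

        cudaDeviceReset();  /* drop this device's context before moving on */
    }

    return 0;
}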