Why can the NVML API see the GPU when the CUDA API can't?

I have the nvidia1 device in my Docker container:

work@benchmark:~$ ls -al /dev | grep nvi
crw-rw-rw-  1 root root 195,   1 May 15 00:23 nvidia1
crw-rw-rw-  1 root root 195, 255 May 15 00:23 nvidiactl
crw-rw-rw-  1 root root 234,   0 May 15 00:23 nvidia-uvm
crw-rw-rw-  1 root root 234,   1 May 15 00:23 nvidia-uvm-tools

I built the NVML example in my container, and it can print the GPU info:

work@benchmark:/usr/local/cuda/nvml/example$ ./example
Found 1 device

Listing devices:
0. Tesla V100-SXM2-32GB
         Changing device's compute mode from 'Default' to 'Prohibited'
                 Need root privileges to do that: Insufficient Permissions
All done.
Press ENTER to continue...

Finally, I used the CUDA cudaGetDeviceCount API to find the GPU, but it reports no GPUs ("no CUDA-capable device is detected"):

work@benchmark:~$ cat hi.cu
// CUDA Device Query

#include <stdio.h>

int main()
{
    // Number of CUDA devices
    int devCount = 0;

    printf("CUDA Device Query...\n");
    // Query the device count; on failure, print the CUDA error string
    cudaError_t err = cudaGetDeviceCount(&devCount);
    if (err != cudaSuccess) printf("%s\n", cudaGetErrorString(err));

    printf("There are %d CUDA devices.\n", devCount);

    return 0;
}

work@benchmark:~$ ./a.out
CUDA Device Query...
no CUDA-capable device is detected
There are 0 CUDA devices.

Does anyone have an idea how to fix this?

OS: Ubuntu16.04.3
CUDA: 10.0
Driver Version: 410.104
Nvidia GPU: Tesla V100-SXM2-32GB

I also tried setting CUDA_VISIBLE_DEVICES=1, but CUDA still does not work.

Never mind, I found the root cause myself. It was a very naive mistake: I had started the NVIDIA MPS daemon with

CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d

The problem went away once I set the correct value, CUDA_VISIBLE_DEVICES=1. I hope this helps others who are struggling with the same issue.
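
For anyone checking for the same mismatch, here is a minimal shell sketch that prints the GPU indices exposed under /dev next to the current CUDA_VISIBLE_DEVICES value. The helper name list_gpu_nodes is made up for illustration. The sketch assumes the N in /dev/nvidiaN corresponds to the index CUDA_VISIBLE_DEVICES expects, which matched in my case but is not guaranteed in general.

```shell
#!/bin/sh
# list_gpu_nodes is a hypothetical helper, not part of any NVIDIA tool.
# It prints the numeric suffix of every nvidiaN node in the given directory
# (default /dev) and skips nvidiactl, nvidia-uvm and similar control nodes.
list_gpu_nodes() {
    dir=${1:-/dev}
    for node in "$dir"/nvidia[0-9]*; do
        # If the glob matched nothing, $node is the literal pattern: skip it.
        [ -e "$node" ] || continue
        echo "${node##*/nvidia}"
    done
}

echo "GPU device node indices:"
list_gpu_nodes /dev
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"

# Assumption: if MPS is in use and its daemon was started with a different
# value, restart it with the matching one in the environment where the
# daemon runs. These lines are left commented out on purpose:
#   echo quit | nvidia-cuda-mps-control
#   CUDA_VISIBLE_DEVICES=1 nvidia-cuda-mps-control -d
```

In my container this lists index 1, which is the value that fixed it, while the MPS daemon had been started with 0.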
