Hi all,
I’m seeing the “deviceQuery” sample fail on a Microsoft Azure Linux VM with Tesla K80 GPUs.
When the failure occurs, “nvidia-smi” still works normally without any error.
The failure is intermittent; it does not happen every time.
Rebooting the system clears the failure.
My driver version is 390.30 and my CUDA version is 8.0.61.
The output of deviceQuery and nvidia-smi is pasted below.
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
$ nvidia-smi
Thu May 10 04:54:48 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00009BE4:00:00.0 Off |                    0 |
| N/A   29C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000B457:00:00.0 Off |                    1 |
| N/A   35C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000CBB6:00:00.0 Off |                    0 |
| N/A   34C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000E872:00:00.0 Off |                    0 |
| N/A   29C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Does anyone have a solution for this failure?
Thanks,