deviceQuery Failed

Hi all,

I’m facing a failure of the “deviceQuery” sample on a Microsoft Azure Linux VM with K80 GPUs.
When the failure occurs, “nvidia-smi” still works normally without any errors.
The failure happens only occasionally, not every time.
After rebooting the system, it no longer occurs.
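Because the failure is intermittent and clears after a reboot, it can help to measure how often it happens and to confirm, at the moment of each failure, that nvidia-smi still succeeds. Below is a minimal bash sketch. The function name probe_intermittent is hypothetical. It assumes deviceQuery prints "Result = PASS" when it succeeds; the failing run further down prints "Result = FAIL".

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run a CUDA check repeatedly and, on each failure,
# record whether nvidia-smi still succeeds at that moment. Both commands are
# parameters so the helper can be pointed at ./deviceQuery and nvidia-smi.
#
# probe_intermittent CHECK_CMD SMI_CMD RUNS
#   prints "<failures>/<runs> failed, nvidia-smi ok in <n> of those"
#   and logs one timestamped line per failure on stderr.
probe_intermittent() {
  local check=$1 smi=$2 runs=$3
  local failures=0 smi_ok=0 i out smi_state
  for ((i = 1; i <= runs; i++)); do
    # A run counts as failed on a non-zero exit or a missing "Result = PASS" line
    # (assumes the check prints that line on success, as deviceQuery does).
    if ! out=$("$check" 2>&1) || ! grep -q 'Result = PASS' <<<"$out"; then
      failures=$((failures + 1))
      # Check nvidia-smi immediately after the failed run.
      if "$smi" >/dev/null 2>&1; then
        smi_ok=$((smi_ok + 1))
        smi_state=ok
      else
        smi_state=failed
      fi
      # UTC timestamp so failures can be lined up with the kernel log later.
      echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') run $i: check failed, nvidia-smi $smi_state" >&2
    fi
  done
  echo "${failures}/${runs} failed, nvidia-smi ok in ${smi_ok} of those"
}
```

For example, after sourcing the function: probe_intermittent ./deviceQuery nvidia-smi 50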

My driver version is 390.30 and my CUDA version is 8.0.61.
The output of deviceQuery and nvidia-smi is pasted below.

$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
$ nvidia-smi
Thu May 10 04:54:48 2018
±----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------±---------------------±---------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00009BE4:00:00.0 Off | 0 |
| N/A 29C P8 32W / 149W | 0MiB / 11441MiB | 0% Default |
±------------------------------±---------------------±---------------------+
| 1 Tesla K80 On | 0000B457:00:00.0 Off | 1 |
| N/A 35C P8 27W / 149W | 0MiB / 11441MiB | 0% Default |
±------------------------------±---------------------±---------------------+
| 2 Tesla K80 On | 0000CBB6:00:00.0 Off | 0 |
| N/A 34C P8 27W / 149W | 0MiB / 11441MiB | 0% Default |
±------------------------------±---------------------±---------------------+
| 3 Tesla K80 On | 0000E872:00:00.0 Off | 0 |
| N/A 29C P8 32W / 149W | 0MiB / 11441MiB | 0% Default |
±------------------------------±---------------------±---------------------+

±----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
±----------------------------------------------------------------------------+

Does anyone have a solution for this failure?
Thanks,