CUDA deviceQuery hangs

GPU: Quadro P4000
NVIDIA Driver: 390.87
CUDA driver version: 9.1
CUDA runtime version: 9.0

When I run /usr/local/cuda-9.0/extras/demo_suite/deviceQuery, it shows:

/usr/local/cuda-9.0/extras/demo_suite/deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Then it hangs forever.

The strange part is that if I run nvidia-smi exactly 4 times while deviceQuery is hung, deviceQuery then completes and displays the device correctly.

What could be causing this?